
Multilingual library for extracting template citations from wikicode
Open, Needs Triage, Public

Description

  • Title of the proposal: Multilingual library for extracting (automatic) citations from wikicode
  • Description: Retrieving citations from Wikipedia articles is challenging because the Wikipedia API does not provide them directly. As a consequence, many large-scale studies of references in Wikipedia develop their own parsers (Kaffee & Elsahar, 2021; Singh et al., 2020; Lewoniewski et al., 2017). The usual approach is to parse the wikitext, but citation template names and parameter names differ across language editions, which makes it hard to use a single parser for all Wikipedias.
  • Username for contact: @Nidiah
  • Language of the team (English, Arabic, etc.): TBD
  • Prerequisites: Some knowledge about the format of references and citation templates in Wikipedia.
  • Any other details to share?:
  • Interested? Add your username below:
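The "parse the wikitext" approach from the description, and why language differences get in the way, can be sketched in standard-library Python. This is a minimal illustration, not the proposed library. It assumes hand-written alias tables (`TEMPLATE_ALIASES`, `PARAM_ALIASES`, both hypothetical) that map each wiki's local template and parameter names onto canonical ones. The German (`Internetquelle`) and Spanish (`Cita web`) entries are illustrative and should be checked against each wiki's template documentation.

```python
# Hypothetical alias table: (language, local template name) -> canonical name.
# Illustrative entries only; a real library would need curated, per-wiki data.
TEMPLATE_ALIASES = {
    ("en", "cite web"): "cite web",
    ("de", "internetquelle"): "cite web",
    ("es", "cita web"): "cite web",
}

# Hypothetical per-language parameter aliases: local name -> canonical name.
PARAM_ALIASES = {
    "en": {"url": "url", "title": "title", "access-date": "access-date"},
    "de": {"url": "url", "titel": "title", "abruf": "access-date"},
    "es": {"url": "url", "título": "title", "fechaacceso": "access-date"},
}


def find_templates(wikitext):
    """Yield the inner text of each top-level {{...}} template (brace counting)."""
    depth, start, i = 0, None, 0
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i + 2
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            if depth == 0:
                yield wikitext[start:i]
            i += 2
        else:
            i += 1


def split_top_level(text, sep="|"):
    """Split on sep, ignoring separators inside nested {{...}} or [[...]]."""
    parts, buf, depth, i = [], [], 0, 0
    while i < len(text):
        two = text[i:i + 2]
        if two in ("{{", "[["):
            depth += 1
            buf.append(two)
            i += 2
            continue
        if two in ("}}", "]]") and depth > 0:
            depth -= 1
            buf.append(two)
            i += 2
            continue
        if text[i] == sep and depth == 0:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(text[i])
        i += 1
    parts.append("".join(buf))
    return parts


def extract_citations(wikitext, lang):
    """Return known citation templates with canonical template/parameter names."""
    citations = []
    for inner in find_templates(wikitext):
        name, *params = split_top_level(inner)
        key = (lang, name.strip().replace("_", " ").lower())
        canonical = TEMPLATE_ALIASES.get(key)
        if canonical is None:
            continue  # not a citation template we know for this language
        aliases = PARAM_ALIASES.get(lang, {})
        fields = {}
        for param in params:
            if "=" not in param:
                continue  # positional parameters are skipped in this sketch
            pname, value = param.split("=", 1)
            pname = pname.strip().lower()
            fields[aliases.get(pname, pname)] = value.strip()
        citations.append({"template": canonical, **fields})
    return citations


de_text = "Text.<ref>{{Internetquelle |url=https://example.org |titel=Beispiel |abruf=2023-05-17}}</ref>"
print(extract_citations(de_text, "de"))
```

With the alias tables in place, a German `Internetquelle` and an English `cite web` come out under the same canonical keys. Building and maintaining those tables for every language edition is the hard part the task describes. The sketch also ignores templates nested inside other templates and keeps nested markup as raw text in parameter values.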

Event Timeline

@Pablo mentioned the tool https://github.com/internetarchive/iare. A couple of months ago there was https://internetarchive.github.io/ware/, which seemed to be a frontend using IARE in the background. However, the site is now down, and the repository (https://github.com/internetarchive/ware) redirects to the IARE repository. Maybe @Harej can provide further information?

Regarding the title of the task, do you think it would be better to change it to "template citations" instead of "automatic citations"? Once a citation has been generated automatically via Citoid (that's what I understood by "automatic citations", but maybe I got it wrong), we can't tell whether it was introduced automatically or manually. However, based on the description of the task, it seems to me that you are referring to citations introduced as citation templates rather than as plain text. Did I get that right?
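To make the distinction between template citations and plain-text citations concrete, here is a rough Python heuristic. It assumes that a reference counts as a template citation when its `<ref>` body starts with `{{`. The regex and the `classify_refs` helper are hypothetical illustrations, not part of any existing tool.

```python
import re

# Matches <ref>...</ref> and <ref name="...">...</ref>; self-closing reuses
# such as <ref name="x" /> are skipped by the (?<!/) lookbehind.
REF_RE = re.compile(r"<ref(?:\s[^>]*)?(?<!/)>(.*?)</ref\s*>", re.DOTALL | re.IGNORECASE)


def classify_refs(wikitext):
    """Label each <ref> body 'template' if it starts with {{, else 'plain'."""
    return [
        "template" if body.strip().startswith("{{") else "plain"
        for body in REF_RE.findall(wikitext)
    ]


text = (
    'A<ref>{{cite web|url=https://example.org}}</ref> '
    'B<ref name="b">Smith, J. (2020). Title.</ref> C<ref name="b" />'
)
print(classify_refs(text))
```

The heuristic cannot tell whether a template citation was filled in by Citoid or typed by hand. Both end up as the same wikitext, which is why "template citations" describes the task's scope more accurately.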

Thank you for your comments; I'll change the title.

Nidiah renamed this task from Multilingual library for extracting (automatic) citations from wikicode to Multilingual library for extracting template citations from wikicode. May 17 2023, 2:44 PM