- Title of the proposal: Multilingual library for extracting (automatic) citations from wikicode
- Description: Retrieving citations from Wikipedia articles is challenging because the Wikipedia API does not provide citations directly. As a result, many large-scale studies of references in Wikipedia develop their own parsers (Kaffee & Elsahar, 2021; Singh et al., 2020; Lewoniewski et al., 2017). The usual approach is to parse the wikitext, but citation template names and parameter names differ between languages, which makes it hard to use one parser for all Wikipedias.
- Username for contact: @Nidiah
- Language of the team (English, Arabic, etc.): TBD
- Prerequisites: Some knowledge about the format of references and citation templates in Wikipedia.
- Any other details to share?:
- Citation template parser for English Wikipedia (Python library that runs Wikipedia's Lua citation code): https://github.com/dissemin/wikiciteparser
- CLI tool for extracting <ref> tags from MediaWiki XML database dumps: https://github.com/mediawiki-utilities/python-mwrefs
- Python script for extracting inline references and citations from MediaWiki XML database dumps: https://github.com/scribe-wikimedia/credibility-api
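To make the multilingual problem in the description concrete, here is a minimal sketch of how a single parser could map language-specific template and parameter names onto shared field names. It is only an illustration under stated assumptions: the alias tables are tiny, hand-written, and incomplete, the function names (`find_templates`, `split_top_level`, `extract_citations`) are hypothetical, and the brace scanner ignores edge cases such as `{{{triple-brace}}}` parameters, comments, and `<nowiki>` blocks. It is not taken from any of the tools listed above.

```python
# Illustrative, incomplete alias tables (assumptions, not an official mapping).
# Keys are lowercased template/parameter names as they appear in each wiki.
TEMPLATE_ALIASES = {
    "en": {"cite web": "web", "cite journal": "journal"},
    "es": {"cita web": "web", "cita publicación": "journal"},
    "de": {"internetquelle": "web"},
}
PARAM_ALIASES = {
    "en": {"title": "title", "url": "url"},
    "es": {"título": "title", "url": "url"},
    "de": {"titel": "title", "url": "url"},
}


def find_templates(wikitext):
    """Yield the inner text of every {{...}} template (inner ones first)."""
    stack = []
    i = 0
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            stack.append(i + 2)  # remember where the template body starts
            i += 2
        elif pair == "}}" and stack:
            yield wikitext[stack.pop():i]
            i += 2
        else:
            i += 1


def split_top_level(body):
    """Split a template body on '|' that is not inside {{...}} or [[...]]."""
    parts, current, depth = [], [], 0
    i = 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]") and depth > 0:
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def extract_citations(wikitext, lang):
    """Return citation templates as dicts with language-independent keys."""
    names = TEMPLATE_ALIASES.get(lang, {})
    params = PARAM_ALIASES.get(lang, {})
    citations = []
    for body in find_templates(wikitext):
        head, *fields = split_top_level(body)
        kind = names.get(" ".join(head.split()).lower())
        if kind is None:
            continue  # not a citation template we know for this language
        record = {"type": kind}
        for field in fields:
            key, sep, value = field.partition("=")
            canonical = params.get(key.strip().lower())
            if sep and canonical:
                record[canonical] = value.strip()
        citations.append(record)
    return citations
```

With these tables, `extract_citations("<ref>{{Cita web |url=https://example.org |título=Ejemplo}}</ref>", "es")` and the equivalent English `{{cite web}}` call produce records with the same `type`, `url`, and `title` keys. A real library would need far larger alias tables, possibly derived from template redirects or interlanguage links, which is the core of the work proposed here.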
- Interested? Add your username below: