Emit citation template extraction summary
Closed, Resolved (Public)

Description

The Web2Cit research script uses a list of citation templates and parameters to decide which templates to extract from the wikitext of the downloaded featured articles.

To help identify errors in the script or in the list of citation templates (for example, typos such as "Citeweb" instead of "Cite web"), it would be useful for the script to report the citation templates it identified and how many instances of each it extracted.
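A minimal sketch of what such a summary could look like, assuming templates are located with a simple regular expression rather than a full wikitext parser. The names `CITATION_TEMPLATES`, `normalize_name`, and `summarize_templates` are hypothetical and are not taken from the research script; the template list shown is a placeholder.

```python
import re
from collections import Counter

# Placeholder list (assumption); the real list is maintained with the research script.
CITATION_TEMPLATES = {"Cite web", "Cite book", "Cite journal"}

# Captures the template name that follows "{{", up to the first "|" or "}}".
TEMPLATE_NAME_RE = re.compile(r"\{\{\s*([^{}|]+?)\s*(?:\||\}\})")


def normalize_name(name):
    """Collapse whitespace, treat "_" as a space, and capitalize the first letter."""
    name = " ".join(name.replace("_", " ").split())
    return name[:1].upper() + name[1:]


def summarize_templates(wikitext, known=CITATION_TEMPLATES):
    """Return (extracted, skipped): per-template counts for names in and out of the list."""
    counts = Counter(normalize_name(m) for m in TEMPLATE_NAME_RE.findall(wikitext))
    extracted = {n: c for n, c in counts.items() if n in known}
    skipped = {n: c for n, c in counts.items() if n not in known}
    return extracted, skipped
```

Reporting the skipped names alongside the extracted ones is what would surface a typo such as "Citeweb": it would appear among the skipped templates instead of silently disappearing.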

Event Timeline

See 2.3 in understand-citoid-coverage.ipynb

Results in citation_templates_freq.csv

Summary plot: 7 main citation templates by language

Great! I see that some entries look like "Cite web <!-- Citation bot bypass-->". The "<!-- ...-->" part is an HTML comment and can be ignored, so in this example the template should be treated as "Cite web". There are not many of these, so for now it would be enough to add a comment to the script noting that we may address this in the future.
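If this is addressed later, one possible approach is to strip HTML comments from the template name before counting. This is only a sketch; `strip_comments` is a hypothetical helper, not a function in the current script.

```python
import re

# Matches an HTML comment, including comments that span several lines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(template_name):
    """Remove HTML comments from a template name and trim surrounding whitespace."""
    return COMMENT_RE.sub("", template_name).strip()
```

With this helper, "Cite web <!-- Citation bot bypass-->" would be counted under "Cite web".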

@Gimenadelrioriande, we could check this output against the list of citation templates. If a template in the list is missing from the output, this may indicate that something is wrong with that entry in the list.
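A rough sketch of that check, assuming the frequencies have already been loaded into a dictionary that maps template names to counts. The column layout of citation_templates_freq.csv is not described here, so loading is left out; `missing_templates` is a hypothetical name.

```python
def missing_templates(expected, observed_counts):
    """Return the listed template names that never appear in the extraction output."""
    return sorted(name for name in expected if observed_counts.get(name, 0) == 0)


# Made-up example values, for illustration only.
listed = ["Cite web", "Citeweb", "Cite book"]
observed = {"Cite web": 10, "Cite book": 3}
suspects = missing_templates(listed, observed)
```

Any name returned here is a candidate for manual review, since a listed template may also be missing simply because no downloaded article uses it.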