
Output ignored references
Open, Needs Triage, Public

Description

To extract citation metadata from Wikipedia article references, our current approach is to rely on citation templates.

In one of the Web2Cit Advisory Board meetings, user @Strainu suggested that we use the raw contents of <ref> tags instead, that is, that we check how much of these contents Citoid can and cannot recover.

In principle, we decided to move forward with the original approach. However, it was proposed that we check how many references we would miss by focusing on citation templates alone.

The following workflow may help:

  1. Extract references (that is, the content of <ref> tags) from the wikitext of selected articles.
  2. Extract supported citation templates (i.e., those listed in our citation templates list) from these references.
  3. Count and output ignored references:
    • references not using citation templates
    • references with unsupported citation templates (i.e., templates not in our citation templates list). If possible, list these templates to help improve our citation templates list (together with T299346).
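
The workflow above could be sketched roughly as follows. This is only an illustration, not an agreed implementation: SUPPORTED_TEMPLATES is a hypothetical stand-in for our citation templates list, the function names are made up, and the regular expressions are a simplification (they also count templates nested inside other templates, and lower-casing the whole template name is looser than MediaWiki's own rules). A proper wikitext parser such as mwparserfromhell would likely be more robust.

```python
import re
from collections import Counter

# Hypothetical stand-in for the project's citation templates list (assumption).
SUPPORTED_TEMPLATES = {"cite web", "cite news", "cite journal", "cite book"}

# Matches <ref ...>content</ref>. Self-closing reuses such as
# <ref name="x" /> carry no content and are not matched.
REF_RE = re.compile(
    r"<ref(?:\s[^>]*?)?(?<!/)>(.*?)</ref\s*>", re.DOTALL | re.IGNORECASE
)

# Captures the template name that follows "{{", up to the first "|" or "}}".
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")


def normalize(name):
    """Lower-case a template name and collapse underscores and whitespace."""
    return " ".join(name.replace("_", " ").split()).lower()


def classify_references(wikitext):
    """Count supported and ignored references in one article's wikitext."""
    counts = {"total": 0, "supported": 0, "no_template": 0, "unsupported": 0}
    unsupported_templates = Counter()
    for content in REF_RE.findall(wikitext):
        counts["total"] += 1
        names = [normalize(n) for n in TEMPLATE_RE.findall(content)]
        if any(n in SUPPORTED_TEMPLATES for n in names):
            # Step 2: the reference uses at least one supported template.
            counts["supported"] += 1
        elif names:
            # Step 3, second bullet: only unsupported templates were found.
            counts["unsupported"] += 1
            unsupported_templates.update(names)
        else:
            # Step 3, first bullet: no template at all.
            counts["no_template"] += 1
    return counts, unsupported_templates


if __name__ == "__main__":
    sample = (
        "A.<ref>{{Cite web |url=https://example.org |title=Example}}</ref> "
        'B.<ref name="x">Smith, J. (2020). Some book.</ref> '
        "C.<ref>{{Webarchive|url=https://example.org}}</ref> "
        'D.<ref name="x" />'
    )
    counts, unsupported = classify_references(sample)
    print(counts)
    print(unsupported.most_common())
```

The Counter of unsupported template names is what could feed back into improving the citation templates list.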

Event Timeline

diegodlh renamed this task from "Output ignored citation templates" to "Output ignored references". Jan 17 2022, 2:28 PM