
Ignore citations having wikilinks
Closed, Resolved, Public

Description

As a final task of the Web2Cit research subproject, the following script automatically creates Web2Cit translation tests for a selection of previously evaluated web domains (see evaluation script).

Translation tests represent the expected output for specific target webpages, so the citation metadata in the translation goals must be correct and accurate. As input we use the citation metadata extracted from featured articles (https://github.com/hdcaicyt/Web2Cit-research/blob/main/data/citations_metadata_valid_urls.csv.gz), which is assumed to be correct. This metadata often contains wikilinks but, as stated in T309869, Web2Cit does not currently support wikilinks.

Consequently, we should either resolve the wikilinks or exclude citations containing wikilinks from the input corpus.
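
As a rough illustration of the first option, a wikilink could be resolved by replacing it with its display text. This is only a sketch: the function name `strip_wikilinks` and the regular expression are hypothetical, not taken from the project's script, and the pattern covers only the simple `[[target]]` and `[[target|label]]` forms.

```python
import re

# Hypothetical pattern: matches [[target]] and [[target|label]],
# capturing the text a reader would see (label if present, else target).
WIKILINK_RE = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]")


def strip_wikilinks(value: str) -> str:
    """Replace each simple wikilink in `value` with its display text."""
    return WIKILINK_RE.sub(r"\1", value)
```

For example, `strip_wikilinks("[[The New York Times|NYT]]")` would yield the plain label, while text without wikilinks passes through unchanged. Templates such as `{{langue|en|...}}` are not handled by this pattern.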

Event Timeline

Some examples of manual data with wikicode:

  • {{versalita|neira}}
  • {{langue|en|ign's zelda fanatics}}
  • les cahiers de la shoah, {{ndeg|1}}
  • stade rennais: {{citation|on fait les choses a l'envers}}, christian gourcuff regrette deja l'arrivee d'olivier letang
  • superstrings in {{math|''d''{{=}}10}} from supermembranes in {{math|''d''{{=}}11}}

Citations having wikicode in any of the citation fields used for building translation tests (title, author first name, author last name, publishing date) represent 3% of the evaluated citations.
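
The exclusion described above could be sketched roughly as follows. The field names in `FIELDS` and the helper names are assumptions made for illustration (the actual column names in the CSV may differ), and the check is deliberately coarse: it only looks for template (`{{`) or wikilink (`[[`) openers.

```python
import re

# Coarse check: an opening "{{" (template) or "[[" (wikilink) anywhere in a value.
WIKICODE_RE = re.compile(r"\{\{|\[\[")

# Hypothetical field names standing in for title, author first name,
# author last name and publishing date.
FIELDS = ("title", "author_first", "author_last", "date")


def has_wikicode(citation: dict) -> bool:
    """Return True if any field used for building tests contains wikicode."""
    return any(
        WIKICODE_RE.search(str(citation.get(field) or "")) for field in FIELDS
    )


def exclude_wikicode(citations: list) -> list:
    """Keep only the citations whose relevant fields are free of wikicode."""
    return [c for c in citations if not has_wikicode(c)]
```

With this approach a citation whose last name is `{{versalita|neira}}` would be dropped, while a citation with plain-text fields would be kept.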

Nidiah moved this task from To do to Done on the Web2Cit-Research board.

@Nidiah: Hi, if there is nothing else to do in this task, could you please set the task status to resolved? Thanks a lot!

Thanks for the reminder @Aklapper!