As the final task of the Web2Cit research subproject, the following script automatically creates Web2Cit translation tests for a selection of previously evaluated web domains (see evaluation script).
Translation tests represent the expected output for specific target webpages, so the citation metadata in the translation goals must be accurate. As input, we use the citation metadata extracted from featured articles (https://github.com/hdcaicyt/Web2Cit-research/blob/main/data/citations_metadata_valid_urls.csv.gz), which is assumed to be correct. This metadata often contains wikilinks but, as stated in T309869, Web2Cit does not currently support wikilinks.
Consequently, we must either resolve the wikilinks or exclude the citations that contain them from the input corpus.
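
The two options above (resolving wikilinks or excluding the affected citations) could be sketched roughly as follows. This is only an illustration, not necessarily how the script handles it: the helper names (`has_wikilink`, `resolve_wikilinks`, `clean_citation`) and the field names in the usage example are hypothetical, and "resolving" is assumed here to mean replacing `[[Target|label]]` with its label and `[[Target]]` with its target text.

```python
import re
from typing import Optional

# Matches [[Target]] and [[Target|label]]; group 1 is the displayed text.
# Assumption: simple, non-nested wikilinks only.
WIKILINK_RE = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]+)\]\]")


def has_wikilink(value: str) -> bool:
    """Return True if the string contains at least one wikilink."""
    return bool(WIKILINK_RE.search(value))


def resolve_wikilinks(value: str) -> str:
    """Replace each wikilink with its displayed text."""
    return WIKILINK_RE.sub(lambda m: m.group(1).strip(), value)


def clean_citation(citation: dict, resolve: bool = True) -> Optional[dict]:
    """Resolve wikilinks in every field, or return None to signal exclusion.

    `citation` maps (hypothetical) field names to string values.
    """
    if resolve:
        return {field: resolve_wikilinks(text) for field, text in citation.items()}
    if any(has_wikilink(text) for text in citation.values()):
        return None  # exclude this citation from the input corpus
    return dict(citation)
```

For example, `clean_citation({"title": "A", "publisher": "[[BBC News]]"})` would keep the citation with the publisher reduced to plain text, while passing `resolve=False` would drop it instead. Resolving keeps more citations in the corpus; excluding is the safer choice if the displayed text of a wikilink may differ from what the target webpage itself shows.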