As described in the project's proposal, one of the Web2Cit-Research subproject's goals is to develop an automated script that compares how well Citoid performs now versus how well it will perform in the future, once Web2Cit has been running for some time.
To this end, @Nidiah, @Gimenadelrioriande and Romina De León have been collecting URLs cited in Wikipedia featured articles in different languages. Because they come from featured articles, we assume that their metadata have been curated and are generally correct (we have discussed the validity of this assumption extensively within our team and with our Advisory Board).
For the second part, @Nidiah is currently collecting Citoid responses for the extracted URLs (these will be compared against the corresponding "correct" metadata). She is doing this from a PAWS notebook, using a cache to make sure we don't request the same URL twice (in case it appears more than once in the pool of >450k citations extracted). However, she noticed that responses were relatively slow. We considered making parallel requests, but we don't want to overload the Citoid service.
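For concreteness, the collection loop could look roughly like the sketch below: sequential requests with a cache keyed by the cited URL and a fixed delay between requests. The endpoint template, the delay value, and the injectable `fetch` callable are all assumptions for illustration, not a description of the actual notebook code.

```python
# Hedged sketch (assumptions: endpoint template, 1 s delay, injectable fetcher).
import time
import urllib.parse

# Assumed Citoid REST endpoint template; adjust to whichever wiki/format is used.
CITOID_ENDPOINT = "https://en.wikipedia.org/api/rest_v1/data/citation/mediawiki/{}"

def fetch_cached(urls, fetch, cache=None, delay=1.0):
    """Fetch Citoid metadata for each URL, skipping duplicates via `cache`.

    `fetch` is a callable taking the full request URL and returning the
    parsed response; injecting it keeps the sketch testable without network
    access (in practice it could wrap requests.get(...).json()).
    """
    cache = {} if cache is None else cache
    for url in urls:
        if url in cache:  # duplicate URL: reuse the earlier response
            continue
        request_url = CITOID_ENDPOINT.format(urllib.parse.quote(url, safe=""))
        cache[url] = fetch(request_url)
        time.sleep(delay)  # crude throttle so we don't hammer the service
    return cache
```

With a persistent `cache` (e.g. a shelve or SQLite-backed dict) the run can also be resumed after interruptions, which matters for a pool this size.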
@Mvolz, in your opinion what would be the best way (as fast as possible without disrupting the service) to do this?
Alternatively, we could set up a custom Citoid instance (as long as we make sure it runs exactly the same code). But besides the extra work, I assume it would involve similar hardware and network resources anyway, since we would have to run it on a Wikimedia server. It also wouldn't benefit from Wikimedia's RESTBase caching.
Finally, given the large number of citations, we could also consider randomly sampling a smaller subset each time (how large?), assuming that this shouldn't change the results much.
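On the "how large?" question, one rough rule of thumb: to estimate a proportion (e.g. the fraction of URLs where Citoid gets the metadata right) within a ±3% margin at 95% confidence, about 1,100 items suffice regardless of population size. The sketch below just draws a reproducible simple random sample; the default size and seed are arbitrary assumptions.

```python
# Hedged sketch: reproducible simple random sample of the URL pool.
# The default n=1100 corresponds to roughly +/-3% margin at 95% confidence
# for a proportion; treat it as an assumption, not a recommendation.
import random

def sample_urls(urls, n=1100, seed=42):
    """Return a reproducible simple random sample of up to n distinct URLs."""
    rng = random.Random(seed)  # fixed seed so reruns sample the same subset
    urls = list(urls)
    return rng.sample(urls, min(n, len(urls)))
```

Fixing the seed means the same subset is queried on each run, so the request cache above stays useful across repetitions.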