New Wikipedias are created from time to time, and they might benefit from "Add a link".
The model for "Add a link" has to be trained at the time of deployment, so that newcomers have a few tasks to work on from the start, and more as the wiki expands.
| Status | Assigned | Task |
|---|---|---|
| Open | Trizek-WMF | T304110 [EPIC] Deploy "add a link" to all Wikipedias |
| Open | None | T304052 Enable Growth features on Wikipedias upon creation |
| Open | None | T308146 Integrate the model training and the deployment of "Add a link" to new Wikipedias exiting the Incubator |
Is there a precedent for process around this? Does something similar happen with ORES, for example?
ORES is limited to largish wikis. It would have the same problem (no training data) plus it also takes significant volunteer effort to train, which for a brand new wiki can be better used elsewhere.
For the last two: guw.wp, created last March, now has 755 articles; kcg.wp, created yesterday, has about 875 articles.
We have a lot of wikis with fewer than 1,000 articles listed as active wikis. What is the minimum needed to train the models? We could skip all Wikipedias with fewer than [number] articles, but then we would have to monitor them so that they get the model and the tool when the time comes.
@kostajh I don't think there is a well-defined minimum. In principle, you can train on anything, though fewer articles mean less training data. The question is whether that is enough for the model to learn meaningful patterns. I honestly don't have a well-informed answer. We should try for these wikis in any case. We could track the performance of the backtesting evaluation for wikis of different sizes and check whether there is a significant drop when the number of articles becomes too small.
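The size-versus-quality comparison suggested above could be sketched roughly as follows. The function name, the input shape, and the 1,000-article cutoff are my assumptions for illustration, not an existing tool:

```python
def precision_by_size(results, small_cutoff=1000):
    """Compare mean backtesting precision for small vs. larger wikis.

    results: list of (article_count, precision) pairs, one per wiki
    (hypothetical input; precision values would come from the
    backtesting evaluation). Returns (mean_small, mean_large),
    with None where a group is empty.
    """
    small = [p for n, p in results if n < small_cutoff]
    large = [p for n, p in results if n >= small_cutoff]

    def mean(xs):
        return sum(xs) / len(xs) if xs else None

    return mean(small), mean(large)
```

A large gap between the two means would indicate that quality does drop for wikis below the cutoff.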
A few small wikis had models trained in round 4: Adyghe Wikipedia (ady) has 491 articles; Akan Wikipedia (ak) has 590. They are small Wikipedias, so we can use them for checkups. How can we do this?
If you would like to check wiki models before they are deployed, I think the backtesting evaluation can be used for this.
I usually add the backtesting numbers to each task, for example: T304548#7937440. Good indicators: precision should be at around 75% (or more), and recall should not drop below 20%.
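That rule of thumb can be encoded as a quick check (the function name is mine, not an existing tool; the thresholds are the ones stated above):

```python
def passes_backtest(precision: float, recall: float) -> bool:
    """Apply the rule of thumb from this thread: precision at around
    75% or more, and recall not below 20%. Values are fractions."""
    return precision >= 0.75 and recall >= 0.20
```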
There's also usually a gap in time between when the datasets are published (so they can be queried via https://api.wikimedia.org/service/linkrecommendation/apidocs/) and when we start caching recommendations based on those datasets in MediaWiki. So you could use that API with the ady and ak wikis to see how well they perform now.
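A minimal sketch of querying the service for one page on a small wiki. The exact endpoint path below is my assumption inferred from the apidocs URL in this thread; verify it against the live apidocs page before relying on it:

```python
import json
import urllib.request
from urllib.parse import quote

# Assumed base path; confirm against
# https://api.wikimedia.org/service/linkrecommendation/apidocs/
BASE = "https://api.wikimedia.org/service/linkrecommendation/v1/linkrecommendations"


def build_url(wiki_code: str, title: str) -> str:
    """Build a request URL for one page's link recommendations
    (hypothetical path structure)."""
    return f"{BASE}/wikipedia/{wiki_code}/{quote(title)}"


def fetch_recommendations(wiki_code: str, title: str) -> dict:
    """Fetch and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(build_url(wiki_code, title), timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Example spot-check for the ady wiki (needs network; page title
    # is a placeholder):
    # print(fetch_recommendations("ady", "Some Page"))
    pass
```

Spot-checking a handful of pages on ady and ak this way would give a feel for suggestion quality before the recommendations are cached in MediaWiki.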