- Training models
- Swahili Wikipedia sw
- Walloon Wikipedia wa
- Waray Wikipedia war
- Wolof Wikipedia wo
- Wu Chinese Wikipedia wuu (see T308139#8728522)
- Kalmyk Wikipedia xal
- Xhosa Wikipedia xh
- Mingrelian Wikipedia xmf
- Yiddish Wikipedia yi
- Yoruba Wikipedia yo
- Zhuang Wikipedia za
- Zeelandic Wikipedia zea
- Chinese Wikipedia zh (see T308139#8720236)
- Classical Chinese Wikipedia zh-classical (see T308139#8728522)
- Min Nan Wikipedia zh-min-nan
- Cantonese Wikipedia zh-yue (see T308139#8728522)
- Zulu Wikipedia zu
- Models verification
- Publish Datasets
- Populate the excluded section titles
- Deploy back-end
- Check how the model works on the wikis
- In Search, use hasrecommendation:link to find articles
- Test them on https://api.wikimedia.org/service/linkrecommendation/apidocs/#/default/get_v1_linkrecommendations__project___domain___page_title_
- Inform communities
- Deploy front-end
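For the "Test them on" step, the apidocs link above points at a `GET v1/linkrecommendations/{project}/{domain}/{page_title}` operation. The helper below is a minimal sketch that builds such a request URL. It assumes the path is relative to `https://api.wikimedia.org/service/linkrecommendation` (inferred from the apidocs URL, not confirmed) and that `domain` is the wiki's language code (e.g. `xh`); the function name and the example title are hypothetical.

```python
from urllib.parse import quote

# Assumed base URL, inferred from the apidocs link; verify against the docs before relying on it.
BASE = "https://api.wikimedia.org/service/linkrecommendation"


def link_recommendation_url(project: str, domain: str, page_title: str) -> str:
    """Hypothetical helper: build a GET URL for the link recommendation endpoint."""
    # MediaWiki titles use underscores instead of spaces in URLs.
    title = page_title.replace(" ", "_")
    # Percent-encode each path segment; safe="" also encodes "/" inside the title.
    return (
        f"{BASE}/v1/linkrecommendations/"
        f"{quote(project, safe='')}/{quote(domain, safe='')}/{quote(title, safe='')}"
    )


print(link_recommendation_url("wikipedia", "xh", "Example page"))
```

Fetching the printed URL (e.g. with a browser or `curl`) should return the recommendations for that page, if the assumptions above hold.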
Details
- Due Date: Oct 11 2023, 4:00 PM
Status | Assigned | Task
---|---|---
Open | lbowmaker | T307881 Scaling of link suggestions service
Open | Trizek-WMF | T304110 [EPIC] Deploy "add a link" to all Wikipedias
Resolved | Sgs | T308139 Deploy "add a link" to 14th round of wikis
Event Timeline
15/16 models were trained successfully in the 14th round of wikis.
The Chinese Wikipedia (zhwiki) training pipeline failed with a UnicodeEncodeError, which is being investigated in T325521#8717012.
Model evaluation has been completed and below are the backtesting results:
Wiki | Precision@0.5 | Recall@0.5
---|---|---
wawiki | 0.81 | 0.40
warwiki | 0.95 | 0.77
wowiki | 0.83 | 0.54
wuuwiki | 0.00 | 0.00
xalwiki | 0.99 | 0.60
xhwiki | 0.83 | 0.32
xmfwiki | 0.76 | 0.27
yiwiki | 0.76 | 0.44
yowiki | 0.96 | 0.83
zawiki | 0.91 | 0.61
zeawiki | 0.97 | 0.78
zh_classicalwiki | 0.00 | 0.00
zh_min_nanwiki | 0.97 | 0.84
zh_yuewiki | 0.48 | 0.00
zuwiki | 0.97 | 0.80
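To make the two columns concrete, here is a minimal sketch of precision and recall at a probability cutoff. It assumes that "@0.5" means a candidate link is accepted when the model's link probability is at least 0.5, and that predictions are scored as (anchor text, target title) pairs against the links already present in held-out articles. This is a simplified illustration, not the actual backtesting code; the function name and sample data are hypothetical.

```python
def precision_recall_at(predictions, gold_links, threshold=0.5):
    """Precision and recall of predicted links at a probability threshold.

    predictions: iterable of (anchor_text, target_title, probability)
    gold_links:  set of (anchor_text, target_title) pairs actually linked
    """
    # Keep only candidates the model is confident enough about (assumed inclusive cutoff).
    predicted = {(a, t) for a, t, prob in predictions if prob >= threshold}
    true_positives = len(predicted & gold_links)
    # Convention for this sketch: report 0.0 when there is nothing to divide by.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold_links) if gold_links else 0.0
    return precision, recall


# Hypothetical example: one correct link, one wrong target, one below the cutoff.
preds = [("river", "River", 0.9), ("sea", "Ocean", 0.7), ("city", "City", 0.3)]
gold = {("river", "River"), ("sea", "Sea"), ("city", "City")}
print(precision_recall_at(preds, gold))
```

Raising the threshold typically trades recall for precision, which is why both numbers are reported at the same cutoff.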
CCing @MGerlach, in case he would like to add comments on the backtesting evaluation.
The backtesting results look fine for most languages, except:
- wuuwiki, zh_classicalwiki, and zh_yuewiki, which have extremely low precision and recall compared to the recommended thresholds of 0.75 and 0.2, respectively.
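The threshold check can be reproduced mechanically from the backtesting table. This sketch assumes a model passes when precision is at least 0.75 and recall is at least 0.2 (whether the bounds are inclusive is an assumption); the names are illustrative.

```python
# Backtesting results from the table above: wiki -> (precision@0.5, recall@0.5)
BACKTEST = {
    "wawiki": (0.81, 0.40), "warwiki": (0.95, 0.77), "wowiki": (0.83, 0.54),
    "wuuwiki": (0.00, 0.00), "xalwiki": (0.99, 0.60), "xhwiki": (0.83, 0.32),
    "xmfwiki": (0.76, 0.27), "yiwiki": (0.76, 0.44), "yowiki": (0.96, 0.83),
    "zawiki": (0.91, 0.61), "zeawiki": (0.97, 0.78), "zh_classicalwiki": (0.00, 0.00),
    "zh_min_nanwiki": (0.97, 0.84), "zh_yuewiki": (0.48, 0.00), "zuwiki": (0.97, 0.80),
}
MIN_PRECISION = 0.75  # recommended precision threshold
MIN_RECALL = 0.2      # recommended recall threshold


def failing(results, min_precision=MIN_PRECISION, min_recall=MIN_RECALL):
    """Return the wikis whose model misses either threshold, sorted by name."""
    return sorted(
        wiki for wiki, (p, r) in results.items() if p < min_precision or r < min_recall
    )


bad = failing(BACKTEST)
print(bad)                        # the three wikis listed above
print(len(BACKTEST) - len(bad))  # models that pass
```

With these values, 12 of the 15 trained models pass; together with zhwiki, which failed to train, that gives the 12/16 figure mentioned later in this thread.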
I talked to @MGerlach about these results, and we agreed that wuuwiki, zh_classicalwiki, and zh_yuewiki should not be deployed.
He also said:
I think for the zhwiki_* it is expected as we rely mostly on whitespaces for word-tokenization to identify link-candidates in the text which is likely to not work for these languages.
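A minimal illustration of why whitespace tokenization breaks down for these scripts: if link candidates are n-grams of whitespace-separated tokens, a Chinese sentence without spaces yields a single "token", so individual words can never surface as candidates. This is a simplified, hypothetical model of candidate generation, not the actual pipeline code.

```python
def whitespace_candidates(text, max_ngram=3):
    """Hypothetical sketch: enumerate n-gram link candidates from whitespace tokens."""
    tokens = text.split()  # splits only on whitespace
    candidates = []
    for n in range(1, max_ngram + 1):
        for i in range(len(tokens) - n + 1):
            candidates.append(" ".join(tokens[i:i + n]))
    return candidates


english = "The river flows into the sea"
chinese = "长江流入东海"  # "The Yangtze flows into the East China Sea"

print(len(whitespace_candidates(english)))  # many candidates, incl. "river" and "the sea"
print(whitespace_candidates(chinese))       # one candidate: the whole sentence
```

In the Chinese example, "长江" (Yangtze) and "东海" (East China Sea) are obvious link targets, but neither can be proposed because they are never split out of the sentence.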
@kostajh, we published datasets for the 12/16 models that passed the evaluation in this round.
Change 954004 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):
[operations/mediawiki-config@master] GrowthExperiments: enable AddLink backend for swwiki
Change 954004 merged by jenkins-bot:
[operations/mediawiki-config@master] GrowthExperiments: enable AddLink backend for swwiki
Mentioned in SAL (#wikimedia-operations) [2023-08-31T14:09:02Z] <sgimeno@deploy1002> Started scap: Backport for [[gerrit:954004|GrowthExperiments: enable AddLink backend for swwiki (T308138 T308139)]]
Mentioned in SAL (#wikimedia-operations) [2023-08-31T14:10:43Z] <sgimeno@deploy1002> sgimeno: Backport for [[gerrit:954004|GrowthExperiments: enable AddLink backend for swwiki (T308138 T308139)]] synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
Mentioned in SAL (#wikimedia-operations) [2023-08-31T14:16:37Z] <sgimeno@deploy1002> Finished scap: Backport for [[gerrit:954004|GrowthExperiments: enable AddLink backend for swwiki (T308138 T308139)]] (duration: 07m 34s)
Change 960074 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):
[operations/mediawiki-config@master] GrowthExperiments: enable AddLink backend 14th round of wikis
I ran this script to add the link-recommendation task type and populate the excluded-sections entries:
PHAB=T308139
for WIKI in wawiki warwiki wowiki xalwiki xhwiki xmfwiki yiwiki yowiki zawiki zeawiki zh_min_nanwiki zuwiki; do
    # Look up the wiki's canonical server URL (used for the verification links below)
    ORIGIN=`mwscript getConfiguration.php $WIKI --settings 'wgCanonicalServer' --format json | jq --raw-output '.wgCanonicalServer'`
    # Add the link-recommendation task type to MediaWiki:NewcomerTasks.json
    mwscript extensions/GrowthExperiments/maintenance/changeWikiConfig.php $WIKI \
        --page MediaWiki:NewcomerTasks.json \
        --create-only \
        --json \
        --summary "Growth features configuration boilerplate ([[phab:$PHAB]])" \
        link-recommendation \
        '{ "type": "link-recommendation", "group": "easy" }'
    # Select this wiki's sections with probability > 0.25, deduplicate them into a JSON array,
    # and write that array to link-recommendation.excludedSections
    jq "select(.wiki==\"$WIKI\" and .probability > 0.25) | .section" wiki_sections.jsonl \
        | jq --slurp --compact-output "unique" \
        | mwscript extensions/GrowthExperiments/maintenance/changeWikiConfig.php $WIKI \
            --page MediaWiki:NewcomerTasks.json \
            --json \
            --summary "machine-generated configuration for excluding sections from link recommendations ([[phab:$PHAB]]), feel free to improve" \
            link-recommendation.excludedSections \
            "`cat`"
    # Print links to the page and the latest diff for manual review
    echo "$ORIGIN/wiki/MediaWiki:NewcomerTasks.json"
    echo "$ORIGIN/w/index.php?title=MediaWiki:NewcomerTasks.json&diff=next"
    echo "Press <Enter> to continue"
    read # give time for manual verification
done
Note that the script didn't populate excludedSections for zawiki and zh_min_nanwiki because these wikis were not present in wiki_sections.jsonl (see T345562).
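The excludedSections step is essentially a filter over wiki_sections.jsonl. The sketch below mirrors the two `jq` calls in the script (select rows for one wiki with `probability > 0.25`, then deduplicate and sort, as jq's `unique` does) and shows why a wiki with no rows in the file ends up with an empty list. The field names come from the script; the sample rows and the function name are hypothetical.

```python
import json


def excluded_sections(jsonl_lines, wiki, threshold=0.25):
    """Mirror the jq pipeline: keep sections of `wiki` with probability > threshold,
    then deduplicate and sort them (jq's `unique` also sorts)."""
    sections = set()
    for line in jsonl_lines:
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        # A missing probability is treated as 0, so such rows never pass the filter.
        if row.get("wiki") == wiki and row.get("probability", 0) > threshold:
            sections.add(row["section"])
    return sorted(sections)


# Hypothetical sample rows in the wiki_sections.jsonl shape used by the script.
SAMPLE = [
    '{"wiki": "xhwiki", "section": "References", "probability": 0.92}',
    '{"wiki": "xhwiki", "section": "References", "probability": 0.88}',
    '{"wiki": "xhwiki", "section": "History", "probability": 0.10}',
    '{"wiki": "wowiki", "section": "External links", "probability": 0.75}',
]

# Compact JSON, like `jq --compact-output`
print(json.dumps(excluded_sections(SAMPLE, "xhwiki"), separators=(",", ":")))
print(excluded_sections(SAMPLE, "zawiki"))  # no rows for this wiki: empty list
```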
Change 960074 merged by jenkins-bot:
[operations/mediawiki-config@master] GrowthExperiments: enable AddLink backend 14th round of wikis
Mentioned in SAL (#wikimedia-operations) [2023-09-25T13:01:47Z] <urbanecm@deploy2002> Started scap: Backport for [[gerrit:960074|GrowthExperiments: enable AddLink backend 14th round of wikis (T308139)]]
Mentioned in SAL (#wikimedia-operations) [2023-09-25T13:14:23Z] <urbanecm@deploy2002> urbanecm and sgimeno: Backport for [[gerrit:960074|GrowthExperiments: enable AddLink backend 14th round of wikis (T308139)]] synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
Mentioned in SAL (#wikimedia-operations) [2023-09-25T13:25:15Z] <urbanecm@deploy2002> Finished scap: Backport for [[gerrit:960074|GrowthExperiments: enable AddLink backend 14th round of wikis (T308139)]] (duration: 23m 28s)
I've checked the enabled wikis, and all of them return a fair number of results except:
- xalwiki returns 5 results
- xmfwiki returns 3 results
- xhwiki returns 0 results
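One way to get counts like these is the `hasrecommendation:link` search keyword from the checklist, queried through the MediaWiki Action API (`list=search` with `srinfo=totalhits`). This is a minimal sketch under the assumption that the keyword behaves the same through the API as in Special:Search; the helper names are hypothetical, and the HTTP request itself is left out.

```python
from urllib.parse import urlencode


def search_url(origin: str) -> str:
    """Hypothetical helper: API URL counting articles that have link recommendations."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": "hasrecommendation:link",
        "srinfo": "totalhits",  # ask for the total hit count
        "srlimit": "1",         # only the count matters, not the results themselves
        "format": "json",
    }
    return f"{origin}/w/api.php?{urlencode(params)}"


def total_hits(response: dict) -> int:
    """Extract the hit count from a decoded list=search JSON response."""
    return response["query"]["searchinfo"]["totalhits"]


print(search_url("https://xh.wikipedia.org"))
# Hypothetical decoded response for a wiki with no recommendations:
print(total_hits({"query": {"searchinfo": {"totalhits": 0}, "search": []}}))
```

`origin` plays the same role as `$ORIGIN` (wgCanonicalServer) in the configuration script above.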
@kevinbazira, do you have any idea why the model produces so few results on these wikis?
I think we can go ahead and enable the frontend on all wikis except xhwiki. Even with few results on xalwiki and xmfwiki, we expect the script to keep generating some suggestions. How does that sound, @Trizek-WMF? What's the phab ticket for tracking wikis with AddLink suggestions issues, I can't find it under the scaling epic 🤔
@Sgs, the models produce few results on these wikis because those languages have few articles (see below), and those articles likely contain fewer links for the models to learn from and make predictions on.
- xalwiki has about 2300 articles: https://stats.wikimedia.org/#/xal.wikipedia.org/content/pages-to-date/normal|line|2-year|page_type~content|monthly
- xmfwiki has about 20543 articles: https://stats.wikimedia.org/#/xmf.wikipedia.org/content/pages-to-date/normal|line|2-year|page_type~content|monthly
- xhwiki has about 1671 articles: https://stats.wikimedia.org/#/xh.wikipedia.org/content/pages-to-date/normal|line|2-year|page_type~content|monthly
CCing @MGerlach, in case he would like to add comments on the above.
What's the phab ticket for tracking wikis with AddLink suggestions issues, I can't find it under the scaling epic 🤔
Add-a-link models that had issues and were not published were tracked in T309263. We might not want to add these languages to that task, though, since they passed the evaluation (T308139#8723643) and were published (T308139#8728522). As more articles with links become available in these languages, the models are expected to perform better after the next training round (T336927#8864508).
Change 964929 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):
[operations/mediawiki-config@master] GrowthExperiments: enable AddLink frontend 14th round of wikis
Change 964929 merged by jenkins-bot:
[operations/mediawiki-config@master] GrowthExperiments: enable AddLink frontend 14th round of wikis
Mentioned in SAL (#wikimedia-operations) [2023-10-11T07:15:43Z] <sgimeno@deploy2002> Started scap: Backport for [[gerrit:964929|GrowthExperiments: enable AddLink frontend 14th round of wikis (T308139)]]
Mentioned in SAL (#wikimedia-operations) [2023-10-11T07:17:10Z] <sgimeno@deploy2002> sgimeno: Backport for [[gerrit:964929|GrowthExperiments: enable AddLink frontend 14th round of wikis (T308139)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
Mentioned in SAL (#wikimedia-operations) [2023-10-11T07:24:48Z] <sgimeno@deploy2002> Finished scap: Backport for [[gerrit:964929|GrowthExperiments: enable AddLink frontend 14th round of wikis (T308139)]] (duration: 09m 05s)
I spot-checked some wikis from the list:
- xalwiki has only 6 suggested articles; no suggested edits have been made so far
- xhwiki has 98 suggested articles; no suggested edits have been made so far
- zawiki has 52 suggested articles; no suggested edits have been made so far
Moving to Test in Production|Watching for monitoring and for possible feedback.