- Training models
- Catalan Wikipedia
- Hebrew Wikipedia
- Hindi Wikipedia
- Korean Wikipedia
- Norwegian Bokmål Wikipedia
- Portuguese Wikipedia
- Simple English Wikipedia
- Swedish Wikipedia
- Ukrainian Wikipedia
- Models verification
- Publish Datasets
- Populate the excluded section titles
- Deploy back-end (on May 5th at 13h UTC)
- [This task only] Notes on throughput and how long "Deploy back-end" took so that we can decide whether to make improvements for future rounds (T304953)
- Check how the model works on the wikis
- In Search, use hasrecommendation:link to find articles
- Test them on https://api.wikimedia.org/service/linkrecommendation/apidocs/#/default/get_v1_linkrecommendations__project___domain___page_title_
- Inform communities
- Deploy front-end (May 18)
Description
Details
- Due Date
- May 18 2022, 4:00 PM
- Other Assignee
- Tgr
Status | Assigned | Task
---|---|---
Open | lbowmaker | T307881 Scaling of link suggestions service
Open | Trizek-WMF | T304110 [EPIC] Deploy "add a link" to all Wikipedias
Resolved | Trizek-WMF | T304542 Deploy "add a link" to third round of wikis
Event Timeline
Training models for the wikis listed has been completed successfully.
We have also worked on models verification using the backtesting results shown below:
Wiki | Precision@0.5 | Recall@0.5
---|---|---
cawiki | 0.85 | 0.51
hewiki | 0.75 | 0.28
hiwiki | 0.76 | 0.27
kowiki | 0.72 | 0.25
nowiki | 0.84 | 0.54
ptwiki | 0.85 | 0.48
simplewiki | 0.79 | 0.44
svwiki | 0.9 | 0.61
ukwiki | 0.8 | 0.42
CCing @MGerlach, in case he'd like to add comments on the backtesting evaluation.
The numbers from the backtesting do not raise any red flags for me. They are comparable to what we observed in the second round (T284481#7163025).
- Precision is at or above 75% for every wiki except kowiki, which falls slightly below at 72%. For comparison, bnwiki had the lowest precision in the second round, at 73%.
- Recall is above 40% for every wiki except hewiki, hiwiki, and kowiki, which sit at 25-28%. For comparison, bnwiki had the lowest recall in the second round, at 28%. A low number here indicates that we might have trouble generating enough recommendations. However, given that it has worked for bnwiki so far, I think this is OK.
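The comparison in this comment can be reproduced mechanically from the backtesting table. The following is a minimal sketch, assuming the 75% precision and 40% recall figures are only the informal reference points used in the comment, not official acceptance criteria. The names `BACKTEST` and `flag_low_scores` are hypothetical.

```python
# Backtesting results copied from the table in this task:
# wiki -> (precision, recall), both measured at threshold 0.5.
BACKTEST = {
    "cawiki": (0.85, 0.51),
    "hewiki": (0.75, 0.28),
    "hiwiki": (0.76, 0.27),
    "kowiki": (0.72, 0.25),
    "nowiki": (0.84, 0.54),
    "ptwiki": (0.85, 0.48),
    "simplewiki": (0.79, 0.44),
    "svwiki": (0.9, 0.61),
    "ukwiki": (0.8, 0.42),
}


def flag_low_scores(results, min_precision=0.75, min_recall=0.40):
    """Return (wikis below min_precision, wikis below min_recall), sorted by name."""
    low_precision = sorted(w for w, (p, _) in results.items() if p < min_precision)
    low_recall = sorted(w for w, (_, r) in results.items() if r < min_recall)
    return low_precision, low_recall


low_precision, low_recall = flag_low_scores(BACKTEST)
print("precision below 75%:", low_precision)
print("recall below 40%:", low_recall)
```

With the values above, only kowiki is flagged for precision, and hewiki, hiwiki, and kowiki are flagged for recall, which matches the assessment in the comment.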
@kostajh, we completed generating models and datasets for the third round of wikis (listed in the task description) and shared the models' evaluation above. Would you like us to continue with publishing the datasets?
Yes, please go ahead with publishing the datasets. Once the service imports the datasets, we should be able to do some tests using https://api.wikimedia.org/service/linkrecommendation/apidocs/
I added the back-end deployment and the verification of the models on each wiki to the task description, plus a 1-2 week verification period, as Kosta suggested.
I removed the due date because of Growth team's packed schedule. We will soon have a proper calendar for the next steps: T304953: Schedule the deployment of "Add a link" to more wikis.
Keeping here the announcement for Tech News:
* <translate>Starting on Wednesday, a new set of Wikipedias will get "[[<tvar name="1">mw:Special:MyLanguage/Help:Growth/Tools/Add a link</tvar>|Add a link]]" (<tvar name="2">{{int:project-localized-name-cawiki}}, {{int:project-localized-name-hewiki}}, {{int:project-localized-name-hiwiki}}, {{int:project-localized-name-kowiki}}, {{int:project-localized-name-nowiki}}, {{int:project-localized-name-ptwiki}}, {{int:project-localized-name-simplewiki}}, {{int:project-localized-name-svwiki}}, {{int:project-localized-name-ukwiki}}</tvar>). This is part of the progressive deployment of this tool [<tvar name="3">https://phabricator.wikimedia.org/T304110</tvar> to more Wikipedias]. The communities can [[<tvar name="4">mw:Special:MyLanguage/Growth/Community configuration</tvar>|configure how this feature works locally]].</translate> [https://phabricator.wikimedia.org/T304542]
I changed it to wikitext that is even harder to read, but that produces more correct output:
* <translate>Starting on Wednesday, a new set of Wikipedias will get "[[<tvar name="1">mw:Special:MyLanguage/Help:Growth/Tools/Add a link</tvar>|Add a link]]" (<tvar name="2">{{int:project-localized-name-cawiki/{{TRANSLATIONLANGUAGE}}}}{{int:comma-separator/{{TRANSLATIONLANGUAGE}}}}{{int:project-localized-name-hewiki/{{TRANSLATIONLANGUAGE}}}}{{int:comma-separator/{{TRANSLATIONLANGUAGE}}}}{{int:project-localized-name-hiwiki/{{TRANSLATIONLANGUAGE}}}}{{int:comma-separator/{{TRANSLATIONLANGUAGE}}}}{{int:project-localized-name-kowiki/{{TRANSLATIONLANGUAGE}}}}{{int:comma-separator/{{TRANSLATIONLANGUAGE}}}}{{int:project-localized-name-nowiki/{{TRANSLATIONLANGUAGE}}}}{{int:comma-separator/{{TRANSLATIONLANGUAGE}}}}{{int:project-localized-name-ptwiki/{{TRANSLATIONLANGUAGE}}}}{{int:comma-separator/{{TRANSLATIONLANGUAGE}}}}{{int:project-localized-name-simplewiki/{{TRANSLATIONLANGUAGE}}}}{{int:comma-separator/{{TRANSLATIONLANGUAGE}}}}{{int:project-localized-name-svwiki/{{TRANSLATIONLANGUAGE}}}}{{int:comma-separator/{{TRANSLATIONLANGUAGE}}}}{{int:project-localized-name-ukwiki/{{TRANSLATIONLANGUAGE}}}}</tvar>). This is part of the [[<tvar name="3">phab:T304110</tvar>|progressive deployment of this tool to more Wikipedias]]. The communities can [[<tvar name="4">mw:Special:MyLanguage/Growth/Community configuration</tvar>|configure how this feature works locally]].</translate> [https://phabricator.wikimedia.org/T304542]
@kostajh, thank you for the confirmation. We have published the datasets for the 9 wikis listed in the description.
It looks like all of the datasets have been imported, according to the output from https://api.wikimedia.org/service/linkrecommendation/v1/linkrecommendations/wikipedia/zz/Cat?threshold=0.5&max_recommendations=15, and a test query with uk and ca wikis.
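For anyone who wants to repeat this kind of check, here is a minimal sketch that builds the same query URL for an arbitrary wiki and page. It only constructs the URL and does not send a request. The path layout (project, domain, page title) and the `threshold` and `max_recommendations` parameters are taken from the URLs quoted in this task; the helper name `recommendation_url` is hypothetical.

```python
from urllib.parse import quote, urlencode

# Base path as it appears in the URL quoted in this task.
BASE = "https://api.wikimedia.org/service/linkrecommendation/v1/linkrecommendations"


def recommendation_url(project, domain, page_title, threshold=0.5, max_recommendations=15):
    """Build a link-recommendation query URL; path segments are percent-encoded."""
    path = "/".join(quote(part, safe="") for part in (project, domain, page_title))
    query = urlencode({"threshold": threshold, "max_recommendations": max_recommendations})
    return f"{BASE}/{path}?{query}"


# Reproduces the URL used in the comment above.
print(recommendation_url("wikipedia", "zz", "Cat"))
```

Swapping `"zz"` for a domain such as `"uk"` or `"ca"` gives the URL for a test query against those wikis.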
Change 789556 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):
[operations/mediawiki-config@master] GrothExperiments: Enable Add Link backend on tier 3 wikis
Configuration has been updated with machine-generated section exclusion data, see T304150: Allow communities to configure which sections are excluded from link suggestion generation and T306792: initWikiConfig should set excludedSections for link-recommendation task type.
Change 789556 merged by jenkins-bot:
[operations/mediawiki-config@master] GrothExperiments: Enable Add Link backend on tier 3 wikis
Mentioned in SAL (#wikimedia-operations) [2022-05-05T13:06:53Z] <tgr@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:789556|GrothExperiments: Enable Add Link backend on tier 3 wikis (T304542)]] (duration: 00m 49s)
Mentioned in SAL (#wikimedia-operations) [2022-05-05T13:25:36Z] <tgr@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:789556|GrothExperiments: Enable Add Link backend on tier 3 wikis (T304542)]] (again, used the wrong directory before) (duration: 00m 48s)
The stat to watch for assessing task generation throughput: https://grafana.wikimedia.org/d/vGq7hbnMz/special-homepage-and-suggested-edits?orgId=1&from=now-14d&to=now&viewPanel=31
All the task pools were filled by about 14h UTC on the 7th, so in total it took about two days, which is ~5.3 hours per wiki. All ended up with decent pool sizes: 18K for simplewiki, 21-22K for the rest.
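The per-wiki figure can be reproduced with a short calculation. This is a rough sketch, assuming the clock starts at the second config sync logged in SAL (2022-05-05T13:25:36Z), that "about 14h UTC on the 7th" means 14:00, and that all nine wikis are counted, as the estimate above does.

```python
from datetime import datetime, timezone

# Assumed start: second (corrected) config sync from the SAL entry.
start = datetime(2022, 5, 5, 13, 25, 36, tzinfo=timezone.utc)
# Assumed end: "about 14h UTC on the 7th", taken as exactly 14:00.
end = datetime(2022, 5, 7, 14, 0, tzinfo=timezone.utc)
wikis = 9  # wikis listed in the task description

elapsed_hours = (end - start).total_seconds() / 3600
hours_per_wiki = elapsed_hours / wikis
print(f"elapsed: {elapsed_hours:.1f} h, per wiki: {hours_per_wiki:.1f} h")
```

With these timestamps the result is about 48.6 hours in total, or roughly 5.4 hours per wiki; rounding the total to two days (48 hours) gives the ~5.3 hours quoted above.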
Searching for hasrecommendation:link returns no results on the following wikis:
- Hindi Wikipedia
- Ukrainian Wikipedia
We can deploy to the listed wikis, except the two missing ones.
These two can be removed from this list if needed, or moved to T304548: Deploy "add a link" to 4th round of wikis, as I haven't informed these communities yet.
The deployment is announced in Tech News, "starting on Wednesday" (May 18), hence it can be done anytime after this date. We can change the list of wikis written in Tech News until the distribution starts, on Monday afternoon UTC.
Anecdotally, running hasrecommendation:link on all the wikis listed in this task often returned the same article topics at Special:Search. Several times I saw the articles about the French National Library, ISBN, Encyclopedia of Life, Wikipedia... :)
Mentioned in SAL (#wikimedia-operations) [2022-05-11T20:28:19Z] <tgr> T304542 running mwscript extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php hiwiki --verbose
For hiwiki/ukwiki, all mwaddlink service requests fail with "There was a problem during the HTTP request: 400 Bad Request".
A manually reproduced example gives "Request Line is too large (4228 > 4094)".
I guess we should move sections_to_exclude to the POST body.
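To illustrate why moving the parameter helps, the sketch below compares the request-line length of a query-string request with that of a POST carrying the same list in a JSON body. Everything except the `sections_to_exclude` name and the 4094 limit is an assumption for illustration: the page title, the exclusion list, the JSON body shape, and whether the service accepts POST at this path are all hypothetical. No request is sent.

```python
import json
from urllib.parse import urlencode, urlsplit
from urllib.request import Request

# Illustrative URL only; "Example" is a hypothetical page title.
URL = "https://api.wikimedia.org/service/linkrecommendation/v1/linkrecommendations/wikipedia/hi/Example"
MAX_REQUEST_LINE = 4094  # limit quoted in the error message above

# Hypothetical exclusion list; real lists are per-wiki and can be long.
sections_to_exclude = [f"Excluded section title {i}" for i in range(200)]

# Query-string variant: every section title ends up in the request line.
query = urlencode({"sections_to_exclude": sections_to_exclude}, doseq=True)
get_request_line = f"GET {urlsplit(URL).path}?{query} HTTP/1.1"

# Body variant (assumed JSON shape): the request line no longer depends on the list size.
body = json.dumps({"sections_to_exclude": sections_to_exclude}).encode("utf-8")
post = Request(URL, data=body, headers={"Content-Type": "application/json"}, method="POST")
post_request_line = f"{post.get_method()} {urlsplit(post.full_url).path} HTTP/1.1"

print("GET request line:", len(get_request_line), "bytes")
print("POST request line:", len(post_request_line), "bytes; body:", len(body), "bytes")
```

The design point is that the request line is capped by the server, while the body is not subject to that cap, so a long exclusion list fits in the body even when it would overflow the request line.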
We are going through the topics in the order defined in code, so early topics might be a bit overrepresented, but otherwise article selection is random. When you use Special:Search, articles are sorted in part by the number of incoming links, which would explain the popularity of things like ISBN. (You can add &sort=random to get a proper random sampling.)
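As a small illustration of the `&sort=random` tip, this sketch builds a Special:Search URL for the hasrecommendation:link query. The helper name `search_url` is hypothetical, and the `index.php?title=Special:Search` form is used here as one common way to address the search page.

```python
from urllib.parse import urlencode


def search_url(domain, sort=None):
    """Build a Special:Search URL for hasrecommendation:link, optionally with a sort order."""
    params = {"title": "Special:Search", "search": "hasrecommendation:link"}
    if sort:
        params["sort"] = sort
    return f"https://{domain}/w/index.php?" + urlencode(params)


print(search_url("uk.wikipedia.org"))                 # default ordering
print(search_url("uk.wikipedia.org", sort="random"))  # random sampling
```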
Will it be ready for next week? If not, it is not a big deal: we can move the two wikis impacted to the next round.
Got it, thank you for the details! :)
I tested Ukrainian and Hindi Wikipedia, and I now see some results. Thank you for fixing it, @Tgr!
What happens when the list is empty?
We may have this issue in other rounds, as those wikis will have fewer pages.
It's in https://hi.wikipedia.org/wiki/Special:NewcomerTasksInfo (same as other suggested edits); you need to look for link-recommendation. It only shows up after the task pool is at least partially loaded.
No structured "add a link" tasks will be suggested to the users. Depending on the user's task type filters, this can mean that only non-structured tasks are suggested, or that no tasks are available. For the small wikis our features are on, "no tasks" or "not enough tasks" is probably already the case, so at the very least, this won't cause us any trouble we wouldn't have without structured "add a link" deployed.
You can check all wikis (in a somewhat hard-to-read format) at https://grafana.wikimedia.org/d/vGq7hbnMz/special-homepage-and-suggested-edits?viewPanel=31 .
Change 793395 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):
[operations/mediawiki-config@master] GrothExperiments: Enable Add Link frontend on tier 3 wikis
Change 793395 merged by jenkins-bot:
[operations/mediawiki-config@master] GrothExperiments: Enable Add Link frontend on tier 3 wikis
Mentioned in SAL (#wikimedia-operations) [2022-05-19T14:36:48Z] <tgr@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:793395|GrothExperiments: Enable Add Link frontend on tier 3 wikis (T304542)]] (duration: 00m 50s)