In the parent task, I proposed retraining the Add Link model for the Surfacing Add Link pilot wikis. Before doing that, we need to check whether the training pipeline still works as intended, given that it has not been used recently. To that end, I decided to retrain the Add Link model for frwiki (and potentially one other wiki), both to regain experience with the process and to identify any issues that might've appeared in the last few years.
Description
Details
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | KStoller-WMF | T368187 [EPIC] Constructive activation experimentation (Growth work related to Wiki Experiences 1.2) |
| Resolved | | Michael | T385780 Retrain Add Link models for Surfacing Structured Tasks pilot wikis |
| Resolved | | Urbanecm_WMF | T385781 Verify the Add Link training pipeline works as expected |
Event Timeline
The training began on Tuesday afternoon (EU time). I ran into several issues (which I will upload patches for here), so there were several pauses (each lasting until I noticed the error and addressed it), but the last step of the pipeline finished successfully several minutes ago. The data should appear at https://analytics.wikimedia.org/published/datasets/one-off/urbanecm/frwiki-updated-add-link-T385781/ within the next half an hour or so.
Backtesting evaluation output:
I'll need to take a look at the numbers and evaluate them. The results from the original training are at https://meta.wikimedia.org/wiki/Research:Improving_multilingual_support_for_link_recommendation_model_for_add-a-link_task/Results_round-1.
Change #1117632 had a related patch set uploaded (by Urbanecm; author: Urbanecm):
[research/mwaddlink@main] run-pipeline: Update DB_HOST to dbstore1009
Change #1117637 had a related patch set uploaded (by Urbanecm; author: Urbanecm):
[research/mwaddlink@main] Update .gitignore
Change #1117636 had a related patch set uploaded (by Urbanecm; author: Urbanecm):
[research/mwaddlink@main] requirements: Make repository installable
The performance of the updated model is comparable to the previous version of the model, so looking good from my side.
We tracked precision and recall at threshold 0.5 for all wikis here.
- Previous version: Prec=0.815, Recall=0.459
- Updated version: Prec=0.803, Recall=0.443
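For anyone who wants to sanity-check figures like these, here is a minimal sketch of how precision and recall at a fixed threshold can be computed from per-candidate scores. It is not the mwaddlink backtesting code: the function name `precision_recall_at_threshold`, the input layout (one model score and one ground-truth flag per candidate link), and the `>=` comparison against the threshold are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def precision_recall_at_threshold(
    scores: Sequence[float],
    labels: Sequence[bool],
    threshold: float = 0.5,
) -> Tuple[float, float]:
    """Return (precision, recall) when predicting a link for score >= threshold.

    Hypothetical helper: `scores` are model probabilities per candidate link,
    `labels` mark whether the candidate is a real link in the test data.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    true_pos = false_pos = false_neg = 0
    for score, is_link in zip(scores, labels):
        predicted = score >= threshold  # assumption: inclusive comparison
        if predicted and is_link:
            true_pos += 1
        elif predicted and not is_link:
            false_pos += 1
        elif not predicted and is_link:
            false_neg += 1

    # Guard against division by zero when nothing is predicted or nothing is a link.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up toy data, only to show the call shape.
    toy_scores = [0.9, 0.7, 0.6, 0.4, 0.2]
    toy_labels = [True, False, True, True, False]
    print(precision_recall_at_threshold(toy_scores, toy_labels, threshold=0.5))
```

Raising the threshold in a sketch like this would usually trade recall for precision, which is why both numbers are reported together at the same threshold.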
Change #1117632 merged by jenkins-bot:
[research/mwaddlink@main] run-pipeline: Update DB_HOST
Change #1117636 merged by jenkins-bot:
[research/mwaddlink@main] requirements: Make repository installable
Change #1121452 had a related patch set uploaded (by Urbanecm; author: Urbanecm):
[research/mwaddlink@main] README: Mention Python 3.9 instead of 3.7
Change #1121453 had a related patch set uploaded (by Urbanecm; author: Urbanecm):
[research/mwaddlink@main] Update dependencies to make repo installable
Change #1121452 merged by jenkins-bot:
[research/mwaddlink@main] README: Mention Python 3.9 instead of 3.7
Change #1121453 merged by jenkins-bot:
[research/mwaddlink@main] Update dependencies to make repo installable
Change #1122967 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):
[operations/deployment-charts@master] linkrecommendation: Bump version
Change #1122967 merged by jenkins-bot:
[operations/deployment-charts@master] linkrecommendation: Bump version
The pipeline mostly works. I wasn't able to run it for fawiki, but other wikis work. I filed T387556: Add Link training pipeline gets stuck for certain wikis for the fawiki-specific issue. Closing this one, as the general round of verification is now done.