
Verify the Add Link training pipeline works as expected
Closed, Resolved · Public

Description

In the parent task, I proposed retraining the Add Link model for the Surfacing Add Link pilot wikis. Before doing that, we need to verify that the training pipeline still works as intended, since it has not been used recently. To that end, I decided to retrain the Add Link model for frwiki (and potentially one other wiki), to regain experience with the process and identify any issues that might have appeared over the last few years.

Event Timeline

The training began during Tuesday EU afternoon. I ran into several issues (which I will upload patches for here), so there were several pauses (until I saw the error and addressed it somehow), but the last step of the pipeline finished successfully several minutes ago. The data should appear at https://analytics.wikimedia.org/published/datasets/one-off/urbanecm/frwiki-updated-add-link-T385781/ within the next half an hour or so.

Backtesting evaluation output:

(venv) [urbanecm@stat1008 ~/Documents/mwaddlink/src/scripts (main *% u=)]$ python generate_backtesting_eval.py -id frwiki -nmax 10000
threshold: 0.0
finished: 10000 sentences
micro_precision: 0.5369388972730145
micro_recall: 0.5432111915921034
----------------------
threshold: 0.1
finished: 10000 sentences
micro_precision: 0.6190608062665665
micro_recall: 0.5556028973157222
----------------------
threshold: 0.2
finished: 10000 sentences
micro_precision: 0.6739393415708977
micro_recall: 0.553863087629598
----------------------
threshold: 0.3
finished: 10000 sentences
micro_precision: 0.7204857751112831
micro_recall: 0.5287246129811106
----------------------
threshold: 0.4
finished: 10000 sentences
micro_precision: 0.7606050641236436
micro_recall: 0.4927567106945036
----------------------
threshold: 0.5
finished: 10000 sentences
micro_precision: 0.8024929324081213
micro_recall: 0.4434739383610283
----------------------
threshold: 0.6
finished: 10000 sentences
micro_precision: 0.8393437548367125
micro_recall: 0.38510154807555746
----------------------
threshold: 0.7
finished: 10000 sentences
micro_precision: 0.8774601918702404
micro_recall: 0.31501207214884247
----------------------
threshold: 0.8
finished: 10000 sentences
micro_precision: 0.9208775654635527
micro_recall: 0.2310041187331345
----------------------
threshold: 0.9
finished: 10000 sentences
micro_precision: 0.9489603024574669
micro_recall: 0.12476920891918762
----------------------
(venv) [urbanecm@stat1008 ~/Documents/mwaddlink/src/scripts (fix %)]$

I'll need to take a look at the numbers and evaluate them. The results from the original training are at https://meta.wikimedia.org/wiki/Research:Improving_multilingual_support_for_link_recommendation_model_for_add-a-link_task/Results_round-1.
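For reference, micro-averaged precision and recall are computed by pooling counts across all evaluated sentences rather than averaging per-sentence scores. The actual logic lives in generate_backtesting_eval.py; this is only a minimal sketch of the aggregation, with hypothetical variable names:

```python
# Sketch of micro-averaged precision/recall aggregation (not the
# actual generate_backtesting_eval.py code). Each sentence contributes
# (true_positives, predicted_links, gold_links) counts, which are
# summed globally before the ratios are taken.

def micro_precision_recall(per_sentence_counts):
    """per_sentence_counts: iterable of (tp, n_predicted, n_gold) tuples."""
    tp = sum(c[0] for c in per_sentence_counts)
    predicted = sum(c[1] for c in per_sentence_counts)
    gold = sum(c[2] for c in per_sentence_counts)
    precision = tp / predicted if predicted else 0.0
    recall = tp / gold if gold else 0.0
    return precision, recall

# Two sentences: (3 correct of 4 predicted, 5 gold) and (1 of 2, 3 gold)
p, r = micro_precision_recall([(3, 4, 5), (1, 2, 3)])
# p = 4/6, r = 4/8
```

Raising the link-probability threshold drops low-confidence predictions, which is why precision climbs and recall falls monotonically in the output above.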

Change #1117632 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[research/mwaddlink@main] run-pipeline: Update DB_HOST to dbstore1009

https://gerrit.wikimedia.org/r/1117632

Change #1117637 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[research/mwaddlink@main] Update .gitignore

https://gerrit.wikimedia.org/r/1117637

Change #1117636 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[research/mwaddlink@main] requirements: Make repository installable

https://gerrit.wikimedia.org/r/1117636

Change #1117637 merged by jenkins-bot:

[research/mwaddlink@main] Update .gitignore

https://gerrit.wikimedia.org/r/1117637

The performance of the updated model is comparable to the previous version, so it looks good from my side.
For all wikis, we tracked precision and recall at threshold 0.5:

  • Previous version: Prec=0.815, Recall=0.459
  • Updated version: Prec=0.803, Recall=0.443
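The updated-version numbers come straight from the threshold-0.5 block of the console output. A small helper could pull them out programmatically; this is a sketch assuming the output format shown above, and `metrics_at_threshold` is a hypothetical name, not part of the pipeline:

```python
import re

def metrics_at_threshold(output, threshold):
    """Extract (micro_precision, micro_recall) for one threshold from the
    dashed-separator console output of the backtesting evaluation."""
    for block in re.split(r"-{4,}", output):
        t = re.search(r"threshold:\s*([0-9.]+)", block)
        if t and abs(float(t.group(1)) - threshold) < 1e-9:
            prec = float(re.search(r"micro_precision:\s*([0-9.]+)", block).group(1))
            rec = float(re.search(r"micro_recall:\s*([0-9.]+)", block).group(1))
            return prec, rec
    return None

sample = """threshold: 0.5
finished: 10000 sentences
micro_precision: 0.8024929324081213
micro_recall: 0.4434739383610283
----------------------"""
prec, rec = metrics_at_threshold(sample, 0.5)
```

With the full output, the same call yields the Prec=0.803, Recall=0.443 figures quoted above (rounded to three decimals).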

The pipeline works with the patches I uploaded. Moving this to Code Review.

Change #1117632 merged by jenkins-bot:

[research/mwaddlink@main] run-pipeline: Update DB_HOST

https://gerrit.wikimedia.org/r/1117632

Change #1117636 merged by jenkins-bot:

[research/mwaddlink@main] requirements: Make repository installable

https://gerrit.wikimedia.org/r/1117636

Change #1121452 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[research/mwaddlink@main] README: Mention Python 3.9 instead of 3.7

https://gerrit.wikimedia.org/r/1121452

Change #1121453 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[research/mwaddlink@main] Update dependencies to make repo installable

https://gerrit.wikimedia.org/r/1121453

Change #1121452 merged by jenkins-bot:

[research/mwaddlink@main] README: Mention Python 3.9 instead of 3.7

https://gerrit.wikimedia.org/r/1121452

Change #1121453 merged by jenkins-bot:

[research/mwaddlink@main] Update dependencies to make repo installable

https://gerrit.wikimedia.org/r/1121453

Change #1122967 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[operations/deployment-charts@master] linkrecommendation: Bump version

https://gerrit.wikimedia.org/r/1122967

Change #1122967 merged by jenkins-bot:

[operations/deployment-charts@master] linkrecommendation: Bump version

https://gerrit.wikimedia.org/r/1122967

The pipeline mostly works. I wasn't able to run it for fawiki, but other wikis work. I filed T387556: Add Link training pipeline gets stuck for certain wikis for the fawiki-specific issue. Closing this one, as the general round of verification is now done.