
Retrain Add Link models for Surfacing Structured Tasks pilot wikis
Closed, Resolved · Public

Description

The Growth-Team plans to release a new Add Link-related feature soon: T385343: Surfacing "Add a link" Structured Tasks: Experiment Release (FY24/25 WE1.2.9). This would be targeted to 6 pilot wikis (see the other task). The Add Link models for those wikis were trained a fairly long time ago:

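| Wiki | Trained on | New version published | Revalidation finished |
| ----- | ----- | ----- | ----- |
| eswiki | 2021-06-23 | | |
| frwiki | 2021-03-01 | | |
| arzwiki | 2022-05-24 | | |
| ruwiki | 2021-06-23 | | |
| ptwiki | 2021-03-31 | | |
| fawiki | 2021-06-23 | | |
| idwiki | 2022-07-18 | | |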

For all wikis, the model is over 2.5 years old, and for most of them it is nearly 4 years old. This is not great, as the model is based on the article content as it was at training time. It is also based on which articles existed at that time (in other words, we never suggest a link to an article that didn't exist when the model was trained, which for most of these wikis is ~four years ago). Since articles are created all the time, and the norms at wikis change from time to time, we should be using reasonably recent models to provide the best quality recommendations possible.
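
The ages quoted above can be checked with a few lines of shell. This is only a rough sketch: the reference date (2025-02-19, the first timestamped event on this task) is an assumption, `model_age_days` is a hypothetical helper name, and the script relies on GNU `date -d`.

```shell
# Reference date is an assumption: the first timestamped event on this task.
REF_DATE="2025-02-19"

# Whole days between a training date ($1) and a reference date ($2, optional).
# Uses GNU date's -d option; BSD/macOS date needs different flags.
model_age_days() (
  trained_s=$(date -u -d "$1" +%s)
  ref_s=$(date -u -d "${2:-$REF_DATE}" +%s)
  echo $(( (ref_s - trained_s) / 86400 ))
)

# Training dates copied from the table in the task description.
while read -r wiki trained; do
  days=$(model_age_days "$trained")
  # Tenths of a year, using integer arithmetic and 365.25 days per year.
  tenths=$(( days * 1000 / 36525 ))
  printf '%-8s %s  %4d days (~%d.%d years)\n' \
    "$wiki" "$trained" "$days" "$(( tenths / 10 ))" "$(( tenths % 10 ))"
done <<'EOF'
eswiki 2021-06-23
frwiki 2021-03-01
arzwiki 2022-05-24
ruwiki 2021-06-23
ptwiki 2021-03-31
fawiki 2021-06-23
idwiki 2022-07-18
EOF
```

Under that assumed reference date, the most recently trained model (idwiki) is 947 days old, which is consistent with the "over 2.5 years" figure.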

Since we are doing an Add Link experiment, it makes sense to me to retrain the models, and see if there is any difference in the recommendation quality. That would allow us to decide whether we want to schedule retraining all Add Link models (for all wikis).

Event Timeline

Restricted Application added subscribers: Huji, Aklapper.
KStoller-WMF renamed this task from Consider retraining Add Link models before deploying Surfacing Structured Tasks to Retrain Add Link models before deploying Surfacing Structured Tasks. Feb 19 2025, 5:19 PM

@Sgs, @Michael and I discussed this on Wednesday and found no technical problems with going ahead with this. @KStoller-WMF and I discussed this as well, and we found no product problems with going ahead with this. As such, I went ahead and published the models I have trained already.

I tried revalidating frwiki's suggestions, and I failed:

[urbanecm@deploy2002 ~]$ cat frwiki-checksum.txt 
0c8608674ef53f41d491ae2918bb444df90b8c37ad94675416847b54c753e1c7
[urbanecm@deploy2002 ~]$ mwscript-k8s --file frwiki-checksum.txt -f -- GrowthExperiments:revalidateLinkRecommendations.php --wiki=frwiki --exceptDatasetChecksums=frwiki-checksum.txt --deleteNullRecommendations
⏳ Starting GrowthExperiments:revalidateLinkRecommendations.php on Kubernetes as job mw-script.codfw.fs06mfc3 ...
⏳ Waiting for the container to start...
🚀 Job is running.
📜 Streaming logs:
Revalidating link recommendations:
  fetching task batch starting with page 0
[55cc8dd764be2c0fcd4b880c] [no req]   Error: Typed property GrowthExperiments\Maintenance\RevalidateLinkRecommendations::$allowedChecksums must not be accessed before initialization
Backtrace:
from /srv/mediawiki/php-1.44.0-wmf.17/extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php(250)
#0 /srv/mediawiki/php-1.44.0-wmf.17/extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php(185): GrowthExperiments\Maintenance\RevalidateLinkRecommendations->getAllowedChecksums()
#1 /srv/mediawiki/php-1.44.0-wmf.17/extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php(137): GrowthExperiments\Maintenance\RevalidateLinkRecommendations->validateRecommendation(GrowthExperiments\NewcomerTasks\AddLink\LinkRecommendation)
#2 /srv/mediawiki/php-1.44.0-wmf.17/maintenance/includes/MaintenanceRunner.php(695): GrowthExperiments\Maintenance\RevalidateLinkRecommendations->execute()
#3 /srv/mediawiki/php-1.44.0-wmf.17/maintenance/run.php(51): MediaWiki\Maintenance\MaintenanceRunner->run()
#4 /srv/mediawiki/multiversion/MWScript.php(156): require_once(string)
#5 {main}
[urbanecm@deploy2002 ~]$

Filed T387001, will try again on Monday assuming the patch is approved in time.

Mentioned in SAL (#wikimedia-operations) [2025-02-24T09:32:13Z] <urbanecm> Start GrowthExperiments:revalidateLinkRecommendations.php for frwiki, eswiki, ptwiki and idwiki (T385780)

Started arzwiki revalidation:

[urbanecm@deploy2002 ~]$ curl -s 'https://analytics.wikimedia.org/published/datasets/one-off/research-mwaddlink/arzwiki/arzwiki.linkmodel.json.checksum'
2f4ab8aa4ff47a8b859f23b21adc2775bab0337a6f71a96269836e7c7d13af33  arzwiki.linkmodel.json
[urbanecm@deploy2002 ~]$ cat > arzwiki-checksum.txt
2f4ab8aa4ff47a8b859f23b21adc2775bab0337a6f71a96269836e7c7d13af33
[urbanecm@deploy2002 ~]$ mwscript-k8s --file arzwiki-checksum.txt -f -- GrowthExperiments:revalidateLinkRecommendations --wiki=arzwiki --exceptDatasetChecksums=arzwiki-checksum.txt --deleteNullRecommendations --verbose
⏳ Starting GrowthExperiments:revalidateLinkRecommendations on Kubernetes as job mw-script.codfw.tid05s4u ...
⏳ Waiting for the container to start...
🚀 Job is running.
📜 Streaming logs:
Revalidating link recommendations:
  fetching task batch starting with page 0
  منه شلبى is outdated, regenerating... success
  منى هلا is outdated, regenerating... success
  نانسى عجرم is outdated, regenerating... success
  نمر is outdated, regenerating... success
[...]
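
The manual copy step in the transcript above (fetch the published `.checksum` file, then paste the hash into a local file) could probably be scripted. A minimal sketch, assuming the published file keeps the `sha256sum`-style layout shown above (`<hash>  <filename>`); `extract_checksum` is a hypothetical helper name, not part of any existing tooling.

```shell
# Read a sha256sum-style line ("<hash>  <filename>") on stdin and print only
# the hash. Exits non-zero and prints nothing if the first field is not a
# 64-character lowercase hex string.
extract_checksum() {
  awk '{ print $1; exit }' | grep -E '^[0-9a-f]{64}$'
}

# Possible usage on the deployment server (URL taken from the transcript above):
#   curl -s 'https://analytics.wikimedia.org/published/datasets/one-off/research-mwaddlink/arzwiki/arzwiki.linkmodel.json.checksum' \
#     | extract_checksum > arzwiki-checksum.txt
```

The format check is there so that an error page or an empty response does not silently end up in the file passed to --exceptDatasetChecksums.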

If someone wants, logs should be streamable from the deployment server using:

K8S_CLUSTER=codfw KUBECONFIG=/etc/kubernetes/mw-script-codfw.config kubectl logs -f job/mw-script.codfw.tid05s4u mediawiki-tid05s4u-app

or, if you are a member of the restricted group:

K8S_CLUSTER=codfw KUBECONFIG=/etc/kubernetes/mw-script-restricted-codfw.config kubectl logs -f job/mw-script.codfw.tid05s4u mediawiki-tid05s4u-app

KStoller-WMF renamed this task from Retrain Add Link models before deploying Surfacing Structured Tasks to Retrain Add Link models for Surfacing Structured Tasks pilot wikis. Mar 9 2025, 9:40 PM
KStoller-WMF moved this task from Inbox to Backlog on the Growth-Team board.
Michael claimed this task.

I'm closing this as resolved since the experiment has concluded and all related code has been removed. (The models are still there, of course.)