
[EPIC] Deploy "add a link" to all Wikipedias
Open, In Progress, Medium, Public

Description

After having "add a link" on about 10 wikis for several months, we learned about valuable improvements to make. Those improvements are collected in this epic: T300851: [EPIC] Growth: "add a link" structured task 2.0. Once those improvements are complete, we will be comfortable deploying "add a link" more broadly. This task is about generating the suggestions for another set of wikis and deploying there.

This process was started in T290011: [OLD] Deploy Add a link to a third round of wikis, which is kept under this task for reference.


CRS involvement for each batch:

  1. Test the models
  2. Based on results, advise and decide if a wiki should get the model
  3. Inform communities (mainly through Tech News).

Related Objects

Status | Assigned
Open | lbowmaker
In Progress | Trizek-WMF
Declined | None
Resolved | Trizek-WMF
Resolved | Tgr
Resolved | Trizek-WMF
Resolved | Tgr
Resolved | Etonkovidova
Resolved | kevinbazira
Resolved | kevinbazira
Resolved | kostajh
Resolved | Sgs
Resolved | Sgs
Invalid | Trizek-WMF
Resolved | Sgs
Open | kevinbazira
Resolved | Sgs
Resolved | None
Resolved | Urbanecm_WMF
Resolved | kevinbazira
Resolved | Sgs
Invalid | kevinbazira
Resolved | Sgs
Resolved | Sgs
Resolved | Sgs
Resolved | Sgs
Resolved | Sgs
Open | None
Open | Trizek-WMF
Resolved | Sgs
Open | None
Open | calbon
Resolved | kevinbazira
Resolved | kevinbazira
Resolved | kevinbazira
Resolved | AKhatun_WMF
Resolved | kevinbazira
Resolved | kevinbazira

Event Timeline


Thank you @kostajh. The wikis that are already ready are listed on T304542: Deploy "add a link" to third round of wikis. I'm editing the task accordingly.

We can announce a deployment the week after the training is done and, for announcement purposes, on a Wednesday (so that Tech News has time to reach all wikis, with enough time left before Friday).

I would adjust the timing – after the backend patch is enabled, we need to wait some period of time (days, maybe 1-2 weeks) to see that there are enough link recommendation tasks for the wiki (see the task pool section of https://grafana.wikimedia.org/d/vGq7hbnMz/special-homepage-and-suggested-edits?orgId=1). Once we see there are enough tasks, then we can enable the frontend.

So it would be a deployment around April 15, on a Wednesday.

Maybe. With everything happening in Growth-Team (Sprint 0 (Growth Team)) and various people being away around the spring holidays, I am not confident about that date.

@kevinbazira, all rounds have been created, they are sub-tasks of the current one. Please proceed on model training as you can! :)

Trizek-WMF changed the task status from Open to In Progress. May 11 2022, 5:44 PM

Also @kevinbazira, I tried to make rounds that gather approximately the same number of wikis, with a lot of small ones plus a big one, or a few mid-sized ones with some small wikis. Let me know if I should change the number of wikis there to help your work building the models (make bigger batches, or smaller ones).

@Trizek-WMF, thank you for creating all the rounds. I am working on generating datasets and models round by round and will be sharing updates on the sub-tasks.

The task generation for the Add Link task pool runs on one thread per DB server group (s1-s7). The two large DB server groups (in terms of number of wikis) are s5 (21 Wikipedias) and s3 (280 Wikipedias). For mid-sized wikis, task generation took about 5-6 hours for a single wiki. It will probably take the same time for large wikis as well (we are going for a constant number of tasks, regardless of wiki size); it might take less time for small wikis, where candidates can be exhausted before the required number of tasks is found. But assuming it doesn't, 280 Wikipedias at ~6 hours each is ~70 days, so we should probably run 4 instances of the refresh script in parallel on s3.
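The estimate above can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the function name `estimated_days` is made up for illustration, and it assumes every wiki takes the upper-bound 6 hours and that wikis split evenly across instances.

```python
import math


def estimated_days(num_wikis: int, hours_per_wiki: float, instances: int = 1) -> float:
    """Wall-clock days to fill the task pool for num_wikis.

    Assumes each instance processes one wiki at a time and that the
    wikis are divided evenly between the parallel instances.
    """
    wikis_per_instance = math.ceil(num_wikis / instances)
    return wikis_per_instance * hours_per_wiki / 24


print(estimated_days(280, 6))     # s3, single instance: 70.0
print(estimated_days(280, 6, 4))  # s3, four instances: 17.5
print(estimated_days(21, 6))      # s5, single instance: 5.25
```

With four instances, s3 drops from about ten weeks to about two and a half, which is closer to the few days s5 needs on a single thread.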

The script creates a lock with the ID of the wiki it is processing, so running multiple instances in parallel on the same dblist should be fine - the second thread will just skip the wiki that the first is already processing.
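To illustrate why parallel instances on the same dblist do not collide, here is a minimal in-process sketch of the skip-if-locked behaviour. It is not the refresh script itself: `WikiLocks` and `run_instance` are hypothetical names, and an in-memory set stands in for whatever lock mechanism the real script uses.

```python
import threading


class WikiLocks:
    """Hypothetical stand-in for the per-wiki lock keyed by wiki ID."""

    def __init__(self):
        self._held = set()
        self._guard = threading.Lock()

    def try_acquire(self, wiki_id: str) -> bool:
        # Non-blocking: report failure instead of waiting for the holder.
        with self._guard:
            if wiki_id in self._held:
                return False
            self._held.add(wiki_id)
            return True

    def release(self, wiki_id: str) -> None:
        with self._guard:
            self._held.discard(wiki_id)


def run_instance(dblist, locks, process):
    """Walk the dblist, skipping any wiki another instance is processing."""
    processed = []
    for wiki_id in dblist:
        if not locks.try_acquire(wiki_id):
            continue  # another instance holds the lock, move on
        try:
            process(wiki_id)
            processed.append(wiki_id)
        finally:
            locks.release(wiki_id)
    return processed


locks = WikiLocks()
locks.try_acquire("cswiki")  # simulate a first instance busy with cswiki
result = run_instance(["cswiki", "dewiki"], locks, lambda wiki_id: None)
print(result)  # ['dewiki']
```

The key design point is that acquisition is non-blocking, so a second instance never waits on a busy wiki and simply continues down the list.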

Proposal for streamlining the completion of remaining rounds:

  1. Train models, verify models, publish datasets for all remaining wikis. This involves work from Machine-Learning-Team (cc @kevinbazira) and Research (cc @MGerlach). Growth doesn't need to be consulted on this phase; once these teams are happy with the models, the datasets can be published.
  2. Growth engineers will populate the excluded section titles for all remaining wikis
  3. Growth engineers will enable the backend for all remaining wikis, so the task pools begin to fill up
  4. @Trizek-WMF can then verify the hasrecommendation:link results and check the API https://api.wikimedia.org/service/linkrecommendation/apidocs/#/default/get_v1_linkrecommendations__project___domain___page_title_
  5. @Trizek-WMF can inform communities, and Growth engineers can enable the front-end, either staggered or en masse, depending on what works better for @Trizek-WMF
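For the verification in step 4, the two checks can be expressed as URLs. The sketch below only builds the URLs and makes no network calls. The helper names are hypothetical, the endpoint path is inferred from the operation name in the apidocs link above (`/v1/linkrecommendations/{project}/{domain}/{page_title}`), and it assumes that `domain` is the wiki's language code; verify both against the API documentation before relying on them.

```python
from urllib.parse import quote, urlencode

# Assumed base path, inferred from the apidocs URL; not confirmed by the task.
LINK_REC_BASE = "https://api.wikimedia.org/service/linkrecommendation/v1/linkrecommendations"


def link_recommendation_url(domain: str, page_title: str, project: str = "wikipedia") -> str:
    """URL for fetching link recommendations for one page."""
    title = page_title.replace(" ", "_")
    parts = [quote(part, safe="") for part in (project, domain, title)]
    return f"{LINK_REC_BASE}/{'/'.join(parts)}"


def has_recommendation_search_url(domain: str, limit: int = 10) -> str:
    """MediaWiki search API URL listing pages matched by hasrecommendation:link."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": "hasrecommendation:link",
        "srlimit": limit,
        "format": "json",
    }
    return f"https://{domain}.wikipedia.org/w/api.php?{urlencode(params)}"


print(link_recommendation_url("cs", "Lipsko"))
print(has_recommendation_search_url("cs"))
```

An empty search result shortly after the backend is enabled would most likely mean the task pool has not started filling yet rather than that the model is missing.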

tl;dr I think we would all save time on context switching if we did the model training, section title population, and backend enabling all at once, rather than in phases over a period of months. Then the actual presentation to communities could be done in a more staggered way if that is better from a community relations standpoint.

@KStoller-WMF @kevinbazira @MGerlach what do you think?

@kostajh +1 on doing the initial phases at once to avoid context switching.

The ML team will proceed with training models, evaluating them, and publishing datasets for all the remaining rounds.

As long as we keep a staggered deployment, I'm fine. :)

+1
This sounds like a great approach!

Trizek-WMF changed the task status from In Progress to Stalled. Jan 17 2023, 5:36 PM
Trizek-WMF changed the status of subtask T304953: Schedule the deployment of "Add a link" to more wikis from In Progress to Stalled.

We moved to a staggered deployment process. Once all wikis have trained models, we will resume deployments.

Once all wikis have trained models, we will resume deployments.

@kevinbazira
Do we have a rough ETA for when model training will be done for all Wikipedias? Thanks!

@KStoller-WMF, we are currently working on the 9th out of 18 rounds of wikis. Each of the 9 remaining rounds has ~20 models. ETA to train, evaluate, and publish all these models is about a month or more depending on the size of each of these wikis and whether or not we have to fine-tune the link recommendation algorithm to support a wiki's language-specific characters.

Will be sharing progress updates on the sub-tasks.

Sgs changed the status of subtask T308133: Deploy "add a link" to 8th round of wikis from Open to In Progress.
Sgs changed the status of subtask T308134: Deploy "add a link" to 9th round of wikis from Open to In Progress.
Trizek-WMF changed the task status from Stalled to In Progress. Mar 14 2023, 1:15 PM
Trizek-WMF added a project: Epic.

@Trizek-WMF Please describe further in this Phab ticket what level of involvement is needed from CRS.

Sgs changed the status of subtask T308141: Deploy "add a link" to 15th round of wikis from Open to In Progress.
KStoller-WMF renamed this task from Deploy "add a link" to all Wikipedias to [EPIC] Deploy "add a link" to all Wikipedias. Oct 13 2023, 4:59 AM