Deploy "add a link" to 4th round of wikis
Closed, Resolved | Public

Description

  • Training models
    • Abkhazian Wikipedia abwiki
    • Achinese Wikipedia acewiki
    • Adyghe Wikipedia adywiki
    • Afrikaans Wikipedia afwiki
    • Akan Wikipedia akwiki
    • Alemannisch Wikipedia alswiki
    • Amharic Wikipedia amwiki
    • Aragonese Wikipedia anwiki
    • Old English Wikipedia angwiki
    • Aramaic Wikipedia arcwiki
    • Egyptian Arabic Wikipedia arzwiki
    • Assamese Wikipedia aswiki
    • Asturian Wikipedia astwiki
    • Atikamekw Wikipedia atjwiki
    • Avaric Wikipedia avwiki
    • Aymara Wikipedia aywiki
    • Azerbaijani Wikipedia azwiki
    • South Azerbaijani Wikipedia azbwiki
    • Japanese Wikipedia jawiki
  • Models verification
  • Publish Datasets
  • Populate the excluded section titles
  • Deploy back-end (except ja and as)
  • Check how the model works on the wikis
  • In Special:Search, use hasrecommendation:link to find articles
  • Test them on https://api.wikimedia.org/service/linkrecommendation/apidocs/#/default/get_v1_linkrecommendations__project___domain___page_title_
  • In Special:NewcomerTasksInfo, link-recommendation gives the number of articles
  • Inform communities (ja and as are excluded)
  • Deploy front-end

Event Timeline

Generating datasets and training models for the first 18 wikis in this round went well. When I reached Japanese Wikipedia (jawiki), I kept running into the error shown in the screenshot below. For the record, this issue was solved by installing both mecab-python3 and unidic-lite in a venv.

Japanese Wiki Add a Link error - Screenshot from 2022-05-16 16-22-37.png (672×1 px, 206 KB)

Training models for the 4th round of wikis has been completed successfully.

We have also worked on models verification using the backtesting results shown below:

Wiki      Precision@0.5   Recall@0.5
abwiki    0.92            0.60
acewiki   0.95            0.70
adywiki   0.90            0.56
afwiki    0.87            0.58
akwiki    0.86            0.58
alswiki   0.86            0.68
amwiki    0.71            0.31
anwiki    0.87            0.54
angwiki   0.81            0.41
arcwiki   0.74            0.31
arzwiki   0.97            0.82
aswiki    0.57            0.16
astwiki   0.86            0.57
atjwiki   0.86            0.74
avwiki    0.85            0.40
aywiki    0.97            0.41
azwiki    0.68            0.21
azbwiki   0.96            0.79
jawiki    0.32            0.01
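
For readers unfamiliar with the two metrics, here is a minimal sketch of how precision and recall at a probability threshold of 0.5 could be computed. It assumes each predicted link carries a model probability and that the gold set is the links already present in the held-out text. The function name and the toy data are hypothetical and are not taken from the actual backtesting script.

```python
def precision_recall_at(predictions, gold_links, threshold=0.5):
    """Return (precision, recall) for the predictions kept at `threshold`.

    predictions: dict mapping (anchor, target) -> model probability
    gold_links:  set of (anchor, target) pairs that really exist in the text
    """
    kept = {link for link, prob in predictions.items() if prob >= threshold}
    true_positives = len(kept & gold_links)
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / len(gold_links) if gold_links else 0.0
    return precision, recall


# Toy data (hypothetical): three gold links, four scored predictions.
gold = {("Accra", "Accra"), ("Ghana", "Ghana"), ("Volta", "Lake Volta")}
scored = {
    ("Accra", "Accra"): 0.91,        # correct and kept
    ("Ghana", "Ghana"): 0.42,        # correct but below the threshold
    ("Volta", "Volta River"): 0.77,  # kept, but wrong target
    ("Kumasi", "Kumasi"): 0.10,      # not a gold link, dropped
}
precision, recall = precision_recall_at(scored, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

If the backtesting follows this reading, a recall of 0.01 would mean that roughly 1 in 100 existing links is recovered at that threshold.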

CCing @MGerlach, in case he'd like to add comments on the backtesting evaluation.

@kevinbazira great.
Most of the languages look fine, but there are some red flags:

  • jawiki with extremely low recall (0.01) and very low precision at 0.32
  • aswiki with much lower precision than other wikis (0.57)

I assume that some of our heuristics are failing when working with these scripts. I don't have experience with these languages, so it is hard to pin down exactly, but I would speculate that it is related to the tokenization we use when generating anchor candidates from substrings of the raw text. We would have to dig deeper to understand what the problem is and how it could be fixed.
In the short run, I would recommend not proceeding with these languages.
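
To illustrate the tokenization hypothesis (a sketch only, not the project's actual tokenizer): splitting on whitespace gives usable word tokens for a space-delimited language, but returns a whole Japanese sentence as a single token, so substring-based anchor candidates would not line up with real words. The sample sentences are made up for illustration.

```python
def whitespace_tokens(text):
    """Naive tokenizer: split on runs of whitespace."""
    return text.split()


english = "Tokyo is the capital of Japan"
japanese = "東京は日本の首都です"  # same meaning, written without spaces

print(len(whitespace_tokens(english)))   # six separate word tokens
print(len(whitespace_tokens(japanese)))  # the whole sentence is one token
```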

kevinbazira added a subscriber: kostajh.

@kostajh, we completed training models for the fourth round of wikis (listed in the task description) and shared the models' evaluation above, which suggested that we exclude jawiki and aswiki for the time being. We are now ready to publish the datasets for the wikis that passed the model evaluation. Should we proceed?

Yes, please do. Thanks!

cc @Trizek-WMF about jawiki and aswiki.

We can exclude these wikis and proceed with the ones that are ready.
But what are the next steps for wikis we skip?

Since @MGerlach mentioned that some heuristics are failing on the skipped wikis, I think the next step would be to manually inspect the models, either with users who have experience with these languages or with the help of Google Translate, and to iteratively improve the link-recommendation algorithm until the models pass the backtesting evaluation.

Some thoughts on potential starting points for manually inspecting the model. To figure out where the text processing is failing for these languages, we could start by checking the following steps:

  • Generation of candidate anchors (code). This uses simple tokenization to split the text into substrings, which then serve as potential anchor words that can be linked. If we can't identify suitable words, e.g. due to the absence of whitespace, we won't be able to generate good link recommendations.
  • Generation of link candidates (code). For each candidate anchor, we look up link candidates in the anchor dictionary. The anchor dictionary contains all the already existing links (anchor + title of the linked page) and is created in this script. We should make sure the anchor dictionary is populated with a sufficient number of links; otherwise we won't be able to generate link candidates for any anchor.
  • Disambiguation of link candidates (code). Once we have one or more link candidates for a candidate anchor, the model selects the most probable link from the candidates. We can inspect the probabilities assigned to each link candidate to understand where a potential error might come from.
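
The three steps above can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not the actual code: the n-gram candidate generator, the toy `anchor_dict`, and the count-based scoring that stands in for the trained model are all hypothetical.

```python
def candidate_anchors(text, max_words=3):
    """Step 1: every run of 1..max_words whitespace-separated tokens."""
    tokens = text.split()
    for size in range(max_words, 0, -1):  # longer anchors first
        for start in range(len(tokens) - size + 1):
            yield " ".join(tokens[start:start + size])


def link_candidates(anchor, anchor_dict):
    """Step 2: look the anchor up among the already existing links."""
    return anchor_dict.get(anchor.lower(), {})


def disambiguate(candidates):
    """Step 3: pick the most probable target.

    Relative link frequency stands in for the model's probability here.
    """
    total = sum(candidates.values())
    target = max(candidates, key=candidates.get)
    return target, candidates[target] / total


# Toy anchor dictionary: anchor text -> {linked page title: link count}.
anchor_dict = {
    "new york": {"New York City": 8, "New York (state)": 2},
    "jazz": {"Jazz": 5},
}

text = "She moved to New York to play jazz"
for anchor in candidate_anchors(text):
    found = link_candidates(anchor, anchor_dict)
    if found:
        title, prob = disambiguate(found)
        print(f"{anchor!r} -> {title!r} (p={prob:.2f})")
```

Printing the intermediate result of each step for a sample jawiki or aswiki article would be one way to see which stage returns nothing.
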
kevinbazira updated the task description.

@kostajh, thank you for the confirmation. We have published the datasets for the 17/19 wikis that passed the evaluation.

I'm keeping the two wikis in my freezer.

@MGerlach, when could we start checking on the models?

calbon removed kevinbazira as the assignee of this task.
calbon moved this task from In Progress to Parked on the Machine-Learning-Team (Active Tasks) board.

Anytime. The models should be available for inspection locally on one of the stat-machines. @kevinbazira ran the training pipeline for these models and could probably share where the files for the models are located.
Maybe it would be worth opening a separate task so this work can be discussed in more detail? I think finding and fixing the error could involve a substantial amount of effort, since we don't yet know exactly what we are looking for.

Mentioned in SAL (#wikimedia-operations) [2022-05-26T19:40:28Z] <tgr> T304548 running extensions/GrowthExperiments/maintenance/changeWikiConfig.php on tier4 Growth wikis

Populated section title config (and documented the process).
abwiki, adywiki, akwiki and atjwiki do not have other task types at all:

Screenshot Capture - 2022-05-26 - 21-42-20.png (730×1 px, 142 KB)

@Trizek-WMF is that normal?

Change 800247 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[operations/mediawiki-config@master] Enable GrowthExperiments link recommendations, round 4

https://gerrit.wikimedia.org/r/800247

Change 800247 merged by jenkins-bot:

[operations/mediawiki-config@master] Enable GrowthExperiments link recommendations, round 4

https://gerrit.wikimedia.org/r/800247

Mentioned in SAL (#wikimedia-operations) [2022-05-26T20:12:44Z] <brennen@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:800247|Enable GrowthExperiments link recommendations, round 4 (T304548)]] (duration: 00m 56s)

Tgr removed Tgr as the assignee of this task. (May 26 2022, 8:39 PM)
Tgr updated the task description.
Tgr subscribed.

Yes. When we deployed Growth features by default to all Wikipedias, we found that some of the smaller ones have no recognizable maintenance templates, or no maintenance templates at all. "Add a link" is the solution we have for these wikis.

Trizek-WMF set Due Date to Jun 15 2022, 4:00 PM.

Wikis with no suggested edits (cc @kostajh):

  • ak
  • as

The other ones have suggested edits.

I see suggested edits on as.wikipedia.org:

image.png (1×1 px, 326 KB)

The only task type for akwiki is link-recommendation, which won't be visible until we enable the front-end.

@kostajh, I went to Special:Search, used hasrecommendation:link to find articles and had nothing there. Are we good?

@Trizek-WMF which wiki(s) are you referring to?

@kostajh, I used the process listed in the task description:

  1. In Special:Search, use hasrecommendation:link to find articles
  2. Test them on https://api.wikimedia.org/service/linkrecommendation/apidocs/#/default/get_v1_linkrecommendations__project___domain___page_title_
  3. In Special:NewcomerTasksInfo, link-recommendation gives the number of articles

When I checked the deployment list (minus as and ja), the only two wikis without link suggestions were ak and as.
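
For anyone repeating these checks on several wikis, the first two steps can be scripted. The sketch below only builds the request URLs. The service base path is inferred from the apidocs link above, and passing a language code such as "af" as the domain is an assumption. The search query uses the standard MediaWiki search API with the hasrecommendation:link keyword from step 1.

```python
from urllib.parse import quote, urlencode

# Assumed base path, inferred from the apidocs URL in the task description.
SERVICE = "https://api.wikimedia.org/service/linkrecommendation"


def search_url(lang):
    """Step 1: search API query for articles with link recommendations."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": "hasrecommendation:link",
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def recommendation_url(domain, page_title, project="wikipedia"):
    """Step 2: GET /v1/linkrecommendations/{project}/{domain}/{page_title}."""
    title = quote(page_title.replace(" ", "_"), safe="")
    return f"{SERVICE}/v1/linkrecommendations/{project}/{domain}/{title}"


print(search_url("af"))
print(recommendation_url("af", "Suid-Afrika"))
```

An empty search result for a wiki would correspond to the "nothing there" observation reported for ak and as.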

Thanks for that.

It looks like aswiki wasn't specified in the config patch https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/800247, so I suspect adding it there would fix the problem.

akwiki was in the deployment, so checking the logs ( grep akwiki /var/log/mediawiki/mediawiki_job_growthexperiments-refreshLinkRecommendations-s*/syslog.log), I see:

mediawiki_job_growthexperiments-refreshLinkRecommendations-s3/syslog.log:Jun 15 07:13:05 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s3[1057]: akwiki:      checking candidate Facebook... number of good links too small (1)
mediawiki_job_growthexperiments-refreshLinkRecommendations-s3/syslog.log:Jun 15 07:13:06 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s3[1057]: akwiki:      checking candidate YouTube... All of the links in the recommendation have been pruned
mediawiki_job_growthexperiments-refreshLinkRecommendations-s3/syslog.log:Jun 15 07:13:06 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s3[1057]: akwiki:      checking candidate Etuo... All of the links in the recommendation have been pruned
mediawiki_job_growthexperiments-refreshLinkRecommendations-s3/syslog.log:Jun 15 07:13:06 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s3[1057]: akwiki:      topic exhausted, 500 tasks still needed

Given that there are only 590 content pages, I suspect there is not enough content on this wiki to generate link recommendations. @Trizek-WMF should we remove it from the deployment list?
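
When the grep output is longer than a few lines, tallying the rejection reasons per wiki makes this kind of diagnosis quicker. The following is a rough sketch: the regular expression is inferred only from the four log lines quoted above (the real log format may vary), and the sample lines are those same lines without the file-name prefix that grep adds.

```python
import re
from collections import Counter

# Inferred pattern: "<wiki>:   checking candidate <title>... <reason>"
CANDIDATE_RE = re.compile(
    r"(?P<wiki>\w+wiki):\s+checking candidate (?P<title>.+?)\.\.\. (?P<reason>.+)$"
)


def rejection_reasons(lines):
    """Count rejection reasons per wiki, with numbers collapsed to N."""
    reasons = Counter()
    for line in lines:
        match = CANDIDATE_RE.search(line)
        if match:  # lines such as "topic exhausted, ..." are skipped
            reason = re.sub(r"\d+", "N", match.group("reason").strip())
            reasons[(match.group("wiki"), reason)] += 1
    return reasons


PREFIX = (
    "Jun 15 07:13:06 mwmaint1002 "
    "mediawiki_job_growthexperiments-refreshLinkRecommendations-s3[1057]: "
)
sample = [
    PREFIX + "akwiki:      checking candidate Facebook... number of good links too small (1)",
    PREFIX + "akwiki:      checking candidate YouTube... All of the links in the recommendation have been pruned",
    PREFIX + "akwiki:      checking candidate Etuo... All of the links in the recommendation have been pruned",
    PREFIX + "akwiki:      topic exhausted, 500 tasks still needed",
]

for (wiki, reason), count in rejection_reasons(sample).most_common():
    print(f"{wiki}: {count} x {reason}")
```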

Thank you for checking, Kosta.

I moved akwiki to the next round, hopefully it won't be forgotten twice! ;)


Wikis with a low number of articles are a problem. The majority of these wikis have no maintenance templates, hence no way to offer tasks to newcomers. When we deployed the Growth features there, our plan with Marshall was to deploy Add a link afterwards, so that newcomers would be offered at least one task type.

What should we do, @MMiller_WMF? Give up on these wikis, or find a way to improve the model?

Change 805766 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[operations/mediawiki-config@master] GrowthExperiments: Enable link recommendation on aswiki

https://gerrit.wikimedia.org/r/805766

@Trizek-WMF should we proceed with enabling the front-end for the wikis today?

To clarify: akwiki wasn't forgotten, it's just that our script does not generate any link recommendations on this wiki because there is not enough content (see T304548#8005057 for details).

Do you want to move aswiki to the next round? That one was indeed missed in this round.

@Trizek-WMF @KStoller-WMF no deploys on Fridays, so the next opportunity is Monday. Please comment here if we should go ahead with enabling the frontend for add link for these wikis, with the exception of aswiki and akwiki which are in the next round.

Change 806365 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[operations/mediawiki-config@master] GrowthExperiments: Enable link recommendations frontend, round 4

https://gerrit.wikimedia.org/r/806365

Let's go ahead on Monday.

If we can have as.wp in this round, then let's go. Otherwise, it is fine to have it with the next round.

Oops, there are no deploys on Monday due to the Juneteenth holiday.

What we could do is: enable the backend for aswiki on Tuesday, and enable the frontend for all 4th round wikis on Thursday. That way aswiki has ~2 days for the task pool to fill up. Sounds OK?

Trizek-WMF changed Due Date from Jun 15 2022, 4:00 PM to Jun 23 2022, 4:00 PM. (Jun 17 2022, 11:06 AM)

Change 805766 merged by jenkins-bot:

[operations/mediawiki-config@master] GrowthExperiments: Enable link recommendation on aswiki

https://gerrit.wikimedia.org/r/805766

Change 806365 merged by jenkins-bot:

[operations/mediawiki-config@master] GrowthExperiments: Enable link recommendations frontend, round 4

https://gerrit.wikimedia.org/r/806365

Mentioned in SAL (#wikimedia-operations) [2022-06-23T07:25:21Z] <samtar@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:806365|GrowthExperiments: Enable link recommendations frontend, round 4 (T304548)]] (duration: 03m 37s)

Sorry, I should have read farther back up the comment thread here.

So, we've deployed the fourth round without those two wikis.

kostajh triaged this task as High priority. (Jun 24 2022, 9:56 AM)
kostajh updated the task description.
kostajh moved this task from In Progress to QA on the Growth-Team (Sprint 0 (Growth Team)) board.

Thank you!

Waiting for QA to close this ticket.

Etonkovidova subscribed.

Checked some wikis from the list (both desktop and mobile) - link recommendation task works as expected; other functionality was also checked - no issues were found.

Notes

  • UI translation rate is really low