
Deploy "add a link" to 11th round of wikis
Closed, Resolved, Public

Description

  • Training models
    • Latin Wikipedia la
    • Ladino Wikipedia lad
    • Luxembourgish Wikipedia lb
    • Laki Wikipedia lbe
    • Lezghian Wikipedia lez
    • Lingua Franca Nova Wikipedia lfn
    • Ganda Wikipedia lg
    • Limburgish Wikipedia li
    • Ligurian Wikipedia lij
    • Lombard Wikipedia lmo
    • Lingala Wikipedia ln
    • ~~Lao Wikipedia lo~~
    • Northern Luri Wikipedia lrc (see T308136#8648765)
    • ~~Lithuanian Wikipedia lt~~
    • Latgalian Wikipedia ltg
    • Latvian Wikipedia lv
    • Maithili Wikipedia mai
    • Basa Banyumasan Wikipedia map-bms
    • Moksha Wikipedia mdf
    • Malagasy Wikipedia mg
    • Armenian Wikipedia hy
    • Kyrgyz Wikipedia ky
  • Models verification
  • Publish Datasets
  • Populate the excluded section titles
  • Deploy back-end
  • Check how the model works on the wikis
  • In Search, use hasrecommendation:link to find articles
  • Test them on https://api.wikimedia.org/service/linkrecommendation/apidocs/#/default/get_v1_linkrecommendations__project___domain___page_title_
  • Inform communities
  • Deploy front-end
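The two manual checks in the list above (searching with `hasrecommendation:link` and querying the link recommendation API) can be prepared from the command line. This is a minimal sketch, not the team's tooling. It assumes `jq` is installed (used only for percent-encoding), uses the standard MediaWiki Action API search module for the search check, and assumes the service path is `/v1/linkrecommendations/{project}/{domain}/{page_title}` with `project` = `wikipedia` and `domain` = the language code, as inferred from the API docs link (not verified). The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that build the URLs for the two manual checks.

# Percent-encode a string with jq's @uri filter (unreserved characters stay as-is).
urlencode() {
    jq --raw-output --null-input --arg s "$1" '$s | @uri'
}

# Action API search for "hasrecommendation:link" on <lang>.wikipedia.org
search_url() {
    lang=$1
    printf 'https://%s.wikipedia.org/w/api.php?action=query&list=search&srsearch=%s&format=json\n' \
        "$lang" "$(urlencode 'hasrecommendation:link')"
}

# Link recommendation service URL for one article.
# ASSUMPTION: path layout inferred from the API docs link; not verified.
linkrec_url() {
    lang=$1
    title=$2
    printf 'https://api.wikimedia.org/service/linkrecommendation/v1/linkrecommendations/wikipedia/%s/%s\n' \
        "$lang" "$(urlencode "$title")"
}

# Usage (requires network access):
#   curl --silent "$(search_url lo)" | jq '.query.search[].title'
#   curl --silent "$(linkrec_url lo '7_ມັງກອນ')" | jq .
```

Percent-encoding the title matters for wikis like lowiki, where most article titles are non-ASCII.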

Event Timeline

21/22 models were trained successfully in the 11th round of wikis.

The Northern Luri Wikipedia (lrcwiki) pipeline did not complete successfully and is being investigated in T330616.

Model evaluation has been completed and below are the backtesting results:

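| Wiki | Precision@0.5 | Recall@0.5 |
| --- | --- | --- |
| lawiki | 0.89 | 0.47 |
| ladwiki | 0.83 | 0.67 |
| lbwiki | 0.89 | 0.67 |
| lbewiki | 0.95 | 0.42 |
| lezwiki | 0.78 | 0.27 |
| lfnwiki | 0.78 | 0.60 |
| lgwiki | 0.92 | 0.51 |
| liwiki | 0.85 | 0.55 |
| lijwiki | 0.90 | 0.58 |
| lmowiki | 0.93 | 0.69 |
| lnwiki | 0.78 | 0.67 |
| lowiki | 0.76 | 0.30 |
| ltwiki | 0.79 | 0.47 |
| ltgwiki | 0.83 | 0.55 |
| lvwiki | 0.84 | 0.52 |
| maiwiki | 0.91 | 0.24 |
| map_bmswiki | 0.98 | 0.81 |
| mdfwiki | 0.84 | 0.24 |
| mgwiki | 0.97 | 0.76 |
| hywiki | 0.74 | 0.19 |
| kywiki | 0.81 | 0.52 |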

CCing @MGerlach, in case he would like to add comments on the backtesting evaluation.

Our conclusion from the backtesting results is that most of the languages look fine, except:

  • hywiki, whose precision (0.74) and recall (0.19) are both slightly below the recommended minimums (0.75 and 0.20, respectively).

I talked to @MGerlach about these results, and we agreed to deploy hywiki anyway, since we recently published koiwiki with a recall of 0.13, and in the past we've published models with a precision above 0.70.
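The acceptance check applied to the table above can be expressed as a small filter. This is a sketch: the thresholds (precision 0.75, recall 0.20) come from the discussion above, while the whitespace-separated input format (`wiki precision recall`) and the function name `flag_low_scores` are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the wikis whose backtesting precision or recall
# falls below the recommended minimums. Input lines: "<wiki> <precision> <recall>".
flag_low_scores() {
    min_precision=${1:-0.75}
    min_recall=${2:-0.20}
    awk -v p="$min_precision" -v r="$min_recall" '
        # "+ 0" forces numeric (not string) comparison
        ($2 + 0 < p + 0) || ($3 + 0 < r + 0) {
            printf "%s precision=%s recall=%s\n", $1, $2, $3
        }'
}

# Backtesting results from this round (Precision@0.5, Recall@0.5)
flag_low_scores <<'EOF'
lawiki 0.89 0.47
ladwiki 0.83 0.67
lbwiki 0.89 0.67
lbewiki 0.95 0.42
lezwiki 0.78 0.27
lfnwiki 0.78 0.60
lgwiki 0.92 0.51
liwiki 0.85 0.55
lijwiki 0.90 0.58
lmowiki 0.93 0.69
lnwiki 0.78 0.67
lowiki 0.76 0.30
ltwiki 0.79 0.47
ltgwiki 0.83 0.55
lvwiki 0.84 0.52
maiwiki 0.91 0.24
map_bmswiki 0.98 0.81
mdfwiki 0.84 0.24
mgwiki 0.97 0.76
hywiki 0.74 0.19
kywiki 0.81 0.52
EOF
# prints: hywiki precision=0.74 recall=0.19
```

The thresholds are parameters rather than constants so that an exception like the hywiki one can be checked against relaxed limits (for example `flag_low_scores 0.70 0.10`).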

kevinbazira updated the task description.
kevinbazira added subscribers: kostajh, kevinbazira.

@kostajh, we published datasets for all 21/22 models that passed the evaluation in this round.

Sgs changed the task status from Open to In Progress. Jul 5 2023, 11:00 AM
Sgs triaged this task as Medium priority.

I ran this script to add the link-recommendation task type and populate the excluded section titles:

# Phabricator task referenced in the edit summaries (was undefined before)
PHAB=T308136
for WIKI in lawiki ladwiki lbwiki lbewiki lezwiki lfnwiki lgwiki liwiki lijwiki lmowiki lnwiki lowiki ltwiki ltgwiki lvwiki maiwiki map_bmswiki mdfwiki mgwiki hywiki kywiki; do
    # Canonical server of the wiki (e.g. https://la.wikipedia.org), used for the review links below
    ORIGIN=$(mwscript getConfiguration.php "$WIKI" --settings 'wgCanonicalServer' --format json | jq --raw-output '.wgCanonicalServer')
    # Add the link-recommendation task type boilerplate to MediaWiki:NewcomerTasks.json
    mwscript extensions/GrowthExperiments/maintenance/changeWikiConfig.php "$WIKI" \
            --page MediaWiki:NewcomerTasks.json \
            --create-only \
            --json \
            --summary "Growth features configuration boilerplate ([[phab:$PHAB]])" \
            link-recommendation \
            '{ "type": "link-recommendation", "group": "easy" }'
    # Section titles predicted for exclusion on this wiki (probability > 0.25), deduplicated
    # into one JSON array and written to link-recommendation.excludedSections
    jq "select(.wiki==\"$WIKI\" and .probability > 0.25) | .section" wiki_sections.jsonl \
        | jq --slurp --compact-output "unique" \
        | mwscript extensions/GrowthExperiments/maintenance/changeWikiConfig.php "$WIKI" \
            --page MediaWiki:NewcomerTasks.json \
            --json \
            --summary "machine-generated configuration for excluding sections from link recommendations ([[phab:$PHAB]]), feel free to improve" \
            link-recommendation.excludedSections \
            "$(cat)"  # the JSON array read from the pipe
    # Links for manually reviewing the resulting page and diff
    echo "$ORIGIN/wiki/MediaWiki:NewcomerTasks.json"
    echo "$ORIGIN/w/index.php?title=MediaWiki:NewcomerTasks.json&diff=next"
    echo "Press <Enter> to continue"
    read -r  # give time for manual verification
done
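To see what the excluded-sections step produces for one wiki, the jq part of the script above can be run on its own. This is a minimal sketch, assuming `wiki_sections.jsonl` contains one JSON object per line with `wiki`, `section` and `probability` fields (inferred from the jq filter); `excluded_sections` is a hypothetical helper and the sample data is made up. Unlike the script, it passes the wiki name with `--arg` instead of interpolating it into the filter string, which avoids quoting problems.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the jq pipeline in the script above:
# print the JSON array of section titles to exclude for one wiki.
excluded_sections() {
    wiki=$1
    file=$2
    min=${3:-0.25}   # same probability threshold as the script
    # Keep this wiki's sections predicted above the threshold...
    jq --arg wiki "$wiki" --argjson min "$min" \
        'select(.wiki == $wiki and .probability > $min) | .section' "$file" \
        | jq --slurp --compact-output 'unique'   # ...then dedupe (and sort) into one array
}

# Example with made-up data
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"wiki": "examplewiki", "section": "External links", "probability": 0.9}
{"wiki": "examplewiki", "section": "References", "probability": 0.6}
{"wiki": "examplewiki", "section": "External links", "probability": 0.8}
{"wiki": "examplewiki", "section": "History", "probability": 0.1}
{"wiki": "otherwiki", "section": "Notes", "probability": 0.95}
EOF
excluded_sections examplewiki "$sample"   # prints ["External links","References"]
rm -f "$sample"
```

A wiki with no matching rows yields `[]`, so the config key is still written, just empty.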

Change 935723 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[operations/mediawiki-config@master] GrowthExperiments: Enable backend of link recommendation 10, 11th round wikis

https://gerrit.wikimedia.org/r/935723

Change 935723 merged by jenkins-bot:

[operations/mediawiki-config@master] GrowthExperiments: Enable backend of link recommendation 10, 11, 12th round wikis

https://gerrit.wikimedia.org/r/935723

Mentioned in SAL (#wikimedia-operations) [2023-07-11T13:03:28Z] <urbanecm@deploy1002> Started scap: Backport for [[gerrit:935723|GrowthExperiments: Enable backend of link recommendation 10, 11, 12th round wikis (T308135 T308136 T308137)]]

Mentioned in SAL (#wikimedia-operations) [2023-07-11T13:04:58Z] <urbanecm@deploy1002> sgimeno and urbanecm: Backport for [[gerrit:935723|GrowthExperiments: Enable backend of link recommendation 10, 11, 12th round wikis (T308135 T308136 T308137)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-07-11T13:13:13Z] <urbanecm@deploy1002> Finished scap: Backport for [[gerrit:935723|GrowthExperiments: Enable backend of link recommendation 10, 11, 12th round wikis (T308135 T308136 T308137)]] (duration: 09m 45s)

Status update: as of today, the maintenance script is processing lbwiki. Growth engineers are trying to agree on which actions to take to speed up this process.

Status update: as of today, all wikis have produced results except lowiki and ltwiki. I'm running the recommendations script manually for these two, so it's fairly safe to assume we'll be able to enable the feature on all wikis from this round next Wednesday, with @Trizek-WMF's agreement, of course.

For context:

  • lowiki and mgwiki were missing from the wikis.txt file, so I added them manually. cc @kevinbazira
  • ltwiki: the configuration was presumably created on July 5 (see T308136#8990342), but it seems the image recommendation config change from July 10 undid the earlier change without leaving a trace (see history). Is that even possible? cc @Urbanecm_WMF
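A quick way to catch wikis that are missing from the list before a run is to compare the round's wikis against the file. This is a minimal sketch, assuming `wikis.txt` holds one database name per line (its real format and location aren't described in this task); `missing_wikis` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the wikis given as arguments that do not appear
# in a wikis.txt-style file (ASSUMED format: one database name per line).
missing_wikis() {
    list_file=$1
    shift
    for wiki in "$@"; do
        # -x: match whole lines only; -F: fixed strings (so "_" and "." are literal)
        grep -qxF "$wiki" "$list_file" || echo "$wiki"
    done
}

# Usage:
#   missing_wikis wikis.txt lawiki ladwiki lbwiki lowiki mgwiki kywiki
```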

@Sgs, I have the same results:

  • lo.wp: There were no results matching the query.
  • lt.wp: There were no results matching the query.

We can schedule all other wikis for deployment on Wednesday August 16.

Trizek-WMF set Due Date to Aug 16 2023, 4:00 PM.
> • ltwiki the configuration was presumably created on the 5th of July, see T308136#8990342; it seems the image recommendation config change from the 10th of July undid the prior change

Thanks for the ping, Sergio. This happened because an lt.wikipedia administrator deleted the GrowthExperiments configuration files on May 4 and July 10, giving the reason "nepasiteisinęs iš aukščiau nuleistas eksperimentas" (according to Google Translate, "failed experiment dropped from above", which sounds something like "you're forcing an experimental feature on our site"?). The admin also protected the page against creation, giving the reason "Pakartotinis ištrinto puslapio atkūrinėjimas" (Google Translate: "Recovering a deleted page").

The admin is very likely confused by the page's content (JSON probably looks like a set of semi-random strings, and it definitely doesn't look like an article). They probably consider the MediaWiki default system account a vandal that repeatedly recreates a page the admin(s) decided to delete. Their decision to protect the page against creation by non-admins especially supports this (the protection has no effect, as system accounts aren't subject to regular permission checks, but still).

I'll start a thread about this in Slack; it feels like a serious issue for our relationship with the community (they deleted the page twice, and we (re)created it three times, once despite active protection). They're probably also confused by the page being recreated despite the protection.

In any case: Since we have an administrator actively deleting our configuration files, we definitely shouldn't deploy the feature there. @Trizek-WMF, would it be possible to get in touch with the admin/community and learn why they're deleting the config files?

> without leaving trace (see history), is that even possible? cc @Urbanecm_WMF

It did leave a trace, just not in the page history. From the history, you can click "View logs for this page" to see whether the page was deleted in the past, including who deleted it and why. Changing a page's content without leaving any trace at all is not possible (unless one tries hard to do so using direct DB access or similar).
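The same log information is available from the MediaWiki Action API through `list=logevents`, filtered by `letype` and `letitle`. This is a minimal sketch that only builds the query URL; `page_log_url` is a hypothetical helper, and `jq` is assumed to be installed for percent-encoding the title.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an Action API query listing the log entries of one
# type (delete, protect, ...) for a single page, i.e. "View logs for this page".
page_log_url() {
    origin=$1              # e.g. the wiki's wgCanonicalServer
    title=$2
    type=${3:-delete}
    enc=$(jq --raw-output --null-input --arg s "$title" '$s | @uri')
    printf '%s/w/api.php?action=query&list=logevents&letype=%s&letitle=%s&format=json\n' \
        "$origin" "$type" "$enc"
}

# Usage (requires network access):
#   curl --silent "$(page_log_url https://lt.wikipedia.org MediaWiki:NewcomerTasks.json)" \
#       | jq '.query.logevents[] | {timestamp, user, action, comment}'
#   curl --silent "$(page_log_url https://lt.wikipedia.org MediaWiki:NewcomerTasks.json protect)" | jq .
```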

> @Sgs, I have the same results:
>
>   • lo.wp: There were no results matching the query.

I've checked the logs of the maintenance script for this wiki, and I see the following for lowiki:

Aug  9 13:58:07 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s3[15337]: lowiki:      checking candidate Etienne_de_La_Boétie... There was a problem during the HTTP request: 400 Bad Request
Aug  9 13:58:07 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s3[15337]: lowiki:      checking candidate ລິຊາ_(ນັກດົນຕີຊາວຍີ່ປຸ່ນເກີດ_ຄ.ສ._໑໙໘໗)... There was a problem during the HTTP request: 400 Bad Request
Aug  9 13:58:07 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s3[15337]: lowiki:      checking candidate ພະມະຫາກະສັດໄທ... There was a problem during the HTTP request: 400 Bad Request
Aug  9 13:58:07 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s3[15337]: lowiki:      checking candidate ເຂົາຊາຍ_ແກແລັກຊີ້... There was a problem during the HTTP request: 400 Bad Request
Aug  9 13:58:07 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s3[15337]: lowiki:      checking candidate 26_ຕຸລາ... There was a problem during the HTTP request: 400 Bad Request
Aug  9 13:58:07 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s3[15337]: lowiki:      checking candidate 30_ກັນຍາ... There was a problem during the HTTP request: 400 Bad Request
Aug  9 13:58:07 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s3[15337]: lowiki:      checking candidate ພະເຈົ້າຮົງລີທີ_8_ແຫ່ງອັງກິດ... There was a problem during the HTTP request: 400 Bad Request
Aug  9 13:58:07 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s3[15337]: lowiki:      checking candidate ປາເມລ້າ_ປາສິເນຕຕີ້... There was a problem during the HTTP request: 400 Bad Request
Aug  9 13:58:07 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s3[15337]: lowiki:      checking candidate 7_ມັງກອນ... There was a problem during the HTTP request: 400 Bad Request
Aug  9 13:58:07 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s3[15337]: lowiki:      checking candidate ສະໝັກ_ສຸນທໍຣະເວດ... There was a problem during the HTTP request: 400 Bad Request
Aug  9 13:58:07 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s3[15337]: lowiki:      checking candidate 20_ມັງກອນ... There was a problem during the HTTP request: 400 Bad Request
Aug  9 13:58:07 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s3[15337]: lowiki:      checking candidate ຊັດຊາດ_ສິດທິພັນ... There was a problem during the HTTP request: 400 Bad Request

Did we have some issues with the service? In any case, I cannot reproduce this when running the script manually.
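To tell whether the 400 responses were a one-off glitch or systematic for a wiki, the failures can be tallied per wiki from the maintenance log. This is a minimal sketch, assuming the log line format shown in the excerpt above; `count_bad_requests` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count, per wiki, the candidates for which the
# refreshLinkRecommendations log reports an HTTP 400 from the add-link service.
# ASSUMES the log line format shown in the excerpt above.
count_bad_requests() {
    awk '/There was a problem during the HTTP request: 400 Bad Request/ {
            # The wiki ID is the first field of the form "<name>wiki:"
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^[a-z_]+wiki:$/) {
                    sub(/:$/, "", $i)
                    failures[$i]++
                    break
                }
            }
        }
        END { for (w in failures) print w, failures[w] }' | LC_ALL=C sort
}

# Usage (saved.log is a hypothetical saved copy of the job output):
#   count_bad_requests < saved.log
```

Sorting the output makes runs comparable, since awk iterates over arrays in an unspecified order.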

> • lt.wp: There were no results matching the query.

This might be happening because the ltwiki admins deleted the config file (so the script doesn't see link-recommendation as an existing task type), unless someone ran the refresh script while the on-wiki config file still existed.

Thanks for clarifying what happened in ltwiki. I suspected MW logs everything but had never encountered this case before.

> I've checked the logs of the maintenance script for this wiki, and I see the following for lowiki:

Yes, I ran it on demand from the maintenance server a few hours ago.

> Did we have some issues with the service? In any case, I cannot reproduce this when running the script manually.

Yes, the first script run was getting HTTP 400 errors when requesting the add-link service. I assumed it was a temporary glitch, which is why I re-ran it myself.

> This might be happening, because ltwiki admins deleted the config file (and thus the script doesn't see link-recommendation as an existing task type), unless someone ran the refreshing script while the on-wiki config file still existing.

A few hours ago, I updated the config again and re-ran the script manually. If we decide not to go ahead with the Lithuanian wiki, we need to disable the currently enabled cron job. Waiting for @Trizek-WMF's input.

> @Sgs, I have the same results:
>
>   • lo.wp: There were no results matching the query.
>   • lt.wp: There were no results matching the query.
>
> We can schedule all other wikis for deployment on Wednesday August 16.

Ack. lowiki has results now, so we'll proceed with enabling the feature on all wikis except ltwiki; see T344013: Growth's config files were deleted on lt.wikipedia.

Change 948094 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[operations/mediawiki-config@master] GrowthExperiments: enable add a link in 11th round of wikis

https://gerrit.wikimedia.org/r/948094

Change 948094 merged by jenkins-bot:

[operations/mediawiki-config@master] GrowthExperiments: enable add a link in 11th round of wikis

https://gerrit.wikimedia.org/r/948094

Mentioned in SAL (#wikimedia-operations) [2023-08-16T13:18:16Z] <urbanecm@deploy1002> Started scap: Backport for [[gerrit:948094|GrowthExperiments: enable add a link in 11th round of wikis (T308136)]]

Mentioned in SAL (#wikimedia-operations) [2023-08-16T13:19:54Z] <urbanecm@deploy1002> sgimeno and urbanecm: Backport for [[gerrit:948094|GrowthExperiments: enable add a link in 11th round of wikis (T308136)]] synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)

Mentioned in SAL (#wikimedia-operations) [2023-08-16T13:29:48Z] <urbanecm@deploy1002> Finished scap: Backport for [[gerrit:948094|GrowthExperiments: enable add a link in 11th round of wikis (T308136)]] (duration: 11m 32s)

Etonkovidova subscribed.

> @Sgs, I have the same results:
>
>   • lo.wp: There were no results matching the query.

Re-checked: searching lo.wp for hasrecommendation:link now returns results.

> • lt.wp: There were no results matching the query.

Re-checked - lt.wp hasrecommendation:link now returns results.

Also checked lo.wp and lt.wp with https://api.wikimedia.org/service/linkrecommendation/apidocs/#/default/get_v1_linkrecommendations__project___domain___page_title_; add link recommendations were returned for both.

lt.wp doesn't have Growth features enabled according to https://phabricator.wikimedia.org/T308136#9086341.


Spot-checked wikis from the task description list: add link suggestions work as expected.