Deploy "add a link" to 5th round of wikis
Closed, Resolved | Public

Description

  • Training models
    • Assamese Wikipedia aswiki (remainder from the previous list)
    • Bashkir Wikipedia ba
    • Balinese Wikipedia ban
    • Bavarian Wikipedia bar
    • Samogitian Wikipedia bat-smg
    • Bikol Central Wikipedia bcl
    • Belarusian Wikipedia be
    • Belarusian (Taraškievica) Wikipedia be-x-old
    • Bulgarian Wikipedia bg
    • Bhojpuri Wikipedia bh
    • Bislama Wikipedia bi
    • Banjar Wikipedia bjn
    • Bambara Wikipedia bm
    • Tibetan Wikipedia bo see T304549#8060880
    • Bishnupriya Wikipedia bpy
    • Breton Wikipedia br
    • Bosnian Wikipedia bs
    • Buginese Wikipedia bug
    • Buryat Wikipedia bxr
    • Indonesian Wikipedia id
  • Models verification
  • Publish Datasets
  • Populate the excluded section titles
  • Deploy back-end
  • Check how the model works on the wikis
  • In Search, use hasrecommendation:link to find articles
  • Test them on https://api.wikimedia.org/service/linkrecommendation/apidocs/#/default/get_v1_linkrecommendations__project___domain___page_title_
  • Fix missing wikis and check again.
  • Inform communities (in Tech News 2022/42)
  • Deploy front-end (week 42)

Event Timeline

The ZeroDivisionError issue was fixed and training models for the 5th round of wikis has been completed successfully.

We have also worked on models verification using the backtesting results shown below:

Wiki           Precision@0.5   Recall@0.5
bawiki         0.81            0.31
banwiki        0.95            0.80
barwiki        0.88            0.61
bat_smgwiki    0.80            0.36
bclwiki        0.86            0.54
bewiki         0.83            0.49
be_x_oldwiki   0.86            0.63
bgwiki         0.81            0.54
bhwiki         0.94            0.72
biwiki         0.96            0.87
bjnwiki        0.89            0.58
bmwiki         0.90            0.64
bowiki         0.00            0.00
bpywiki        0.99            0.83
brwiki         0.85            0.54
bswiki         0.87            0.64
bugwiki        1.00            0.90
bxrwiki        0.72            0.32
idwiki         0.84            0.55

CCing @MGerlach, in case he would like to add comments on the backtesting evaluation.

The conclusion from the backtesting results is that most of the languages look fine, but there is one red flag:

  • bowiki has both precision and recall at 0.00

Talked to @MGerlach about bowiki and he said:

I believe this happens because the train and test data in bowiki is very small (27 sentences each only). While wikistats says there are ~12k articles in bowiki, most of them seem to be without any links which is actually quite interesting (here you can check random articles in bowiki). We discard articles without links for training or testing because we need actual links to train and test.

As recommended in T304548#7937512, it's best not to proceed with bowiki until there is improved performance.
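The check that singled out bowiki can be sketched as a small filter over the backtesting table. This is only an illustration: the function names and the two cut-off values (0.5 for precision, 0.1 for recall) are assumptions chosen for the example, not thresholds stated in this task or in T304548.

```shell
#!/bin/sh
# Backtesting scores copied from the table above:
# wiki, precision@0.5, recall@0.5
scores() {
cat <<'EOF'
bawiki 0.81 0.31
banwiki 0.95 0.80
barwiki 0.88 0.61
bat_smgwiki 0.80 0.36
bclwiki 0.86 0.54
bewiki 0.83 0.49
be_x_oldwiki 0.86 0.63
bgwiki 0.81 0.54
bhwiki 0.94 0.72
biwiki 0.96 0.87
bjnwiki 0.89 0.58
bmwiki 0.90 0.64
bowiki 0.00 0.00
bpywiki 0.99 0.83
brwiki 0.85 0.54
bswiki 0.87 0.64
bugwiki 1.00 0.90
bxrwiki 0.72 0.32
idwiki 0.84 0.55
EOF
}

# Print every wiki whose precision is below $1 or whose recall is below $2.
# Both cut-offs are hypothetical values supplied by the caller.
flag_low_scores() {
    awk -v min_p="$1" -v min_r="$2" \
        '($2 + 0) < (min_p + 0) || ($3 + 0) < (min_r + 0) { print $1 }'
}

# With these illustrative cut-offs, only bowiki is flagged.
scores | flag_low_scores 0.5 0.1
```

Raising the cut-offs would flag more wikis (bxrwiki has the next-lowest precision), so where to draw the line remains a judgment call for the people reviewing the models.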

kevinbazira added a subscriber: kostajh.

@kostajh, we completed training models for the fifth round of wikis (listed in the task description) and shared the models' evaluation above, which suggested we exclude bowiki for the time being. We are now ready to publish the datasets for the wikis that passed the model evaluation. Should we proceed?

@kevinbazira yes please go ahead with publishing the datasets. Thanks!

kostajh updated the task description. (Show Details)

@kostajh, thank you for the confirmation. We have published the datasets for the 18/19 wikis that passed the evaluation.

kevinbazira updated the task description. (Show Details)
kevinbazira added a subscriber: kevinbazira.

Let's wait for September for this new deployment.

Our team has an offsite next week, so we'll plan to do this the week after (October 10).

Trizek-WMF set Due Date to Oct 16 2022, 10:00 PM. (Sep 28 2022, 6:15 PM)

@KStoller-WMF and I discussed the next steps. The backend deployment is scheduled for week 42 (starting October 17). I won't be available to test how the model works before then.

Change 843487 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[operations/mediawiki-config@master] GrowthExperiments: Enable link recommendation for 5th round wikis

https://gerrit.wikimedia.org/r/843487

I ran this script for adding the link-recommendation task type and populating the excluded sections entries:

# Assumes $PHAB already holds the Phabricator task ID and that
# wiki_sections.jsonl is in the current directory.
for WIKI in aswiki bawiki banwiki barwiki bat-smgwiki bclwiki bewiki be-x-oldwiki bgwiki bhwiki biwiki bjnwiki bmwiki bpywiki brwiki bswiki bugwiki bxrwiki idwiki; do
    # Look up the wiki's canonical server so the verification URLs can be printed below.
    ORIGIN=$(mwscript getConfiguration.php "$WIKI" --settings 'wgCanonicalServer' --format json | jq --raw-output '.wgCanonicalServer')
    # Add the link-recommendation task type, only if it is not configured yet.
    mwscript extensions/GrowthExperiments/maintenance/changeWikiConfig.php "$WIKI" \
            --page MediaWiki:NewcomerTasks.json \
            --create-only \
            --json \
            --summary "Growth features configuration boilerplate ([[phab:$PHAB]])" \
            link-recommendation \
            '{ "type": "link-recommendation", "group": "easy" }'
    # Collect this wiki's section titles with probability > 0.25, deduplicate them,
    # and pass the resulting JSON array as the value; $(cat) reads it from the pipe.
    jq "select(.wiki==\"$WIKI\" and .probability > 0.25) | .section" wiki_sections.jsonl \
        | jq --slurp --compact-output "unique" \
        | mwscript extensions/GrowthExperiments/maintenance/changeWikiConfig.php "$WIKI" \
            --page MediaWiki:NewcomerTasks.json \
            --json \
            --summary "machine-generated configuration for excluding sections from link recommendations ([[phab:$PHAB]]), feel free to improve" \
            link-recommendation.excludedSections \
            "$(cat)"
    echo "$ORIGIN/wiki/MediaWiki:NewcomerTasks.json"
    echo "$ORIGIN/w/index.php?title=MediaWiki:NewcomerTasks.json&diff=next"
    read # give time for manual verification
done

This failed for be-x-oldwiki and bat-smgwiki with error messages like:

Fatal error: no version entry for `bat-smgwiki`.
 in /srv/mediawiki/multiversion/MWMultiVersion.php on line 534
parse error: Invalid literal at line 1, column 3
no version entry for `bat-smgwiki`.

So, for bat-smgwiki and be-x-oldwiki I manually updated MediaWiki:NewcomerTasks.json.

However, I am still going to hold those wikis back from this deployment because:

  • the datasets were uploaded as bat_smg but the wiki ID is bat-smg, so lookups from MediaWiki to the link recommendation service will fail
  • be-x-oldwiki redirects to be-tarask, so I think we should use that for the name of the dataset, @Trizek-WMF does that sound right to you?

Hmm, actually it looks like the wiki IDs are indeed hardcoded as bat_smgwiki and be_x_oldwiki, so that should work when calling the link recommendation service. We'll find out soon :)
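The naming mismatch discussed above comes down to dashes versus underscores. A minimal sketch of the mapping, assuming the wiki ID is simply the language code with dashes replaced by underscores plus a "wiki" suffix (the function name `wiki_id` is made up for this example, and the rule is only confirmed here for bat-smg and be-x-old):

```shell
#!/bin/sh
# Map a language code such as "bat-smg" to the underscore-style wiki ID
# ("bat_smgwiki"). Hypothetical helper, not part of any Wikimedia tooling.
wiki_id() {
    printf '%swiki\n' "$(printf '%s' "$1" | sed 's/-/_/g')"
}

wiki_id bat-smg
wiki_id be-x-old
wiki_id id
```

Feeding the underscore form into the loop above (bat_smgwiki, be_x_oldwiki) might have avoided the "no version entry" failure, though that was not tried in this task.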

Change 843487 merged by jenkins-bot:

[operations/mediawiki-config@master] GrowthExperiments: Enable link recommendation for 5th round wikis

https://gerrit.wikimedia.org/r/843487

Once we resolve T320961: LinkRecommendation: Don't convert underscores to dashes, we can move forward with bat_smg and be_x_old wikis.

If needed, we can move them to the next round (we did it last time for aswiki), and only deploy to the wikis where the deployment worked.

Change 843937 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[operations/mediawiki-config@master] GrowthExperiments: Enable AddLink backend for bat_smg and be_x_old

https://gerrit.wikimedia.org/r/843937

Change 843937 merged by jenkins-bot:

[operations/mediawiki-config@master] GrowthExperiments: Enable AddLink backend for bat_smg

https://gerrit.wikimedia.org/r/843937

Mentioned in SAL (#wikimedia-operations) [2022-10-18T13:30:10Z] <kharlan@deploy1002> Started scap: Backport for [[gerrit:843937|GrowthExperiments: Enable AddLink backend for bat_smg (T304549)]]

Mentioned in SAL (#wikimedia-operations) [2022-10-18T13:30:34Z] <kharlan@deploy1002> kharlan and kharlan: Backport for [[gerrit:843937|GrowthExperiments: Enable AddLink backend for bat_smg (T304549)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2022-10-18T13:35:01Z] <kharlan@deploy1002> Finished scap: Backport for [[gerrit:843937|GrowthExperiments: Enable AddLink backend for bat_smg (T304549)]] (duration: 04m 50s)

Change 843950 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[operations/mediawiki-config@master] GrowthExperiments: Enable AddLink backend for be_x_oldwiki

https://gerrit.wikimedia.org/r/843950

Change 843950 merged by jenkins-bot:

[operations/mediawiki-config@master] GrowthExperiments: Enable AddLink backend for be_x_oldwiki

https://gerrit.wikimedia.org/r/843950

Mentioned in SAL (#wikimedia-operations) [2022-10-18T13:48:34Z] <kharlan@deploy1002> Started scap: Backport for [[gerrit:843950|GrowthExperiments: Enable AddLink backend for be_x_oldwiki (T304549)]]

Mentioned in SAL (#wikimedia-operations) [2022-10-18T13:48:57Z] <kharlan@deploy1002> kharlan and kharlan: Backport for [[gerrit:843950|GrowthExperiments: Enable AddLink backend for be_x_oldwiki (T304549)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2022-10-18T13:53:30Z] <kharlan@deploy1002> Finished scap: Backport for [[gerrit:843950|GrowthExperiments: Enable AddLink backend for be_x_oldwiki (T304549)]] (duration: 04m 56s)

Claiming for models tests (tomorrow).

I tested the API following the instructions in the task description.

Results:

All other wikis return link suggestions.

Reassigning to @kostajh for investigation and fixes.

be-x-old: "Unable to process request for wikipedia/be-x-old" when testing the API. Test failed with be-tarask as well.

@Trizek-WMF you have to use be_x_old for the "language" field.

I'll have a look at the other ones.

Thank you. I checked the API, and it returns suggested edits.
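The be_x_old fix above can be sketched as a small URL builder for the link recommendation API. The request path is inferred from the apidocs anchor in the task description (`get_v1_linkrecommendations__project___domain___page_title_`), so treat it as an assumption; the function name `linkrec_url` and the page title `Example` are placeholders, and the title is assumed to be URL-safe already.

```shell
#!/bin/sh
# Build a request URL for the link recommendation service.
#   $1 = project (e.g. wikipedia), $2 = language code, $3 = page title
# Dashes in the language code are turned into underscores, following the
# advice above to use be_x_old rather than be-x-old.
linkrec_url() {
    project=$1
    domain=$(printf '%s' "$2" | sed 's/-/_/g')
    title=$3
    printf 'https://api.wikimedia.org/service/linkrecommendation/v1/linkrecommendations/%s/%s/%s\n' \
        "$project" "$domain" "$title"
}

linkrec_url wikipedia be-x-old Example

# To actually send the request (needs network access):
#   curl -s "$(linkrec_url wikipedia be-x-old Example)"
```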

I tested the API following the instructions in the task description.

Results:

No results today either.

I see 818 results today.

  • be-x-old: "Unable to process request for wikipedia/be-x-old" when testing the API. Test failed with be-tarask as well.

6,322 results in search index.

6 results today. Might need more time for indexing.

6,710 results today.

9,100 results today.

  • bug: no results (which is kind of funny given the wiki's code)

3 results today.

440 results today.

I'd suggest we hold back aswiki, bugwiki, bpywiki for the 6th round. It's possible we need more time for refreshLinkRecommendations.php to run. Given we have a lot going on, just waiting to see if the script eventually generates enough recommendations seems like the easiest thing to do.
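While waiting for the script to generate recommendations, the number of indexed articles can be polled with the hasrecommendation:link search keyword from the task description. The sketch below builds a standard MediaWiki search API URL that reports the total hit count; whether the counts quoted in this thread were obtained this way or through Special:Search is not stated, and the function name `search_count_url` is made up for the example.

```shell
#!/bin/sh
# Build a MediaWiki API URL that returns the number of articles matching
# hasrecommendation:link on one Wikipedia.
#   $1 = subdomain of the wiki, e.g. "bug" for bug.wikipedia.org
search_count_url() {
    printf 'https://%s.wikipedia.org/w/api.php?action=query&list=search&srsearch=hasrecommendation:link&srinfo=totalhits&srlimit=1&format=json\n' "$1"
}

search_count_url bug

# To fetch the count (needs network access and jq):
#   curl -s "$(search_count_url bug)" | jq '.query.searchinfo.totalhits'
```

Running this once a day for aswiki, bugwiki and bpywiki would show whether the counts are still growing before deciding to hold a wiki back.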

Change 849092 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[operations/mediawiki-config@master] GrowthExperiments: Enable link recommendation for aswiki

https://gerrit.wikimedia.org/r/849092

Change 849092 merged by jenkins-bot:

[operations/mediawiki-config@master] GrowthExperiments: Enable link recommendation for aswiki

https://gerrit.wikimedia.org/r/849092

Update: aswiki had no recommendations because I missed including it in the config patch. I'm enabling that one now.

@Trizek-WMF seems like many of the articles on bugwiki are very short (e.g. compare https://bug.wikipedia.org/wiki/Linux to https://en.wikipedia.org/wiki/Linux). We can't generate link recommendations if there's little text in the source article, or other articles with relevant content to link to. bpywiki looks like a similar scenario.

Thank you @kostajh.

Let's put on hold wikis with too few articles, or no suggestions: aswiki, bugwiki, bpywiki.

For aswiki, can we expect to have some results soon?

There are 130 results for aswiki now.

@Trizek-WMF do you want me to enable these wikis (excluding bugwiki and bpywiki) later today, at 13:00 UTC? Or do you prefer to do it on Thursday?

@kostajh, yes, let's turn it on for all wikis, except bugwiki and bpywiki.

Change 849546 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[operations/mediawiki-config@master] GrowthExperiments: Enable link recommendation frontend for 5th round

https://gerrit.wikimedia.org/r/849546

Change 849546 merged by jenkins-bot:

[operations/mediawiki-config@master] GrowthExperiments: Enable link recommendation frontend for 5th round

https://gerrit.wikimedia.org/r/849546

Mentioned in SAL (#wikimedia-operations) [2022-10-26T13:24:45Z] <urbanecm@deploy1002> Started scap: Backport for [[gerrit:849546|GrowthExperiments: Enable link recommendation frontend for 5th round (T304549)]]

Mentioned in SAL (#wikimedia-operations) [2022-10-26T13:25:09Z] <urbanecm@deploy1002> urbanecm and kharlan: Backport for [[gerrit:849546|GrowthExperiments: Enable link recommendation frontend for 5th round (T304549)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2022-10-26T13:30:38Z] <urbanecm@deploy1002> Finished scap: Backport for [[gerrit:849546|GrowthExperiments: Enable link recommendation frontend for 5th round (T304549)]] (duration: 05m 52s)

kostajh moved this task from In Progress to QA on the Growth-Team (Current Sprint) board.

Done!

Trizek-WMF updated the task description. (Show Details)

I'll let QA close the task when done.

We could also just lower the score threshold or minimum link count for wikis which get very few suggestions.

Etonkovidova claimed this task.
Etonkovidova added a subscriber: Etonkovidova.

Tested the add link UI for bswiki, brwiki, idwiki, bclwiki, and bhwiki: everything works as expected. So far, add link edits appear on the RecentChanges page only for idwiki.

Special:NewcomerTasksInfo lists add link tasks for all wikis in 5th round.