
Deploy "add a link" to 7th round of wikis
Closed, ResolvedPublic

Description

  • Training models
    • Darija Wikipedia ary
    • Danish Wikipedia da
    • Dinka Wikipedia din
    • [x] Zazaki Wikipedia diq
    • Lower Sorbian Wikipedia dsb
    • [x] Divehi Wikipedia dv
    • Dzongkha Wikipedia dz see T304551#8412493
    • Ewe Wikipedia ee
    • Greek Wikipedia el
    • Emiliano-Romagnolo Wikipedia eml
    • Esperanto Wikipedia eo
    • Estonian Wikipedia et
    • Basque Wikipedia eu
    • Extremaduran Wikipedia ext
    • Tumbuka Wikipedia tum
  • Models verification
  • Publish Datasets
  • Populate the excluded section titles
  • Deploy back-end
  • Check how the model works on the wikis
  • In Search, use hasrecommendation:link to find articles
  • Test them on https://api.wikimedia.org/service/linkrecommendation/apidocs/#/default/get_v1_linkrecommendations__project___domain___page_title_
  • Inform communities
  • Deploy front-end
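
For the "Test them" step, a request URL can be assembled per wiki and article. This is a minimal sketch: the base URL and the path layout (project, domain, page title) are assumptions inferred from the operation name in the API docs link above, and `linkrec_url` is a hypothetical helper, not part of any existing tool.

```shell
# Hypothetical helper: build a link-recommendation request URL.
# The base URL and path layout are assumptions inferred from the API docs link.
linkrec_url() {
    domain="$1"   # language code of the wiki, e.g. "da" for dawiki
    title="$2"    # page title, with underscores instead of spaces
    printf 'https://api.wikimedia.org/service/linkrecommendation/v1/linkrecommendations/wikipedia/%s/%s\n' \
        "$domain" "$title"
}

linkrec_url da Danmark

# To query the service (requires network access):
# curl --silent "$(linkrec_url da Danmark)"
```

Titles containing non-ASCII characters would need URL encoding first, which this sketch does not handle.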

Event Timeline

I added Tumbuka Wikipedia (tum) as they are interested in the feature.

I added Darija Wikipedia (ary) that was skipped from my lists when I created the deployment rounds.

14/15 models were trained successfully in the 7th round of wikis.

The Dzongkha Wikipedia (dzwiki) returned the error in the screenshot below.

dzwiki training pipeline error - Screenshot from 2022-11-21 09-45-55.png (741×1 px, 190 KB)

I checked the database dumps for dzwiki and they exist.

Going to investigate what the problem could be.

I contacted @MGerlach on whether this error means that there is not enough data to train the model and he said:

Interesting. Indeed, it seems that there is not enough data to train the model. The wiki has only around 500 articles (wikistats). Looking at the wiki, it seems that most of the articles contain few or no links (I checked a few examples of random pages via https://dz.wikipedia.org/wiki/Special:Random). This means we don't actually have any training examples. As a result, it seems that the table with the features is empty.

For now, it's better to skip dzwiki; we shall train its model in the future when there is enough training data.

Model evaluation has been completed and below are the backtesting results:

Wiki      Precision@0.5  Recall@0.5
arywiki   0.79           0.44
dawiki    0.79           0.48
dinwiki   0.94           0.48
diqwiki   0.40           0.90
dsbwiki   0.89           0.66
dvwiki    0.67           0.02
eewiki    0.93           0.82
elwiki    0.79           0.44
emlwiki   0.89           0.57
eowiki    0.82           0.51
etwiki    0.77           0.33
euwiki    0.86           0.37
extwiki   0.75           0.50
tumwiki   0.84           0.62

CCing @MGerlach, in case he would like to add comments on the backtesting evaluation.

The conclusion from the backtesting results is that most of the languages look fine, but there are some red flags:

  • diqwiki has very low precision (0.40).
  • although dvwiki's precision (0.67) is not so far from the recommended one (0.75), it has an extremely low recall (0.02).

Talked to @MGerlach about diqwiki and dvwiki and he said:

I would agree with your observation and would recommend not deploying to diqwiki and dvwiki.

  • diqwiki: precision too low, i.e. recommendations are not good
  • dvwiki: recall extremely low. this indicates that we will likely not be able to generate many recommendations.

As recommended, it's best not to proceed with diqwiki and dvwiki until there is improved performance.
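
The check described above can be sketched as a small filter over the backtesting numbers. The precision threshold of 0.75 is the recommended value mentioned in this thread; the recall floor of 0.10 is a hypothetical value chosen only for illustration, since no exact recall cutoff is stated here. The function names are likewise made up for this sketch.

```shell
# Backtesting results from this task: wiki, precision@0.5, recall@0.5
results() {
cat <<'EOF'
arywiki 0.79 0.44
dawiki 0.79 0.48
dinwiki 0.94 0.48
diqwiki 0.40 0.90
dsbwiki 0.89 0.66
dvwiki 0.67 0.02
eewiki 0.93 0.82
elwiki 0.79 0.44
emlwiki 0.89 0.57
eowiki 0.82 0.51
etwiki 0.77 0.33
euwiki 0.86 0.37
extwiki 0.75 0.50
tumwiki 0.84 0.62
EOF
}

# Print every wiki whose precision or recall falls below the given minimum.
# $1 = minimum precision (0.75 is the recommended value from the thread)
# $2 = minimum recall (0.10 is an assumed floor, not a documented one)
flag_wikis() {
    awk -v min_p="${1:-0.75}" -v min_r="${2:-0.10}" '
        ($2 + 0 < min_p + 0) || ($3 + 0 < min_r + 0) { print $1 }
    '
}

results | flag_wikis   # prints diqwiki and dvwiki, one per line
```

With these thresholds, the remaining 12 of the 14 trained models pass, which matches the count of published datasets.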

@kostajh, we published datasets for all 12/15 models in this round that passed the evaluation.

kevinbazira added a subscriber: kevinbazira.

I ran this script to add the link-recommendation task type and populate the excluded sections:

PHAB=T304551
for WIKI in arywiki dawiki dinwiki diqwiki dsbwiki dvwiki eewiki elwiki emlwiki eowiki etwiki euwiki extwiki tumwiki; do
    ORIGIN=`mwscript getConfiguration.php $WIKI --settings 'wgCanonicalServer' --format json | jq --raw-output '.wgCanonicalServer'`
    mwscript extensions/GrowthExperiments/maintenance/changeWikiConfig.php $WIKI \
            --page MediaWiki:NewcomerTasks.json \
            --create-only \
            --json \
            --summary "Growth features configuration boilerplate ([[phab:$PHAB]])" \
            link-recommendation \
            '{ "type": "link-recommendation", "group": "easy" }'
    jq "select(.wiki==\"$WIKI\" and .probability > 0.25) | .section" wiki_sections.jsonl \
        | jq --slurp --compact-output "unique" \
        | mwscript extensions/GrowthExperiments/maintenance/changeWikiConfig.php $WIKI \
            --page MediaWiki:NewcomerTasks.json \
            --json \
            --summary "machine-generated configuration for excluding sections from link recommendations ([[phab:$PHAB]]), feel free to improve" \
            link-recommendation.excludedSections \
            "`cat`"
    echo "$ORIGIN/wiki/MediaWiki:NewcomerTasks.json"
    echo "$ORIGIN/w/index.php?title=MediaWiki:NewcomerTasks.json&diff=next"
    echo "Press <Enter> to continue"
    read # give time for manual verification
done

I checked the configuration and it seemed to be correctly updated in all wikis. The only thing worth mentioning is tumwiki, which didn't get any excluded sections in its config.
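
The section-filtering step of the script can be tried in isolation. The sketch below reuses the same two-stage jq filter on invented sample rows; the real contents of wiki_sections.jsonl are not shown in this task, and `examplewiki`, `missingwiki`, and both function names are hypothetical. It requires jq.

```shell
# Invented sample rows in the shape the script's jq filter expects.
sample_rows() {
cat <<'EOF'
{"wiki":"examplewiki","section":"References","probability":0.91}
{"wiki":"examplewiki","section":"External links","probability":0.40}
{"wiki":"examplewiki","section":"History","probability":0.05}
{"wiki":"examplewiki","section":"References","probability":0.88}
{"wiki":"otherwiki","section":"Notes","probability":0.70}
EOF
}

# Same filter as in the script: keep sections of one wiki with
# probability > 0.25, then collapse them into a sorted, de-duplicated array.
excluded_sections() {
    wiki="$1"
    jq "select(.wiki==\"$wiki\" and .probability > 0.25) | .section" \
        | jq --slurp --compact-output "unique"
}

sample_rows | excluded_sections examplewiki   # ["External links","References"]
sample_rows | excluded_sections missingwiki   # []
```

A wiki with no rows above the threshold yields an empty array. That would be consistent with tumwiki ending up without excluded sections, although the actual cause is not confirmed in this task.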

Sgs changed the task status from Open to In Progress. (Feb 24 2023, 11:58 AM)
Sgs updated the task description.

Change 892363 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[operations/mediawiki-config@master] GrowthExperiments: Enable link recommendation for 7th round wikis

https://gerrit.wikimedia.org/r/892363

Change 892363 merged by jenkins-bot:

[operations/mediawiki-config@master] GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis

https://gerrit.wikimedia.org/r/892363

Mentioned in SAL (#wikimedia-operations) [2023-03-15T20:13:23Z] <samtar@deploy2002> Started scap: Backport for [[gerrit:899673|GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550)]], [[gerrit:892363|GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134)]]

Mentioned in SAL (#wikimedia-operations) [2023-03-15T20:14:55Z] <samtar@deploy2002> sgimeno and samtar: Backport for [[gerrit:899673|GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550)]], [[gerrit:892363|GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-03-15T20:23:36Z] <samtar@deploy2002> Finished scap: Backport for [[gerrit:899673|GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550)]], [[gerrit:892363|GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134)]] (duration: 10m 12s)

Sgs updated the task description.
Sgs added a subscriber: Sgs.

All models work, except:

  • diq.wp returns "There were no results matching the query."
  • dv.wp returns "There were no results matching the query."
  • ext.wp returns "There were no results matching the query."

All models work, except:

  • diq.wp returns "There were no results matching the query."
  • dv.wp returns "There were no results matching the query."

Per @kevinbazira's comment above, it seems these two wikis have been red-flagged. I also missed this in the configuration step, so I will roll back the change there.

  • ext.wp returns "There were no results matching the query."

The dataset seems correctly exported and it appears in the wikis.txt file. I'm investigating this.

Change 902131 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[operations/mediawiki-config@master] GrowthExperiments: disable add a link backend

https://gerrit.wikimedia.org/r/902131

Per @kevinbazira's comment above, it seems these two wikis have been red-flagged. I also missed this in the configuration step, so I will roll back the change there.

Got it. As they were not struck through, I tested them. :)

Change 902131 merged by jenkins-bot:

[operations/mediawiki-config@master] GrowthExperiments: disable add a link backend

https://gerrit.wikimedia.org/r/902131

Mentioned in SAL (#wikimedia-operations) [2023-03-23T13:28:16Z] <samtar@deploy2002> Started scap: Backport for [[gerrit:902131|GrowthExperiments: disable add a link backend (T304551)]]

Mentioned in SAL (#wikimedia-operations) [2023-03-23T13:29:49Z] <samtar@deploy2002> samtar and sgimeno: Backport for [[gerrit:902131|GrowthExperiments: disable add a link backend (T304551)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-03-23T13:36:22Z] <samtar@deploy2002> Finished scap: Backport for [[gerrit:902131|GrowthExperiments: disable add a link backend (T304551)]] (duration: 08m 05s)

Can we schedule a release date for these wikis?
Can it be next week (Wed April 5)?

Sure. I've scheduled the deploy for today at 15h UTC+2. Are communities already informed?

Change 905950 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[operations/mediawiki-config@master] GrowthExperiments: enable add link frontend and backend

https://gerrit.wikimedia.org/r/905950

Communities haven't been informed yet, as I was waiting for your reply. One week has passed since I suggested the date. :)

We have to reschedule it for next week. Is Wed April 12 possible?

Sure, no problem.

Change 907899 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[operations/mediawiki-config@master] GrowthExperiments: enable add link frontend in 7th round wikis

https://gerrit.wikimedia.org/r/907899

Change 907899 merged by jenkins-bot:

[operations/mediawiki-config@master] GrowthExperiments: enable add link frontend in 7,8th round wikis

https://gerrit.wikimedia.org/r/907899

Mentioned in SAL (#wikimedia-operations) [2023-04-12T13:07:11Z] <lucaswerkmeister-wmde@deploy2002> Started scap: Backport for [[gerrit:907899|GrowthExperiments: enable add link frontend in 7,8th round wikis (T304551 T308133)]]

Mentioned in SAL (#wikimedia-operations) [2023-04-12T13:08:33Z] <lucaswerkmeister-wmde@deploy2002> sgimeno and lucaswerkmeister-wmde: Backport for [[gerrit:907899|GrowthExperiments: enable add link frontend in 7,8th round wikis (T304551 T308133)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-04-12T13:20:42Z] <lucaswerkmeister-wmde@deploy2002> Finished scap: Backport for [[gerrit:907899|GrowthExperiments: enable add link frontend in 7,8th round wikis (T304551 T308133)]] (duration: 13m 30s)

Sgs changed the task status from In Progress to Open. (Apr 12 2023, 5:18 PM)
Sgs updated the task description.
Sgs moved this task from In Progress to QA on the Growth-Team (Sprint 0 (Growth Team)) board.
Etonkovidova added a subscriber: Etonkovidova.

Checked tumwiki, elwiki, and dawiki: the "Add a link" feature seems to be working as expected; no issues found.