Deploy "add a link" to 10th round of wikis
Closed, Resolved, Public

Description

Event Timeline

18/19 models were trained successfully in the 10th round of wikis.

The Kyrgyz Wikipedia (kywiki) pipeline did not complete successfully and is being investigated in T329817.

Model evaluation has been completed and below are the backtesting results:

Wiki      Precision@0.5  Recall@0.5
kawiki    0.82           0.34
kaawiki   0.77           0.34
kabwiki   0.82           0.65
kbdwiki   0.80           0.41
kbpwiki   0.88           0.60
kgwiki    0.96           0.81
kiwiki    0.96           0.89
kkwiki    0.85           0.41
klwiki    0.74           0.35
kmwiki    0.70           0.21
knwiki    0.79           0.22
koiwiki   0.94           0.13
krcwiki   0.65           0.20
kswiki    0.98           0.83
kshwiki   0.81           0.52
kuwiki    0.88           0.40
kvwiki    0.82           0.38
kwwiki    0.83           0.53

CCing @MGerlach, in case he would like to add comments on the backtesting evaluation.

The conclusion from the backtesting results is that most of the languages look fine, except:

  • klwiki (0.74) and kmwiki (0.70) have a precision slightly below the recommended minimum (0.75).
  • krcwiki has a low precision (0.65).
  • koiwiki has a low recall (0.13).
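
The precision check above can be reproduced with a short sketch. It assumes the backtesting rows are available as whitespace-separated "wiki precision recall" text; the function name `flag_low_precision` and the `THRESHOLD` variable are made up for illustration, and only the 0.75 value and the numbers come from this task.

```shell
#!/bin/sh
# Recommended minimum precision mentioned in this task.
THRESHOLD=0.75

# flag_low_precision (hypothetical helper)
# Reads "wiki precision recall" rows on stdin and prints the wiki name and
# precision of every row whose precision is below THRESHOLD.
flag_low_precision() {
    awk -v min="$THRESHOLD" '$2 + 0 < min + 0 { print $1, $2 }'
}

# Rows copied from the backtesting results in this task.
flag_low_precision <<'EOF'
kawiki 0.82 0.34
kaawiki 0.77 0.34
kabwiki 0.82 0.65
kbdwiki 0.80 0.41
kbpwiki 0.88 0.60
kgwiki 0.96 0.81
kiwiki 0.96 0.89
kkwiki 0.85 0.41
klwiki 0.74 0.35
kmwiki 0.70 0.21
knwiki 0.79 0.22
koiwiki 0.94 0.13
krcwiki 0.65 0.20
kswiki 0.98 0.83
kshwiki 0.81 0.52
kuwiki 0.88 0.40
kvwiki 0.82 0.38
kwwiki 0.83 0.53
EOF
```

Run as written, this prints klwiki, kmwiki and krcwiki with their precision values. No recall cutoff is applied because the task does not state a recommended one.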

Talked to @MGerlach about these results and he said:

Here I think kmwiki (0.7) and krcwiki (0.65) are borderline, especially krcwiki. The low recall for koiwiki is not so problematic because it only indicates that there might not be that many recommendations we can generate (though with a precision of 0.94 it seems those that we can generate are pretty good). Overall, I would probably caution against krcwiki; the others seem fine.

@kostajh, we published datasets for all 17/19 models that passed the evaluation in this round.

Thanks! cc @Sgs, in case you want to incorporate this into the deployment work you're doing.

From comments in T308135#8632750 I understand koiwiki is not problematic to deploy and its dataset can be used. That gives me a count of 18/19, since kywiki has been moved to T308136. Is that correct, @kevinbazira?

I ran this script for adding the link-recommendation task type and populating the excluded sections entries:

# Assumption: PHAB is already set in the environment to the task ID used in the edit summaries.
for WIKI in kawiki kaawiki kabwiki kbdwiki kbpwiki kgwiki kiwiki kkwiki klwiki kmwiki knwiki kswiki kshwiki kuwiki kvwiki kwwiki; do
    # Look up the wiki's canonical URL so the result can be checked by hand afterwards.
    ORIGIN=$(mwscript getConfiguration.php "$WIKI" --settings 'wgCanonicalServer' --format json | jq --raw-output '.wgCanonicalServer')
    # Add the link-recommendation task type.
    mwscript extensions/GrowthExperiments/maintenance/changeWikiConfig.php "$WIKI" \
            --page MediaWiki:NewcomerTasks.json \
            --create-only \
            --json \
            --summary "Growth features configuration boilerplate ([[phab:$PHAB]])" \
            link-recommendation \
            '{ "type": "link-recommendation", "group": "easy" }'
    # Collect this wiki's sections with probability > 0.25, deduplicate them, and
    # store them as excluded sections ("$(cat)" passes the piped JSON as the value).
    jq --arg wiki "$WIKI" 'select(.wiki == $wiki and .probability > 0.25) | .section' wiki_sections.jsonl \
        | jq --slurp --compact-output 'unique' \
        | mwscript extensions/GrowthExperiments/maintenance/changeWikiConfig.php "$WIKI" \
            --page MediaWiki:NewcomerTasks.json \
            --json \
            --summary "machine-generated configuration for excluding sections from link recommendations ([[phab:$PHAB]]), feel free to improve" \
            link-recommendation.excludedSections \
            "$(cat)"
    echo "$ORIGIN/wiki/MediaWiki:NewcomerTasks.json"
    echo "$ORIGIN/w/index.php?title=MediaWiki:NewcomerTasks.json&diff=next"
    echo "Press <Enter> to continue"
    read -r _ # give time for manual verification
done
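
To see what the section-filtering step of the script produces without touching any wiki, here is a minimal sketch run against made-up data. Only the field names (`wiki`, `section`, `probability`) and the 0.25 cutoff come from the script; the helper name `excluded_sections` and the sample rows are hypothetical. The wiki name is passed with jq's `--arg` option.

```shell
#!/bin/sh
# excluded_sections WIKI FILE (hypothetical helper)
# Prints a compact JSON array of the unique sections recorded for WIKI
# whose probability is above 0.25, mirroring the two jq stages in the script.
excluded_sections() {
    jq --arg wiki "$1" 'select(.wiki == $wiki and .probability > 0.25) | .section' "$2" \
        | jq --slurp --compact-output 'unique'
}

# Made-up sample rows standing in for wiki_sections.jsonl.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"wiki": "kawiki", "section": "References", "probability": 0.9}
{"wiki": "kawiki", "section": "External links", "probability": 0.4}
{"wiki": "kawiki", "section": "History", "probability": 0.1}
{"wiki": "kawiki", "section": "References", "probability": 0.6}
{"wiki": "kbdwiki", "section": "Notes", "probability": 0.8}
EOF

excluded_sections kawiki "$sample"   # sections above the cutoff, deduplicated
excluded_sections kgwiki "$sample"   # a wiki with no rows in the sample
rm -f "$sample"
```

With this sample the first call prints `["External links","References"]` (jq's `unique` also sorts) and the second prints `[]`, which is the kind of value a wiki ends up with when no section passes the filter.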

Waiting for confirmation on koiwiki. Some observations after running the script:

  • kabwiki has only one excluded section in English
  • kbpwiki has no excluded sections and no other newcomer tasks enabled
  • kgwiki has only excluded sections in French
  • kiwiki has only excluded sections in English
  • klwiki has only excluded sections in English

Sgs changed the task status from Open to In Progress. Jul 5 2023, 10:39 AM

@Sgs, yes, koiwiki's dataset can be used. Regarding kywiki: 17/19 models were published in this round because kywiki's training pipeline did not complete successfully in the 10th round. The bug that caused this was fixed in T329817#8635930, and kywiki was then added to the 11th round, where its training pipeline ran successfully, it passed the backtesting evaluation, and its dataset was published here:
https://analytics.wikimedia.org/published/datasets/one-off/research-mwaddlink/kywiki/

@kevinbazira thank you. I updated the configuration for koiwiki as well.

Change 935723 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[operations/mediawiki-config@master] GrowthExperiments: Enable backend of link recommendation 10th round wikis

https://gerrit.wikimedia.org/r/935723

Change 935723 merged by jenkins-bot:

[operations/mediawiki-config@master] GrowthExperiments: Enable backend of link recommendation 10, 11, 12th round wikis

https://gerrit.wikimedia.org/r/935723

Mentioned in SAL (#wikimedia-operations) [2023-07-11T13:03:28Z] <urbanecm@deploy1002> Started scap: Backport for [[gerrit:935723|GrowthExperiments: Enable backend of link recommendation 10, 11, 12th round wikis (T308135 T308136 T308137)]]

Mentioned in SAL (#wikimedia-operations) [2023-07-11T13:04:58Z] <urbanecm@deploy1002> sgimeno and urbanecm: Backport for [[gerrit:935723|GrowthExperiments: Enable backend of link recommendation 10, 11, 12th round wikis (T308135 T308136 T308137)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-07-11T13:13:13Z] <urbanecm@deploy1002> Finished scap: Backport for [[gerrit:935723|GrowthExperiments: Enable backend of link recommendation 10, 11, 12th round wikis (T308135 T308136 T308137)]] (duration: 09m 45s)

Change 940347 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[operations/mediawiki-config@master] GrowthExperiments: enable AddLink task frontend in 10th round of wikis

https://gerrit.wikimedia.org/r/940347

@Trizek-WMF I can confirm that all the wikis from this round have now produced abundant results, including koiwiki and kywiki, which were previously dropped from the round because of model pipeline issues. However, two wikis have generated a very low number of results:

I think this round is ready for announcement and frontend enabling (aside from kgwiki and klwiki, which may be OK as well). I'm OoO for the next two weeks. I've left the configuration change (940347) ready to backport so that another engineer can take over, in case you want to progress the task before I'm back.

Everything works except:

We will ignore them for now. We can deploy all other wikis.

Trizek-WMF set Due Date to Aug 2 2023, 10:00 AM.

Thanks @Trizek-WMF, updated the final deployment patch to reflect that.

Urbanecm_WMF changed Due Date from Aug 2 2023, 10:00 AM to Aug 1 2023, 10:00 AM. Jul 27 2023, 2:40 PM

I'll do the deployment on Tuesday.

Change 940347 merged by jenkins-bot:

[operations/mediawiki-config@master] GrowthExperiments: enable AddLink task frontend in 10th round of wikis

https://gerrit.wikimedia.org/r/940347

Mentioned in SAL (#wikimedia-operations) [2023-08-01T08:22:38Z] <urbanecm@deploy1002> Started scap: Backport for [[gerrit:940347|GrowthExperiments: enable AddLink task frontend in 10th round of wikis (T308135)]]

Mentioned in SAL (#wikimedia-operations) [2023-08-01T08:24:20Z] <urbanecm@deploy1002> sgimeno and urbanecm: Backport for [[gerrit:940347|GrowthExperiments: enable AddLink task frontend in 10th round of wikis (T308135)]] synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)

Mentioned in SAL (#wikimedia-operations) [2023-08-01T08:33:30Z] <urbanecm@deploy1002> Finished scap: Backport for [[gerrit:940347|GrowthExperiments: enable AddLink task frontend in 10th round of wikis (T308135)]] (duration: 10m 52s)

Urbanecm_WMF changed the task status from In Progress to Open. Aug 1 2023, 8:34 AM
Etonkovidova subscribed.

Re-checked:

  • kgwiki - 7 results. Special:NewcomerTasksInfo lists 7 tasks for link-recommendation. Special:Homepage doesn't display any task types available.
    [Screenshot: Screen Shot 2023-08-25 at 2.24.30 PM.png]
  • klwiki - 2 results. Special:NewcomerTasksInfo lists 66 tasks for expand and 2 for link-recommendation. Special:Homepage displays only the expand task type.
    [Screenshot: Screen Shot 2023-08-25 at 2.30.18 PM.png]
  • krcwiki - no results matching the query. Special:NewcomerTasksInfo: No data is available.