Bump threshold for confidence score on link recommendation service suggestions
Closed, Resolved, Public

Description

The default threshold for generating a link suggestion is 0.5. We can consider raising this to 0.6 or 0.7. That would have the following effects:

  • The suggestions presented to the end user will have a higher likelihood of being good quality links, and will be less likely to be reverted.
  • For each article, the link recommendation service will identify fewer phrases as link suggestions (e.g. instead of 5 phrases, it might find 1 or 2).
    • It's hard to say how many fewer suggestions we would get per article. If we wanted to find out, we could write a fairly straightforward script to iterate over cached link recommendations in the database and gather statistics about the confidence score for each suggestion.
  • Because we have a minimum threshold of two suggestions for an article to be considered as a candidate link recommendation task, there will be fewer articles in the task queue, and/or it will take longer to repopulate the task queue for each wiki.
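
A minimal sketch of the "fairly straightforward script" idea above, assuming each cached recommendation can be reduced to a list of per-link confidence scores keyed by a hypothetical article ID. It reports, for a candidate threshold, the average number of surviving suggestions per article and how many articles still meet the two-suggestion minimum. The function and field names are illustrative, and treating the threshold as inclusive (`>=`) is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass

MIN_LINKS_PER_TASK = 2  # the two-suggestion minimum from the task description


@dataclass
class ThresholdStats:
    threshold: float
    avg_links_per_article: float
    candidate_articles: int


def threshold_stats(recommendations: dict[int, list[float]],
                    threshold: float) -> ThresholdStats:
    """Summarize how many suggestions survive a given confidence threshold.

    `recommendations` maps a (hypothetical) article ID to the confidence
    scores of its suggested links. A score equal to the threshold counts
    as passing (assumed inclusive comparison).
    """
    surviving = [
        sum(1 for score in scores if score >= threshold)
        for scores in recommendations.values()
    ]
    avg = sum(surviving) / len(surviving) if surviving else 0.0
    # Articles with fewer than two surviving links drop out of the task pool.
    candidates = sum(1 for n in surviving if n >= MIN_LINKS_PER_TASK)
    return ThresholdStats(threshold, avg, candidates)
```

Calling `threshold_stats(recs, t)` for `t` in `(0.5, 0.6, 0.7)` would show both effects listed above at once: fewer suggestions per article and fewer candidate articles.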

Acceptance Criteria

  • The threshold for link suggestions is set at a higher value: 0.6
  • Run revalidateLinkRecommendations.php on the affected wikis

Event Timeline

We would need to run revalidateLinkRecommendations.php on the affected wikis; otherwise it will take a very long time for the change to take effect. We should probably add a validation option to the script that checks the "cheap" task type properties (link score, minimum links per task), and maybe even updates the recommendation by filtering out below-threshold links if enough links remain to do so.
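
The validation option proposed above boils down to a three-way decision per cached recommendation. A minimal sketch, assuming each link is a dict carrying a `score` key (the real recommendation format may differ); the `Action` enum and `revalidate` function are hypothetical and not part of revalidateLinkRecommendations.php.

```python
from __future__ import annotations

from enum import Enum


class Action(Enum):
    KEEP = "keep"      # every link already meets the threshold
    FILTER = "filter"  # drop below-threshold links; enough links remain
    DROP = "drop"      # too few links would remain; remove the task


def revalidate(links: list[dict], threshold: float,
               min_links: int) -> tuple[Action, list[dict]]:
    """Decide what to do with one cached recommendation (hypothetical shape).

    Returns the action together with the links that should be kept.
    """
    kept = [link for link in links if link["score"] >= threshold]
    # The "cheap" checks: link score and minimum links per task.
    if len(kept) < min_links:
        return Action.DROP, []
    if len(kept) == len(links):
        return Action.KEEP, links
    return Action.FILTER, kept
```

Filtering instead of dropping keeps tasks in the pool without another call to the link recommendation service, which is what makes this check "cheap".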

@KStoller-WMF is this task something that we should prioritize doing in the next week or two?

It's not urgent, but I agree this is a task that we should work on soon.

Change 832639 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@master] LinkRecommendationTaskType: Raise score threshold to 0.6

https://gerrit.wikimedia.org/r/832639

Change 832639 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] LinkRecommendationTaskType: Raise score threshold to 0.6

https://gerrit.wikimedia.org/r/832639

Etonkovidova closed this task as Resolved. Edited Jan 9 2023, 11:03 PM
Etonkovidova subscribed.

Over quite a few deployments, no regression was noticed with regard to the pool size of suggested links or any other aspect. I looked at several wikis, e.g. at https://grafana.wikimedia.org/d/vGq7hbnMz/special-homepage-and-suggested-edits?orgId=1&from=now-30d&to=now&viewPanel=31: some wikis have a declining pool size, but their task count is still sufficiently high, and several wikis have recovered from their drop in the number of tasks.

Don't we want to run the revalidate script, though? Some (maybe most) tasks probably still have the old confidence score.

Yeah, that was the second checkmark in the task description.

I would also be interested in a maintenance script that could pull the cached entries and provide some aggregate data to us about the metadata for the cache entries, like the confidence score, the dataset used to generate the recommendation, number of links, etc.

Run revalidateLinkRecommendations.php on the affected wikis

@kostajh - Should we still complete this last task?

This is the last lingering task for an Epic we should officially resolve: T315732: [EPIC] Structured Tasks: Patroller Focus.

For the SLI discussion (T278083: Define SLIs/SLOs for link recommendation service) we'd like to have a maintenance script that can:

  • collect statistics on age of cached entries and emit this to Grafana
  • track which dataset IDs are used and emit to Grafana
  • track which newcomertasks.json revision ID was used for config and emit to Grafana

That would provide additional useful data to SRE in determining if the service is not working quickly enough.
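
A minimal sketch of the aggregation such a maintenance script could perform, assuming per-entry metadata (cache timestamp, dataset ID, newcomertasks.json revision ID) can be read from the cache; the `CacheEntry` shape and the metric names are hypothetical. Output is formatted as statsd-style gauge lines (`name:value|g`), assuming a statsd-to-Grafana metrics pipeline.

```python
from __future__ import annotations

import statistics
import time
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    """Hypothetical per-entry metadata; the real table layout may differ."""
    cached_at: float    # UNIX timestamp when the recommendation was cached
    dataset_id: str     # ID of the model dataset that produced it
    config_rev_id: int  # newcomertasks.json revision ID used for config


def cache_metrics(entries: list[CacheEntry], now: Optional[float] = None,
                  prefix: str = "growthexperiments.linkrecommendation") -> list[str]:
    """Build statsd-style gauge lines describing the cached entries."""
    now = time.time() if now is None else now
    lines = []
    if entries:
        # Age of cached entries, in seconds.
        ages = [now - entry.cached_at for entry in entries]
        lines.append(f"{prefix}.cache_age_seconds.median:{statistics.median(ages):.0f}|g")
        lines.append(f"{prefix}.cache_age_seconds.max:{max(ages):.0f}|g")
    # One gauge per dataset ID and per config revision, counting entries.
    for dataset_id, count in sorted(Counter(e.dataset_id for e in entries).items()):
        lines.append(f"{prefix}.dataset.{dataset_id}:{count}|g")
    for rev_id, count in sorted(Counter(e.config_rev_id for e in entries).items()):
        lines.append(f"{prefix}.config_rev.{rev_id}:{count}|g")
    return lines
```

A rising maximum age, or many entries still tied to an old dataset or config revision, would suggest that the service is not refreshing the cache quickly enough.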

We could also consider tracking the rate of link recommendation task completion / task pool size, with the idea that this line should be fairly constant across each wiki, but that probably deserves a separate task.
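
If the completion-to-pool-size ratio should be roughly constant for a wiki, one simple check is to flag days that deviate strongly from that wiki's median ratio. A minimal sketch, assuming daily (completed tasks, pool size) pairs are available; the input shape, function names and the 50% tolerance are all illustrative assumptions.

```python
from __future__ import annotations

import statistics
from typing import Optional


def completion_ratios(daily: list[tuple[int, int]]) -> list[Optional[float]]:
    """Ratio of completed link tasks to task pool size for each day.

    `daily` holds hypothetical (completed_tasks, pool_size) pairs for one wiki.
    Days with an empty pool have no meaningful ratio and yield None.
    """
    return [completed / pool if pool else None for completed, pool in daily]


def unstable_days(daily: list[tuple[int, int]], tolerance: float = 0.5) -> list[int]:
    """Indices of days whose ratio deviates from the median ratio by more than
    `tolerance` (as a fraction of the median), plus days with an empty pool."""
    ratios = completion_ratios(daily)
    valid = [r for r in ratios if r is not None]
    if not valid:
        # Every pool was empty, so every day is suspicious.
        return list(range(len(ratios)))
    baseline = statistics.median(valid)
    return [
        i for i, r in enumerate(ratios)
        if r is None or abs(r - baseline) > tolerance * baseline
    ]
```

Using the median rather than the mean keeps a single anomalous day from shifting the baseline it is compared against.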

Run revalidateLinkRecommendations.php on the affected wikis

@kostajh - Should we still complete this last task?

Yes, I think so, but IMHO we should first write a maintenance script to analyze the cached link recommendation contents, which we can then re-use for improved monitoring in T278083: Define SLIs/SLOs for link recommendation service.

Do we need to create a Phab task for writing a maintenance script?

If possible we should try to wrap up this task soon, or admit we can't fit it in and move it out of the current sprint.

Change 948671 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@master] revalidateLinkRecommendations: Make it possible to revalidate based on score

https://gerrit.wikimedia.org/r/948671

Urbanecm_WMF moved this task from QA to Code Review on the Growth-Team (Sprint 0 (Growth Team)) board.
Urbanecm_WMF subscribed.

As suggested by @Tgr in Slack, I did a one-off analysis on stat1005 of the arwiki task pool (specifically, how many of the tasks meet the current 0.6 criterion):

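```python
import json

from wmfdata import mariadb

# Fetch every cached link recommendation for arwiki together with the
# confidence scores of all its links (the table lives on the x1 cluster).
df = mariadb.run('''
SELECT
    gelr_revision,
    JSON_EXTRACT(gelr_data, '$**.score') AS scores
FROM growthexperiments_link_recommendations
''', 'arwiki', use_x1=True)

# JSON_EXTRACT returns a JSON array (possibly as bytes); parse it into a list.
df['scores'] = df.scores.apply(
    lambda x: json.loads(x.decode('utf-8') if isinstance(x, bytes) else x)
)
# The lowest-scoring link decides whether the whole suggestion meets the threshold.
df['min_score'] = df.scores.apply(min)
df['meets_0.6'] = df['min_score'] >= 0.6

df[['meets_0.6', 'gelr_revision']].groupby('meets_0.6').count()
```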

The results (for arwiki):

meets_0.6    gelr_revision (count)
False        814
True         326

This means only ~28.6% (326 of 1,140) of suggestions are entirely acceptable. The analysis used the minimum link score per suggestion, so more suggestions may still contain some acceptable links; I focused on suggestions where every link meets the 0.6 threshold. FTR, on the other pilot wikis the numbers are much more favourable (more suggestions meet the threshold).
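
The gap between "every link meets the threshold" (what the query above measures) and "enough links meet the threshold to keep the task after filtering" can be made concrete with a small pure-Python sketch. The function name and input shape (one list of link scores per cached recommendation) are illustrative assumptions; `min_links=2` follows the two-suggestion minimum from the task description. For the arwiki counts above, the first quantity is 326 / (814 + 326) ≈ 0.286.

```python
from __future__ import annotations


def threshold_coverage(recommendations: list[list[float]], threshold: float = 0.6,
                       min_links: int = 2) -> tuple[int, int, int]:
    """Return (all_links_meet, enough_links_meet, total).

    all_links_meet: the minimum link score reaches the threshold (the
        criterion used in the arwiki query above).
    enough_links_meet: at least `min_links` links reach the threshold, so the
        recommendation could be kept after filtering out the weaker links.
    """
    all_meet = sum(
        1 for scores in recommendations if scores and min(scores) >= threshold
    )
    enough_meet = sum(
        1 for scores in recommendations
        if sum(score >= threshold for score in scores) >= min_links
    )
    return all_meet, enough_meet, len(recommendations)
```

Comparing the two counts indicates how many tasks a filtering revalidation could rescue rather than discard.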

Let's do the revalidation, then. I uploaded a patch that makes it possible to revalidate in a targeted way (revalidating everything is possible, but time-consuming).

Change 948671 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] revalidateLinkRecommendations: Make it possible to revalidate based on score

https://gerrit.wikimedia.org/r/948671

Change 949576 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@wmf/1.41.0-wmf.22] revalidateLinkRecommendations: Make it possible to revalidate based on score

https://gerrit.wikimedia.org/r/949576

Change 949577 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@wmf/1.41.0-wmf.20] revalidateLinkRecommendations: Make it possible to revalidate based on score

https://gerrit.wikimedia.org/r/949577

Change 949577 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.41.0-wmf.20] revalidateLinkRecommendations: Make it possible to revalidate based on score

https://gerrit.wikimedia.org/r/949577

Change 949576 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.41.0-wmf.22] revalidateLinkRecommendations: Make it possible to revalidate based on score

https://gerrit.wikimedia.org/r/949576

Mentioned in SAL (#wikimedia-operations) [2023-08-17T13:23:05Z] <urbanecm@deploy1002> Started scap: Backport for [[gerrit:949582|cross-wiki userrights: Add SpecialUserRights::getDisplayUsername (T344391 T255309)]], [[gerrit:949577|revalidateLinkRecommendations: Make it possible to revalidate based on score (T316079)]], [[gerrit:949576|revalidateLinkRecommendations: Make it possible to revalidate based on score (T316079)]]

Mentioned in SAL (#wikimedia-operations) [2023-08-17T13:23:55Z] <urbanecm@deploy1002> urbanecm: Backport for [[gerrit:949582|cross-wiki userrights: Add SpecialUserRights::getDisplayUsername (T344391 T255309)]], [[gerrit:949577|revalidateLinkRecommendations: Make it possible to revalidate based on score (T316079)]], [[gerrit:949576|revalidateLinkRecommendations: Make it possible to revalidate based on score (T316079)]] synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.c

Mentioned in SAL (#wikimedia-operations) [2023-08-17T13:28:52Z] <urbanecm@deploy1002> Finished scap: Backport for [[gerrit:949582|cross-wiki userrights: Add SpecialUserRights::getDisplayUsername (T344391 T255309)]], [[gerrit:949577|revalidateLinkRecommendations: Make it possible to revalidate based on score (T316079)]], [[gerrit:949576|revalidateLinkRecommendations: Make it possible to revalidate based on score (T316079)]] (duration: 05m 46s)

Change 949988 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[operations/mediawiki-config@master] Growth: Temporarily disable link-recommendation FE on arwiki

https://gerrit.wikimedia.org/r/949988

Change 949988 merged by jenkins-bot:

[operations/mediawiki-config@master] Growth: Temporarily disable link-recommendation FE on arwiki

https://gerrit.wikimedia.org/r/949988

Mentioned in SAL (#wikimedia-operations) [2023-08-17T15:02:47Z] <urbanecm@deploy1002> Started scap: Backport for [[gerrit:949568|suwikisource remove NamespaceAliases and ExtraNamespaces for Page and Index namespace (T344314)]], [[gerrit:949988|Growth: Temporarily disable link-recommendation FE on arwiki (T316079)]]

Mentioned in SAL (#wikimedia-operations) [2023-08-17T15:04:39Z] <urbanecm@deploy1002> urbanecm and anzx: Backport for [[gerrit:949568|suwikisource remove NamespaceAliases and ExtraNamespaces for Page and Index namespace (T344314)]], [[gerrit:949988|Growth: Temporarily disable link-recommendation FE on arwiki (T316079)]] synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (ac

Mentioned in SAL (#wikimedia-operations) [2023-08-17T15:17:43Z] <urbanecm@deploy1002> Finished scap: Backport for [[gerrit:949568|suwikisource remove NamespaceAliases and ExtraNamespaces for Page and Index namespace (T344314)]], [[gerrit:949988|Growth: Temporarily disable link-recommendation FE on arwiki (T316079)]] (duration: 14m 56s)

Change 949990 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@master] revalidateLinkRecommendations: Load scoreLessThan correctly

https://gerrit.wikimedia.org/r/949990

revalidateLinkRecommendations has now completed on arwiki; all tasks now meet the new 0.6 threshold. I left a follow-up patch that can be merged anytime; after that, we can consider this resolved.

Change 949585 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[operations/mediawiki-config@master] Revert "Growth: Temporarily disable link-recommendation FE on arwiki"

https://gerrit.wikimedia.org/r/949585

Change 949585 merged by jenkins-bot:

[operations/mediawiki-config@master] Revert "Growth: Temporarily disable link-recommendation FE on arwiki"

https://gerrit.wikimedia.org/r/949585

Mentioned in SAL (#wikimedia-operations) [2023-08-21T11:23:42Z] <urbanecm@deploy1002> Started scap: Backport for [[gerrit:949585|Revert "Growth: Temporarily disable link-recommendation FE on arwiki" (T316079)]]

Mentioned in SAL (#wikimedia-operations) [2023-08-21T11:25:15Z] <urbanecm@deploy1002> urbanecm: Backport for [[gerrit:949585|Revert "Growth: Temporarily disable link-recommendation FE on arwiki" (T316079)]] synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)

Mentioned in SAL (#wikimedia-operations) [2023-08-21T11:32:25Z] <urbanecm@deploy1002> Finished scap: Backport for [[gerrit:949585|Revert "Growth: Temporarily disable link-recommendation FE on arwiki" (T316079)]] (duration: 08m 42s)

Change 949990 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] revalidateLinkRecommendations: Load scoreLessThan correctly

https://gerrit.wikimedia.org/r/949990

Change 950812 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@wmf/1.41.0-wmf.22] revalidateLinkRecommendations: Load scoreLessThan correctly

https://gerrit.wikimedia.org/r/950812

Change 950812 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.41.0-wmf.22] revalidateLinkRecommendations: Load scoreLessThan correctly

https://gerrit.wikimedia.org/r/950812

Mentioned in SAL (#wikimedia-operations) [2023-08-21T20:21:27Z] <urbanecm@deploy1002> Started scap: Backport for [[gerrit:951151|Growth: Remove wgWelcomeSurveyEnableWithHomepage (T342353 T344619)]], [[gerrit:950812|revalidateLinkRecommendations: Load scoreLessThan correctly (T316079)]], [[gerrit:950813|LinkRecommendationUpdater: Load link-recommendation even if disabled (T344343)]]

Mentioned in SAL (#wikimedia-operations) [2023-08-21T20:23:01Z] <urbanecm@deploy1002> urbanecm: Backport for [[gerrit:951151|Growth: Remove wgWelcomeSurveyEnableWithHomepage (T342353 T344619)]], [[gerrit:950812|revalidateLinkRecommendations: Load scoreLessThan correctly (T316079)]], [[gerrit:950813|LinkRecommendationUpdater: Load link-recommendation even if disabled (T344343)]] synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwde

Mentioned in SAL (#wikimedia-operations) [2023-08-21T20:32:29Z] <urbanecm@deploy1002> Finished scap: Backport for [[gerrit:951151|Growth: Remove wgWelcomeSurveyEnableWithHomepage (T342353 T344619)]], [[gerrit:950812|revalidateLinkRecommendations: Load scoreLessThan correctly (T316079)]], [[gerrit:950813|LinkRecommendationUpdater: Load link-recommendation even if disabled (T344343)]] (duration: 11m 02s)

Mentioned in SAL (#wikimedia-operations) [2023-08-22T12:46:40Z] <urbanecm> mwmaint1002: foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php --scoreLessThan=0.6 --verbose | tee growth-T316079-revalidate-0.6.log # T316079

I've started the revalidation for all wikis to ensure the 0.6 threshold is met everywhere. I had already run it manually on most of the bigger wikis, but it should be done on the remaining ones as well, just in case.