
Add a link: too many articles have no suggestions upon arrival
Closed, Resolved (Public)

Description

In ambassador testing in production and on Test Wikipedia, we've noticed that a high percentage of suggested articles end up having no suggestions available once the user arrives on them. It sounds like this might be 10-20%, which is much too high. We would want this experience to happen 1% of the time or less.

Details

Repo | Branch | Lines +/-
mediawiki/extensions/GrowthExperiments | wmf/1.37.0-wmf.18 | +183 -16
mediawiki/extensions/GrowthExperiments | master | +183 -16
mediawiki/extensions/GrowthExperiments | master | +13 -15
mediawiki/extensions/GrowthExperiments | wmf/1.37.0-wmf.7 | +6 -3
mediawiki/extensions/GrowthExperiments | wmf/1.37.0-wmf.6 | +6 -3
mediawiki/extensions/GrowthExperiments | wmf/1.37.0-wmf.6 | +16 -5
mediawiki/extensions/GrowthExperiments | wmf/1.37.0-wmf.7 | +16 -5
mediawiki/extensions/GrowthExperiments | master | +14 -9
mediawiki/extensions/GrowthExperiments | master | +16 -5
mediawiki/extensions/GrowthExperiments | wmf/1.37.0-wmf.7 | +6 -2
mediawiki/extensions/GrowthExperiments | wmf/1.37.0-wmf.6 | +6 -2
mediawiki/extensions/GrowthExperiments | master | +6 -3
mediawiki/extensions/GrowthExperiments | master | +6 -2
mediawiki/extensions/GrowthExperiments | master | +4 -2
mediawiki/extensions/GrowthExperiments | wmf/1.37.0-wmf.6 | +4 -2
mediawiki/extensions/GrowthExperiments | master | +10 -5
mediawiki/extensions/GrowthExperiments | wmf/1.37.0-wmf.7 | +4 -2

Event Timeline

Change 695191 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] Allow running fixLinkRecommendationData --search-index in production

https://gerrit.wikimedia.org/r/695191

Change 695193 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] Add --dry-run option to fixLinkRecommendationData.php

https://gerrit.wikimedia.org/r/695193

Change 695231 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] Always delete from search index in AddLinkSubmissionHandler

https://gerrit.wikimedia.org/r/695231

Change 695044 had a related patch set uploaded (by Kosta Harlan; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.6] Allow running fixLinkRecommendationData --search-index in production

https://gerrit.wikimedia.org/r/695044

Change 695045 had a related patch set uploaded (by Kosta Harlan; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.7] Allow running fixLinkRecommendationData --search-index in production

https://gerrit.wikimedia.org/r/695045

Change 695191 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Allow running fixLinkRecommendationData --search-index in production

https://gerrit.wikimedia.org/r/695191

Change 695193 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Add --dry-run option to fixLinkRecommendationData.php

https://gerrit.wikimedia.org/r/695193

Change 695231 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Always delete from search index in AddLinkSubmissionHandler

https://gerrit.wikimedia.org/r/695231

Change 695045 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.7] Allow running fixLinkRecommendationData --search-index in production

https://gerrit.wikimedia.org/r/695045

Mentioned in SAL (#wikimedia-operations) [2021-05-26T11:39:48Z] <urbanecm@deploy1002> Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: 86bba48: Allow running fixLinkRecommendationData --search-index in production (T283606) (duration: 01m 06s)

Change 695044 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.6] Allow running fixLinkRecommendationData --search-index in production

https://gerrit.wikimedia.org/r/695044

Mentioned in SAL (#wikimedia-operations) [2021-05-26T11:44:04Z] <urbanecm@deploy1002> Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: b3c2941: Allow running fixLinkRecommendationData --search-index in production (T283606) (duration: 01m 07s)

Change 695292 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] fixLinkRecommendationData.php: also fix search index for old DB entries

https://gerrit.wikimedia.org/r/695292

I think this is done (thanks for figuring out the root of the problem, @kostajh!), other than running the fixer script for the production wikis. I wonder whether we'd want to track the no-suggestions rate in an easy-to-monitor way, though. We do record an EventGate event, but I'm not sure there's an easy way to chart those.

Current search index sizes: ar 16K, bn 9K, cs 20K, vi 14K, testwiki 750.
Current DB sizes: ar 23K, bn 16K, cs 20K, vi 21K, testwiki 500.

Probably most of the difference is due to T261407#7088905, not this issue, though.
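On the question of tracking the no-suggestions rate: a minimal sketch of the aggregation a chart would need, written in Python for illustration. It assumes a hypothetical event stream of (day, wiki, action) tuples in which every task arrival logs an "arrived" action and arrivals with nothing to show also log "no_suggestions". Those names are placeholders, not the actual EventGate schema.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (day, wiki, action). The action names are
# placeholders, not real EventGate schema fields.
Event = Tuple[str, str, str]


def no_suggestions_rate(events: Iterable[Event]) -> Dict[Tuple[str, str], float]:
    """Per (day, wiki): share of task arrivals that found no link suggestions.

    Assumes every "no_suggestions" event is accompanied by an "arrived" event
    for the same visit, so arrivals are the denominator.
    """
    arrivals: Counter = Counter()
    empty: Counter = Counter()
    for day, wiki, action in events:
        if action == "arrived":
            arrivals[(day, wiki)] += 1
        elif action == "no_suggestions":
            empty[(day, wiki)] += 1
    # Counter returns 0 for keys with no "no_suggestions" events.
    return {key: empty[key] / n for key, n in arrivals.items()}


if __name__ == "__main__":
    sample = [("2021-06-01", "cswiki", "arrived")] * 4
    sample.append(("2021-06-01", "cswiki", "no_suggestions"))
    print(no_suggestions_rate(sample))  # {('2021-06-01', 'cswiki'): 0.25}
```

Plotting that per-wiki series against the 1% goal from the description would make a regression like this one visible without manual spot checks.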

Change 695478 had a related patch set uploaded (by MewOphaswongse; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.6] Always delete from search index in AddLinkSubmissionHandler

https://gerrit.wikimedia.org/r/695478

Change 695479 had a related patch set uploaded (by MewOphaswongse; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.7] Always delete from search index in AddLinkSubmissionHandler

https://gerrit.wikimedia.org/r/695479

Mentioned in SAL (#wikimedia-operations) [2021-05-26T21:22:21Z] <tgr> T283606: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki={ar,bn,cs,vi}wiki --verbose --search-index

Well, this didn't really work.

$ for wiki in arwiki bnwiki cswiki viwiki; do echo "== $wiki"; mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=$wiki --verbose --search-index | grep 'Fixing'; done
== arwiki
    Fixing بيني دريدفول (مسلسل)
== bnwiki
    Fixing ক্যাপ্টেন মার্ভেল (চলচ্চিত্র)
    Fixing প্রাচ্যতত্ত্ব
    Fixing এরোমাঙ্গা সেনসেই
== cswiki
    Fixing Slabá interakce
    Fixing Souhvězdí Malého psa
    Fixing Phönix C.I
    Fixing Convair XP-81
    Fixing Accelerated Processing Unit
    Fixing Dolarová aukce
== viwiki
    Fixing Công nghệ môi trường
    Fixing Đoạn thẳng
    Fixing Sân bay quốc tế Cape Town
    Fixing Windows Preinstallation Environment
    Fixing Samba (phần mềm)
    Fixing Phân tích quang phổ
    Fixing Chi Lan hoa sâm
    Fixing Republic F-84F Thunderstreak
    Fixing Ung thư lưỡi
    Fixing Đại lượng vô hướng
    Fixing Bất đẳng thức Ky Fan
    Fixing Subaru

We'll need https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/695292 I guess.

Also, for two of the four wikis the script aborted with an error saying that iterating through search results is capped at 10K results. That's a tougher problem. I guess we could split the search by topics or groups of topics?
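One way to read the "split by topics / groups of topics" idea: pack topics into groups whose combined result counts stay under the 10K cap, then run one search per group. The sketch below is a hypothetical Python illustration, not the fixer script's code; the per-topic counts are assumed to come from separate count queries. Because an article can carry several topics, summing counts overestimates each group's real result size, so every group query stays within the cap. Articles with no topic at all would not be reached by any group.

```python
from typing import Dict, List

SEARCH_RESULT_CAP = 10_000  # per-query iteration cap mentioned in this task


def group_topics(topic_counts: Dict[str, int], cap: int = SEARCH_RESULT_CAP) -> List[List[str]]:
    """Greedily pack topics (largest first) into groups whose summed counts fit in `cap`.

    The sum is an upper bound on the union of results, since articles can have
    several topics, so each group's combined query returns at most `cap` results.
    """
    groups: List[List[str]] = []
    totals: List[int] = []
    for topic, count in sorted(topic_counts.items(), key=lambda kv: -kv[1]):
        if count > cap:
            raise ValueError(f"topic {topic!r} alone exceeds the cap ({count} > {cap})")
        for i, total in enumerate(totals):
            if total + count <= cap:  # first group with room (first-fit)
                groups[i].append(topic)
                totals[i] += count
                break
        else:  # no existing group has room: open a new one
            groups.append([topic])
            totals.append(count)
    return groups


if __name__ == "__main__":
    print(group_topics({"a": 6000, "b": 5000, "c": 4000, "d": 3000}))
    # [['a', 'c'], ['b', 'd']]
```

A topic that alone exceeds the cap would still need some other split (for example by page ID range), which is why the sketch raises instead of silently truncating.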

Change 695292 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] fixLinkRecommendationData.php: also fix search index for old DB entries

https://gerrit.wikimedia.org/r/695292

Change 695837 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.7] fixLinkRecommendationData.php: also fix search index for old DB entries

https://gerrit.wikimedia.org/r/695837

Change 695838 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.6] fixLinkRecommendationData.php: also fix search index for old DB entries

https://gerrit.wikimedia.org/r/695838

Change 696307 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] Fix Ie9a1018c198 for external cluster

https://gerrit.wikimedia.org/r/696307

Change 695478 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.6] Always delete from search index in AddLinkSubmissionHandler

https://gerrit.wikimedia.org/r/695478

Change 696307 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Fix Ie9a1018c198 for external cluster

https://gerrit.wikimedia.org/r/696307

Change 695843 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.6] Fix Ie9a1018c198 for external cluster

https://gerrit.wikimedia.org/r/695843

Change 695844 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.7] Fix Ie9a1018c198 for external cluster

https://gerrit.wikimedia.org/r/695844

Change 695479 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.7] Always delete from search index in AddLinkSubmissionHandler

https://gerrit.wikimedia.org/r/695479

Mentioned in SAL (#wikimedia-operations) [2021-05-27T12:40:41Z] <tgr@deploy1002> Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/: Backport: [[gerrit:695437|Add Link: Prevent double-opening of the post-edit dialog (T283120)]] [[gerrit:695479|Always delete from search index in AddLinkSubmissionHandler (T283606)]] (duration: 01m 06s)

Mentioned in SAL (#wikimedia-operations) [2021-05-27T12:55:43Z] <tgr> T283606: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki={ar,bn,cs,vi}wiki --verbose --search-index

Mentioned in SAL (#wikimedia-operations) [2021-05-27T12:57:22Z] <tgr> T283606: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki={ar,bn,cs,vi}wiki --verbose --search-index with gerrit:696307 applied

Fixed items: ar 1416, bn 1128, cs 2291, vi 1396. This covers only the first 10K items; except for bn, all of these wikis have significantly more than that, so this doesn't fully solve the problem. We'll have to break the search up into multiple queries somehow.

Change 697580 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] Add Link: Fix refreshLinkRecommendations.php counting logic

https://gerrit.wikimedia.org/r/697580

Change 697580 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Add Link: Fix refreshLinkRecommendations.php counting logic

https://gerrit.wikimedia.org/r/697580

Change 695844 abandoned by Gergő Tisza:

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.7] Fix Ie9a1018c198 for external cluster

Reason:

relatively unimportant, we ended up not backporting it

https://gerrit.wikimedia.org/r/695844

Change 695843 abandoned by Gergő Tisza:

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.6] Fix Ie9a1018c198 for external cluster

Reason:

relatively unimportant, we ended up not backporting it

https://gerrit.wikimedia.org/r/695843

Change 695838 abandoned by Gergő Tisza:

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.6] fixLinkRecommendationData.php: also fix search index for old DB entries

Reason:

relatively unimportant, we ended up not backporting it

https://gerrit.wikimedia.org/r/695838

Change 695837 abandoned by Gergő Tisza:

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.7] fixLinkRecommendationData.php: also fix search index for old DB entries

Reason:

relatively unimportant, we ended up not backporting it

https://gerrit.wikimedia.org/r/695837

Tasks with missing index entries from the first 10K tasks:

  • arwiki: 356
  • bnwiki: 601
  • cswiki: 882
  • viwiki: 468

That's still way too high.
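For scale, these counts can be compared with the 1%-or-less goal from the task description. The sketch below takes 10,000 as the denominator, following "the first 10K tasks"; bnwiki's index held roughly 9K entries earlier in this task, so its true rate is presumably somewhat higher than computed here. All four wikis come out between about 3.6% and 8.8%.

```python
TARGET_RATE = 0.01    # "1% of the time or less", from the task description
SAMPLE_SIZE = 10_000  # the fixer script only iterates over the first 10K search results

# Counts from the comment above.
flagged = {"arwiki": 356, "bnwiki": 601, "cswiki": 882, "viwiki": 468}


def flagged_rates(counts: dict, sample_size: int = SAMPLE_SIZE) -> dict:
    """Fraction of the sampled tasks that were flagged, per wiki."""
    return {wiki: n / sample_size for wiki, n in counts.items()}


if __name__ == "__main__":
    for wiki, rate in flagged_rates(flagged).items():
        verdict = "above" if rate > TARGET_RATE else "within"
        print(f"{wiki}: {rate:.2%} ({verdict} the 1% target)")
```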

Change 697879 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] Fix mw.errorLogger.logError calls

https://gerrit.wikimedia.org/r/697879

@Tgr shall we try to make this task a bit more specific, either about identifying the root cause(s) or documenting specific fixes we want to implement? In either case, this seems like more of an epic or parent task.

Change 697879 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Fix mw.errorLogger.logError calls

https://gerrit.wikimedia.org/r/697879

$ for wiki in arwiki bnwiki cswiki viwiki; do echo -n "$wiki: "; mwscript `pwd`/fixLinkRecommendationData.php --wiki=$wiki --verbose --dry-run --search-index 2>/dev/null | ack 'Would fix' | wc -l; done
arwiki: 622
bnwiki: 783
cswiki: 1963
viwiki: 710

This was five days after T283606#7129169, which in turn was six days after the last manual cleanup.

> @Tgr shall we try to make this task a bit more specific, either about identifying the root cause(s) or documenting specific fixes we want to implement? In either case, this seems like more of an epic or parent task.

There are three avenues to pursue:

  • identify and fix the root cause(s): obviously this would be the ideal solution
  • run the fixer script periodically as a cron job
  • do a DB query to filter out missing entries in TaskSuggester::filter

There's also T283814 (CirrusSearch::resetWeightedTags() is slow), which is about speeding up the fixer script, plus the issue of search results being capped at 10K (filed as T284531).
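The third avenue could look roughly like this: after the search returns candidate pages, one batched lookup keeps only those that still have a stored recommendation. This is a hypothetical Python sketch; `filter_tasks` and the lookup callback are illustrative names, not the actual TaskSuggester::filter code.

```python
from typing import Callable, Iterable, List, Set

# Hypothetical lookup: in production this would be one batched DB query against
# the link recommendation table; here it is any function that returns the page
# IDs which still have a stored recommendation.
ExistingLookup = Callable[[Iterable[int]], Set[int]]


def filter_tasks(candidates: List[int], lookup_existing: ExistingLookup) -> List[int]:
    """Drop search results whose recommendation row is missing, keeping search order."""
    if not candidates:
        return []
    existing = lookup_existing(candidates)  # single lookup for the whole batch
    return [page_id for page_id in candidates if page_id in existing]


if __name__ == "__main__":
    db_rows = {101, 103, 104}
    print(filter_tasks([104, 102, 101, 103], lambda ids: db_rows & set(ids)))
    # [104, 101, 103]
```

One trade-off of filtering after the search is that a batch of results can come back shorter than requested, so the caller would presumably need to over-fetch somewhat.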

Mentioned in SAL (#wikimedia-operations) [2021-06-08T06:52:41Z] <tgr> T283606: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki={ar,bn,cs,vi}wiki --verbose --search-index with gerrit:696307 applied

@Tgr in T283109: Add link: Edits on Add link articles are automatically removed by subsequent Add link edits, I was looking more closely at the onSearchDataForIndex method in HomepageHooks.php. If a user makes a regular VisualEditor/wikitext edit to an article that has link suggestions, onSearchDataForIndex calls $linkRecommendation = $this->linkRecommendationStore->getByLinkTarget( $page->getTitle() );. getByLinkTarget() uses the latest revision ID associated with the title (it first tries the replica, then the source DB). Since this hook is triggered during job execution, reading from either the replica or the source DB, that revision ID can be newer than the one stored with the link suggestion. In that case $linkRecommendation is null and the delete never happens. Maybe we should be fetching the link recommendation by page ID instead, then checking whether its revision ID is lower than the one passed to us in onSearchDataForIndex?

Good catch, thanks! Yeah, we want the latest recommendation, not necessarily the one for the current revision. LinkRecommendationStore::getByLinkTarget() is not an exemplar of great signature design :(
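The difference between the two lookups can be sketched as follows. This is a simplified, hypothetical Python model (an in-memory store keyed by page ID, with illustrative method names), not the GrowthExperiments code: an exact-revision lookup misses as soon as the edit being indexed has created a newer revision, while a latest-by-page lookup plus a revision comparison finds the stale recommendation and deletes it.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LinkRecommendation:
    page_id: int
    revision_id: int  # revision the suggestions were generated for


class RecommendationStore:
    """Hypothetical in-memory stand-in for the link recommendation table."""

    def __init__(self) -> None:
        self._by_page: Dict[int, LinkRecommendation] = {}

    def save(self, rec: LinkRecommendation) -> None:
        self._by_page[rec.page_id] = rec

    def get_by_revision(self, page_id: int, revision_id: int) -> Optional[LinkRecommendation]:
        """Exact-revision lookup: returns None once the page has a newer revision."""
        rec = self._by_page.get(page_id)
        return rec if rec is not None and rec.revision_id == revision_id else None

    def get_latest(self, page_id: int) -> Optional[LinkRecommendation]:
        """Latest stored recommendation for the page, whatever revision it belongs to."""
        return self._by_page.get(page_id)

    def delete(self, page_id: int) -> None:
        self._by_page.pop(page_id, None)


def invalidate_on_edit(store: RecommendationStore, page_id: int, new_revision_id: int) -> bool:
    """Delete the recommendation if it predates the revision being indexed.

    Returns True when something was deleted, i.e. when the search index entry
    should be dropped as well.
    """
    rec = store.get_latest(page_id)
    if rec is not None and rec.revision_id < new_revision_id:
        store.delete(page_id)
        return True
    return False


if __name__ == "__main__":
    store = RecommendationStore()
    store.save(LinkRecommendation(page_id=7, revision_id=100))
    print(store.get_by_revision(7, 105))     # None: the exact-revision lookup misses
    print(invalidate_on_edit(store, 7, 105))  # True: the stale recommendation is removed
```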

Change 702482 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] Add Link: fix invalidation on non-addlink edit

https://gerrit.wikimedia.org/r/702482

Change 702482 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Add Link: fix invalidation on non-addlink edit

https://gerrit.wikimedia.org/r/702482

Change 711719 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.18] Add Link: fix invalidation on non-addlink edit

https://gerrit.wikimedia.org/r/711719

Change 711719 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.18] Add Link: fix invalidation on non-addlink edit

https://gerrit.wikimedia.org/r/711719

Mentioned in SAL (#wikimedia-operations) [2021-08-12T23:38:55Z] <cjming@deploy1002> Synchronized php-1.37.0-wmf.18/extensions/GrowthExperiments: Backport: [[gerrit:711719|Add Link: fix invalidation on non-addlink edit (T283606)]] (duration: 01m 00s)

Current link recommendation task counts:

arwiki: 21861
bnwiki: 11781
cswiki: 17906
eswiki: 22477
fawiki: 17403
frwiki: 23541
huwiki: 27998
plwiki: 20320
rowiki: 18110
ruwiki: 20938
testwiki: 671
viwiki: 17594

Those should be small enough for fixLinkRecommendationData.php even in its current form, which is limited to 10K results; we just need to reverse the sorting.
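Assuming "reverse the sorting" means a second pass over the same sort key in descending order, two passes of 10K results each reach at most 20K items. The sketch applies that to the counts above; under this assumption, the wikis above 20K (arwiki, eswiki, frwiki, huwiki, plwiki, ruwiki) would keep a middle slice that neither pass visits. It also assumes the index does not change between the two passes.

```python
CAP = 10_000  # search results a single iteration can reach, per this task

task_counts = {
    "arwiki": 21861, "bnwiki": 11781, "cswiki": 17906, "eswiki": 22477,
    "fawiki": 17403, "frwiki": 23541, "huwiki": 27998, "plwiki": 20320,
    "rowiki": 18110, "ruwiki": 20938, "testwiki": 671, "viwiki": 17594,
}


def unreached(total: int, cap: int = CAP) -> int:
    """Items visited by neither an ascending nor a descending pass of `cap` results each.

    Assumes a stable sort key and no index changes between the two passes.
    """
    return max(0, total - 2 * cap)


if __name__ == "__main__":
    for wiki, total in task_counts.items():
        print(f"{wiki}: {unreached(total)} not reachable by either pass")
```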

Task counts after fixing:

arwiki: 16512
bnwiki: 7996
cswiki: 11368
eswiki: 18333
fawiki: 14012
frwiki: 19370
huwiki: 25626
plwiki: 16832
rowiki: 17182
ruwiki: 17525
testwiki: 622
viwiki: 11803
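The difference between the two lists gives a rough count of the entries the fixer removed. This assumes no other task additions or removals happened in between (the regular refresh jobs may well have changed the counts), so the numbers are approximate.

```python
# Task counts before and after fixing, copied from the two lists above.
before = {
    "arwiki": 21861, "bnwiki": 11781, "cswiki": 17906, "eswiki": 22477,
    "fawiki": 17403, "frwiki": 23541, "huwiki": 27998, "plwiki": 20320,
    "rowiki": 18110, "ruwiki": 20938, "testwiki": 671, "viwiki": 17594,
}
after = {
    "arwiki": 16512, "bnwiki": 7996, "cswiki": 11368, "eswiki": 18333,
    "fawiki": 14012, "frwiki": 19370, "huwiki": 25626, "plwiki": 16832,
    "rowiki": 17182, "ruwiki": 17525, "testwiki": 622, "viwiki": 11803,
}


def removed(before: dict, after: dict) -> dict:
    """Per-wiki drop in task count, assuming no unrelated changes in between."""
    return {wiki: before[wiki] - after[wiki] for wiki in before}


if __name__ == "__main__":
    for wiki, n in removed(before, after).items():
        print(f"{wiki}: {n} removed ({n / before[wiki]:.1%} of the previous count)")
```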

Two spot checks:

tgr@mwmaint2002:~$ mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=cswiki --search-index --dry-run --verbose | wc -l
1387
tgr@mwmaint2002:~$ mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=huwiki --search-index --dry-run --verbose | wc -l
124

So for some reason it's not 100% effective, but the number of dangling index entries went down quite a bit.

> [...]
> so for some reason, not 100% effective, but number of dangling index entries went down quite a bit.

I ran the script again for cswiki (a couple of times), and the number of unfixed suggestions dropped to zero (at least according to the script).

Note your wc -l command is not fully correct: it counts lines saying "checking 100 titles...", too. Something like mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=cswiki --search-index --dry-run --verbose | grep 'Would fix' | wc -l would be more precise (which is how I got to the zero).

Yeah, the result is off by 100 if you don't filter out the "checking..." rows, but 1300 is still a lot. I thought I ran the script enough times to cover all items. Job queue delay for the cirrus updates, maybe?

Checked the following wikis on production (wmf.18), selected based on Special:Homepage/Suggested Edits stats; for each wiki I tried 10 articles with different topic selections.

wiki | results for 10 articles | notes
cswiki | OK |
frwiki | OK |
arwiki | OK |
fawiki | OK | A console error was displayed, no user impact: Uncaught TypeError: Cannot read property 'token' of undefined at NewcomerTaskLogger.log
plwiki | OK |
ruwiki | nine articles were OK | one article displayed "Back to suggestions"

> Yeah, the result is off by 100 if you don't filter out the "checking..." rows, but 1300 is still a lot. I thought I ran the script enough times to cover all items. Job queue delay for the cirrus updates, maybe?

Agreed. I don't think it's caused by the job queue, though: the numbers went down immediately after my reruns, and each run said "Fixing <article title>", with a different title each time.

Mentioned in SAL (#wikimedia-operations) [2021-11-10T04:15:48Z] <tgr> T283606: running foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --search-index