In ambassador testing in production and on Test Wikipedia, we've noticed that a high percentage of suggested articles end up having no suggestions available once the user arrives on them. It sounds like this might be 10-20%, which is much too high. We would want this experience to happen 1% of the time or less.
Status | Assigned | Task
---|---|---
Resolved | MMiller_WMF | T252822 [EPIC] Growth: "add a link" structured task 1.0
Resolved | Tgr | T283606 Add a link: too many articles have no suggestions upon arrival
Event Timeline
Change 695191 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@master] Allow running fixLinkRecommendationData --search-index in production
Change 695193 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@master] Add --dry-run option to fixLinkRecommendationData.php
Change 695231 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@master] Always delete from search index in AddLinkSubmissionHandler
Change 695044 had a related patch set uploaded (by Kosta Harlan; author: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.6] Allow running fixLinkRecommendationData --search-index in production
Change 695045 had a related patch set uploaded (by Kosta Harlan; author: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.7] Allow running fixLinkRecommendationData --search-index in production
Change 695191 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@master] Allow running fixLinkRecommendationData --search-index in production
Change 695193 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@master] Add --dry-run option to fixLinkRecommendationData.php
Change 695231 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@master] Always delete from search index in AddLinkSubmissionHandler
Change 695045 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.7] Allow running fixLinkRecommendationData --search-index in production
Mentioned in SAL (#wikimedia-operations) [2021-05-26T11:39:48Z] <urbanecm@deploy1002> Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: 86bba48: Allow running fixLinkRecommendationData --search-index in production (T283606) (duration: 01m 06s)
Change 695044 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.6] Allow running fixLinkRecommendationData --search-index in production
Mentioned in SAL (#wikimedia-operations) [2021-05-26T11:44:04Z] <urbanecm@deploy1002> Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: b3c2941: Allow running fixLinkRecommendationData --search-index in production (T283606) (duration: 01m 07s)
Change 695292 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@master] fixLinkRecommendationData.php: also fix search index for old DB entries
I think this is done (thanks for figuring out the root of the problem, @kostajh!), other than running the fixer script on the production wikis. I wonder if we'd want to track the no-suggestions rate in an easy-to-monitor way, though? We do record an EventGate event, but I'm not sure if there's an easy way to chart those.
Current search index and DB sizes:

Wiki | Search index | DB
---|---|---
ar | 16K | 23K
bn | 9K | 16K
cs | 20K | 20K
vi | 14K | 21K
testwiki | 750 | 500
Probably most of the difference is due to T261407#7088905, not this issue, though.
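One way to read the two size lists, assuming each page has at most one recommendation row in the DB: at most "DB size" index entries can be backed by a DB row, so "index size minus DB size", when positive, is a lower bound on dangling index entries (index entries with no DB row). When the DB count is larger, as on most wikis here, the sizes alone say nothing either way. A minimal sketch using the rounded figures above; the dangling_lower_bound name is illustrative only.

```bash
#!/usr/bin/env bash
# Lower bound on search-index entries that have no matching DB row.
# Assumes at most one recommendation row per page, so at most $db index
# entries can be backed by the DB; anything beyond that must be dangling.
dangling_lower_bound() {
  local index=$1 db=$2
  if (( index > db )); then
    echo $(( index - db ))
  else
    echo 0
  fi
}

# Sizes from the lists above, with "K" read as thousands.
for entry in ar:16000:23000 bn:9000:16000 cs:20000:20000 vi:14000:21000 testwiki:750:500; do
  IFS=: read -r wiki index db <<< "$entry"
  echo "$wiki: at least $(dangling_lower_bound "$index" "$db") dangling index entries"
done
```

On these numbers only testwiki gives a positive bound (250); for the other wikis the gap points the other way, which fits the explanation above that it comes mostly from a different issue.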
Change 695478 had a related patch set uploaded (by MewOphaswongse; author: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.6] Always delete from search index in AddLinkSubmissionHandler
Change 695479 had a related patch set uploaded (by MewOphaswongse; author: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.7] Always delete from search index in AddLinkSubmissionHandler
Mentioned in SAL (#wikimedia-operations) [2021-05-26T21:22:21Z] <tgr> T283606: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki={ar,bn,cs,vi}wiki --verbose --search-index
Well, this didn't really work.
```
$ for wiki in arwiki bnwiki cswiki viwiki; do echo "== $wiki"; mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=$wiki --verbose --search-index | grep 'Fixing'; done
== arwiki
Fixing بيني دريدفول (مسلسل)
== bnwiki
Fixing ক্যাপ্টেন মার্ভেল (চলচ্চিত্র)
Fixing প্রাচ্যতত্ত্ব
Fixing এরোমাঙ্গা সেনসেই
== cswiki
Fixing Slabá interakce
Fixing Souhvězdí Malého psa
Fixing Phönix C.I
Fixing Convair XP-81
Fixing Accelerated Processing Unit
Fixing Dolarová aukce
== viwiki
Fixing Công nghệ môi trường
Fixing Đoạn thẳng
Fixing Sân bay quốc tế Cape Town
Fixing Windows Preinstallation Environment
Fixing Samba (phần mềm)
Fixing Phân tích quang phổ
Fixing Chi Lan hoa sâm
Fixing Republic F-84F Thunderstreak
Fixing Ung thư lưỡi
Fixing Đại lượng vô hướng
Fixing Bất đẳng thức Ky Fan
Fixing Subaru
```
We'll need https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/695292 I guess.
Also, for two of the four wikis the script aborted with an error saying that iterating through search results is capped at 10K results. That's a tougher problem. I guess we can split the query by topics / groups of topics?
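Splitting by topic only helps if every resulting query stays under the cap. A minimal sketch of one way to form the groups, using next-fit packing: walk the topics in order and start a new group whenever the next topic would overflow the current one. The topic names and counts are hypothetical, and this shows only the grouping, not how fixLinkRecommendationData.php would accept a topic filter. Because an article can carry several topics, summing per-topic counts overestimates a group's result count, so a group that fits here also fits as a single OR-ed query.

```bash
#!/usr/bin/env bash
# Next-fit grouping of topics so that each group's summed count stays <= cap.
# Reads "topic count" lines on stdin (topic names without spaces) and prints
# one group per line, with topics joined by "|".
group_topics() {
  local cap=$1 topic count group="" total=0
  while read -r topic count; do
    if (( count > cap )); then
      # A single topic over the cap cannot be fixed by grouping.
      echo "TOO-BIG: $topic ($count)" >&2
      continue
    fi
    if (( total + count > cap )); then
      echo "$group"          # flush the full group
      group="" total=0
    fi
    group="${group:+$group|}$topic"
    total=$(( total + count ))
  done
  if [ -n "$group" ]; then
    echo "$group"
  fi
}

# Hypothetical per-topic counts for one wiki.
printf '%s\n' "history 6000" "physics 3000" "music 4000" "sports 12000" "food 500" |
  group_topics 10000
```

With this sample input, sports is reported on stderr as too big for any group and would need a finer split of its own, while the remaining topics form the groups history|physics and music|food.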
Change 695292 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@master] fixLinkRecommendationData.php: also fix search index for old DB entries
Change 695837 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.7] fixLinkRecommendationData.php: also fix search index for old DB entries
Change 695838 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.6] fixLinkRecommendationData.php: also fix search index for old DB entries
Change 696307 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@master] Fix Ie9a1018c198 for external cluster
Change 695478 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.6] Always delete from search index in AddLinkSubmissionHandler
Change 696307 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@master] Fix Ie9a1018c198 for external cluster
Change 695843 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.6] Fix Ie9a1018c198 for external cluster
Change 695844 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.7] Fix Ie9a1018c198 for external cluster
Change 695479 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.7] Always delete from search index in AddLinkSubmissionHandler
Mentioned in SAL (#wikimedia-operations) [2021-05-27T12:40:41Z] <tgr@deploy1002> Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/: Backport: [[gerrit:695437|Add Link: Prevent double-opening of the post-edit dialog (T283120)]] [[gerrit:695479|Always delete from search index in AddLinkSubmissionHandler (T283606)]] (duration: 01m 06s)
Mentioned in SAL (#wikimedia-operations) [2021-05-27T12:55:43Z] <tgr> T283606: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki={ar,bn,cs,vi}wiki --verbose --search-index
Mentioned in SAL (#wikimedia-operations) [2021-05-27T12:57:22Z] <tgr> T283606: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki={ar,bn,cs,vi}wiki --verbose --search-index with gerrit:696307 applied
Fixed items: ar 1416, bn 1128, cs 2291, vi 1396. This is out of the first 10K items; except for bn, all wikis have significantly more items than that, so this doesn't fully solve the problem. We'll have to break the search up into multiple queries somehow.
Change 697580 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@master] Add Link: Fix refreshLinkRecommendations.php counting logic
Change 697580 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@master] Add Link: Fix refreshLinkRecommendations.php counting logic
Change 695844 abandoned by Gergő Tisza:
[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.7] Fix Ie9a1018c198 for external cluster
Reason:
relatively unimportant, we ended up not backporting it
Change 695843 abandoned by Gergő Tisza:
[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.6] Fix Ie9a1018c198 for external cluster
Reason:
relatively unimportant, we ended up not backporting it
Change 695838 abandoned by Gergő Tisza:
[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.6] fixLinkRecommendationData.php: also fix search index for old DB entries
Reason:
relatively unimportant, we ended up not backporting it
Change 695837 abandoned by Gergő Tisza:
[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.7] fixLinkRecommendationData.php: also fix search index for old DB entries
Reason:
relatively unimportant, we ended up not backporting it
Tasks with missing index entries from the first 10K tasks:
- arwiki: 356
- bnwiki: 601
- cswiki: 882
- viwiki: 468
That's still way too high.
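Putting those counts against the target from the task description (1% or less) makes "way too high" concrete. A minimal sketch, assuming each count is out of a 10000-task sample; bnwiki's index held fewer than 10K entries (about 9K in the earlier size count), so its real share is somewhat higher than computed here. The report_rate name is illustrative.

```bash
#!/usr/bin/env bash
# Print a wiki's flagged share of a sample and compare it with a target.
# Args: wiki, flagged count, sample size, target percentage.
report_rate() {
  awk -v w="$1" -v m="$2" -v n="$3" -v t="$4" 'BEGIN {
    pct = 100 * m / n
    printf "%s: %.2f%% (%s)\n", w, pct, (pct <= t ? "within target" : "above target")
  }'
}

# Counts from the list above, assuming a 10000-task sample per wiki.
for entry in arwiki:356 bnwiki:601 cswiki:882 viwiki:468; do
  report_rate "${entry%%:*}" "${entry#*:}" 10000 1
done
```

On these numbers every wiki lands between about 3.6% and 8.8%, several times the 1% target.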
Change 697879 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@master] Fix mw.errorLogger.logError calls
@Tgr shall we try to make this task a bit more specific, either about identifying the root cause(s) or documenting specific fixes we want to implement? In either case, this seems like more of an epic or parent task.
Change 697879 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@master] Fix mw.errorLogger.logError calls
```
$ for wiki in arwiki bnwiki cswiki viwiki; do echo -n "$wiki: "; mwscript `pwd`/fixLinkRecommendationData.php --wiki=$wiki --verbose --dry-run --search-index 2>/dev/null | ack 'Would fix' | wc -l; done
arwiki: 622
bnwiki: 783
cswiki: 1963
viwiki: 710
```
This was five days after T283606#7129169, which was itself six days after the last manual cleanup.
There are three avenues to pursue:
- identify and fix the root cause(s) - obviously this would be the ideal solution
- run the fixer script periodically as a cron job
- do a DB query to filter out missing entries in TaskSuggester::filter
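The third option amounts to intersecting the search results with the set of pages that still have a recommendation row. A minimal sketch of that set logic on plain title lists: the real check would be a DB query inside TaskSuggester::filter (PHP), and the two-file layout and the filter_tasks name here are hypothetical.

```bash
#!/usr/bin/env bash
# Keep the candidate titles (search results, in ranking order) that also
# appear in the list of titles with a recommendation row in the DB.
# grep flags: -F fixed strings, -x whole-line matches, -f patterns from file.
filter_tasks() {
  local candidates=$1 db_titles=$2
  # grep exits 1 when nothing matches; treat that as an empty result.
  grep -Fxf "$db_titles" "$candidates" || true
}
```

Because grep scans the candidate file top to bottom, the surviving titles keep their search ranking order, which is what a filter step between search and display would need.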
There's also T283814 (CirrusSearch::resetWeightedTags() is slow), which is about speeding up the fixer script, plus the issue of search results being capped at 10K (filed as T284531).
Mentioned in SAL (#wikimedia-operations) [2021-06-08T06:52:41Z] <tgr> T283606: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki={ar,bn,cs,vi}wiki --verbose --search-index with gerrit:696307 applied
@Tgr In T283109 (Add link: Edits on Add link articles are automatically removed by subsequent Add link edits), I was looking more closely at the onSearchDataForIndex method in HomepageHooks.php. If a user makes a regular VisualEditor/wikitext edit to an article that has link suggestions, onSearchDataForIndex calls $linkRecommendation = $this->linkRecommendationStore->getByLinkTarget( $page->getTitle() );. getByLinkTarget() uses the latest revision ID associated with the title (first it tries the replica, then the source DB). Since this hook is triggered during job execution, with either the replica or the source DB, it's possible that this revision ID is newer than the one stored in the database for the link suggestion. In that case $linkRecommendation is null and the delete never happens. Maybe we should fetch the link recommendation by page ID, then check whether its revision ID is lower than the one passed to us in onSearchDataForIndex?
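The difference between the two lookups reduces to a comparison of revision IDs. A simplified sketch (function names hypothetical; the real code is PHP in HomepageHooks.php and the link recommendation store): the current lookup only finds a recommendation stored for the page's latest revision, while the proposed check fetches the page's latest recommendation and deletes it when it belongs to an older revision than the one being indexed.

```bash
#!/usr/bin/env bash
# $1: revision ID stored with the page's latest recommendation ("" if none)
# $2: revision ID seen by onSearchDataForIndex

# Current lookup (simplified): keyed on the page's latest revision, so it
# only finds a recommendation generated for exactly that revision.
old_lookup_finds() {
  [ -n "$1" ] && (( $1 == $2 ))
}

# Proposed check: the latest recommendation is stale, and should be
# deleted, when it was generated for an older revision.
new_should_delete() {
  [ -n "$1" ] && (( $1 < $2 ))
}

# A regular edit creates revision 101 on a page whose recommendation was
# generated for revision 100.
old_lookup_finds 100 101 || echo "old: recommendation not found, nothing deleted"
new_should_delete 100 101 && echo "new: stale recommendation deleted"
```

The strict "less than" keeps a recommendation that still matches the indexed revision, so reindexing an unedited page does not drop its task.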
Good catch, thanks! Yeah, we want the latest recommendation, not necessarily the current one. LinkRecommendation::getByLinkTarget() is not an exemplar of great signature design :(
Change 702482 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@master] Add Link: fix invalidation on non-addlink edit
Change 702482 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@master] Add Link: fix invalidation on non-addlink edit
Change 711719 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.18] Add Link: fix invalidation on non-addlink edit
Change 711719 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.18] Add Link: fix invalidation on non-addlink edit
Mentioned in SAL (#wikimedia-operations) [2021-08-12T23:38:55Z] <cjming@deploy1002> Synchronized php-1.37.0-wmf.18/extensions/GrowthExperiments: Backport: [[gerrit:711719|Add Link: fix invalidation on non-addlink edit (T283606)]] (duration: 01m 00s)
Current link recommendation task counts:
- arwiki: 21861
- bnwiki: 11781
- cswiki: 17906
- eswiki: 22477
- fawiki: 17403
- frwiki: 23541
- huwiki: 27998
- plwiki: 20320
- rowiki: 18110
- ruwiki: 20938
- testwiki: 671
- viwiki: 17594
Those should be small enough for fixLinkRecommendationData.php even in its current form, which is limited to 10K results; we just need to reverse the sorting.
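If reversing the sorting means running the script once in each order (and the two orders are exact reverses of each other), each run still stops at 10,000 results, so the two runs together reach every entry only when a wiki has at most 20,000 tasks; above that, a middle band of (count minus 20,000) entries is left unscanned. A quick check under that assumption; the unscanned helper is hypothetical.

```bash
#!/usr/bin/env bash
# Entries a pair of capped passes (one per sort direction) never reaches.
# Assumes each pass stops after $2 results (default 10000).
unscanned() {
  local total=$1 cap=${2:-10000}
  local left=$(( total - 2 * cap ))
  echo $(( left > 0 ? left : 0 ))
}

for entry in arwiki:21861 bnwiki:11781 cswiki:17906 eswiki:22477 \
             fawiki:17403 frwiki:23541 huwiki:27998 plwiki:20320 \
             rowiki:18110 ruwiki:20938 testwiki:671 viwiki:17594; do
  echo "${entry%%:*}: $(unscanned "${entry#*:}") unscanned"
done
```

On the counts above, arwiki, eswiki, frwiki, huwiki, plwiki and ruwiki would each keep a partial band, the largest on huwiki (7998 entries).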
Task counts after fixing:
- arwiki: 16512
- bnwiki: 7996
- cswiki: 11368
- eswiki: 18333
- fawiki: 14012
- frwiki: 19370
- huwiki: 25626
- plwiki: 16832
- rowiki: 17182
- ruwiki: 17525
- testwiki: 622
- viwiki: 11803
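Pairing the before and after counts wiki by wiki gives the size of the cleanup. A minimal sketch, assuming nothing else changed the task counts between the two snapshots (for example, new recommendations being generated); the removed_report name is illustrative.

```bash
#!/usr/bin/env bash
# Reads "wiki before after" lines; prints tasks removed and their share.
removed_report() {
  awk '{ d = $2 - $3; printf "%s: %d removed (%.1f%%)\n", $1, d, 100 * d / $2 }'
}

removed_report <<'EOF'
arwiki 21861 16512
bnwiki 11781 7996
cswiki 17906 11368
eswiki 22477 18333
fawiki 17403 14012
frwiki 23541 19370
huwiki 27998 25626
plwiki 20320 16832
rowiki 18110 17182
ruwiki 20938 17525
testwiki 671 622
viwiki 17594 11803
EOF
```

On these numbers the cleanup removed between about 5% (rowiki) and 36.5% (cswiki) of tasks.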
Two spot checks:
```
tgr@mwmaint2002:~$ mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=cswiki --search-index --dry-run --verbose | wc -l
1387
tgr@mwmaint2002:~$ mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=huwiki --search-index --dry-run --verbose | wc -l
124
```
So for some reason it was not 100% effective, but the number of dangling index entries went down quite a bit.
I ran the script again for cswiki (a couple of times), and the number of unfixed suggestions dropped to zero (at least according to the script).
Note that your wc -l command is not fully correct: it also counts the lines saying "checking 100 titles...". Something like mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=cswiki --search-index --dry-run --verbose | grep 'Would fix' | wc -l would be more precise (that's how I got to zero).
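The size of the overcount follows from the batching. Assuming the script prints one "checking 100 titles..." line per batch of 100 titles and stops at the 10K search cap, a full run adds about 100 progress lines to a plain wc -l total. A minimal sketch of both counts; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Count real findings in the fixer's dry-run output (read from stdin),
# ignoring the "checking 100 titles..." progress lines.
count_would_fix() {
  # grep -c prints 0 but exits 1 when nothing matches; keep the 0.
  grep -c 'Would fix' || true
}

# Expected number of progress lines: one per batch of $2 titles, over at most
# $3 search results (the 10K cap). Assumes one progress line per batch.
progress_lines() {
  local total=$1 batch=$2 cap=$3
  local scanned=$(( total < cap ? total : cap ))
  echo $(( (scanned + batch - 1) / batch ))
}

progress_lines 11368 100 10000   # cswiki after the cleanup: prints 100
```

In practice the first function would sit at the end of the dry-run pipeline in place of grep 'Would fix' | wc -l.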
Yeah, the result is off by 100 if you don't filter out the "checking..." rows, but 1300 is still a lot. I thought I ran the script enough times to cover all items. Job queue delay for the Cirrus updates, maybe?
I checked the following wikis on production (wmf.18), selected based on Special:Homepage / Suggested Edits stats, opening 10 articles per wiki with different topic selections:

wiki | results for 10 articles | notes
---|---|---
cswiki | OK |
frwiki | OK |
arwiki | OK |
fawiki | OK | A console error was displayed, with no user impact: Uncaught TypeError: Cannot read property 'token' of undefined at NewcomerTaskLogger.log
plwiki | OK |
ruwiki | 9 of 10 OK | one article displayed "Back to suggestions"
Agreed. I don't think it's caused by the job queue: the numbers went down immediately after my reruns, and the runs always said "Fixing <article title>", with a different title each time.
Mentioned in SAL (#wikimedia-operations) [2021-11-10T04:15:48Z] <tgr> T283606: running foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --search-index