
Add a link: too many articles have no suggestions upon arrival
Open, High, Public

Description

In ambassador testing in production and on Test Wikipedia, we've noticed that a high percentage of suggested articles end up having no suggestions available once the user arrives on them. It sounds like this might be 10-20%, which is much too high. We would want this experience to happen 1% of the time or less.

Details

Project                                  Branch            Lines +/-   Subject
mediawiki/extensions/GrowthExperiments   master            +13 -15
mediawiki/extensions/GrowthExperiments   wmf/1.37.0-wmf.7  +6 -3
mediawiki/extensions/GrowthExperiments   wmf/1.37.0-wmf.6  +6 -3
mediawiki/extensions/GrowthExperiments   wmf/1.37.0-wmf.6  +16 -5
mediawiki/extensions/GrowthExperiments   wmf/1.37.0-wmf.7  +16 -5
mediawiki/extensions/GrowthExperiments   master            +14 -9
mediawiki/extensions/GrowthExperiments   master            +16 -5
mediawiki/extensions/GrowthExperiments   wmf/1.37.0-wmf.7  +6 -2
mediawiki/extensions/GrowthExperiments   wmf/1.37.0-wmf.6  +6 -2
mediawiki/extensions/GrowthExperiments   master            +6 -3
mediawiki/extensions/GrowthExperiments   master            +6 -2
mediawiki/extensions/GrowthExperiments   master            +4 -2
mediawiki/extensions/GrowthExperiments   wmf/1.37.0-wmf.6  +4 -2
mediawiki/extensions/GrowthExperiments   master            +10 -5
mediawiki/extensions/GrowthExperiments   wmf/1.37.0-wmf.7  +4 -2

Event Timeline

Change 695191 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] Allow running fixLinkRecommendationData --search-index in production

https://gerrit.wikimedia.org/r/695191

Change 695193 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] Add --dry-run option to fixLinkRecommendationData.php

https://gerrit.wikimedia.org/r/695193

Change 695231 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] Always delete from search index in AddLinkSubmissionHandler

https://gerrit.wikimedia.org/r/695231

Change 695044 had a related patch set uploaded (by Kosta Harlan; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.6] Allow running fixLinkRecommendationData --search-index in production

https://gerrit.wikimedia.org/r/695044

Change 695045 had a related patch set uploaded (by Kosta Harlan; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.7] Allow running fixLinkRecommendationData --search-index in production

https://gerrit.wikimedia.org/r/695045

Change 695191 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Allow running fixLinkRecommendationData --search-index in production

https://gerrit.wikimedia.org/r/695191

Change 695193 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Add --dry-run option to fixLinkRecommendationData.php

https://gerrit.wikimedia.org/r/695193

Change 695231 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Always delete from search index in AddLinkSubmissionHandler

https://gerrit.wikimedia.org/r/695231

Change 695045 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.7] Allow running fixLinkRecommendationData --search-index in production

https://gerrit.wikimedia.org/r/695045

Mentioned in SAL (#wikimedia-operations) [2021-05-26T11:39:48Z] <urbanecm@deploy1002> Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: 86bba48: Allow running fixLinkRecommendationData --search-index in production (T283606) (duration: 01m 06s)

Change 695044 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.6] Allow running fixLinkRecommendationData --search-index in production

https://gerrit.wikimedia.org/r/695044

Mentioned in SAL (#wikimedia-operations) [2021-05-26T11:44:04Z] <urbanecm@deploy1002> Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: b3c2941: Allow running fixLinkRecommendationData --search-index in production (T283606) (duration: 01m 07s)

Change 695292 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] fixLinkRecommendationData.php: also fix search index for old DB entries

https://gerrit.wikimedia.org/r/695292

I think this is done (thanks for figuring out the root of the problem, @kostajh!), other than running the fixer script for the production wikis. I wonder whether we'd want to track the no-suggestions rate in an easy-to-monitor way, though. We do record an EventGate event, but I'm not sure there's an easy way to chart those.
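If the EventGate events mentioned above can be counted per day, the rate is a plain ratio that can be compared with the 1% target from the task description. This is only a minimal sketch: the function name, its parameters, and the counts are hypothetical and do not reflect any actual EventGate schema.

```python
TARGET_RATE = 0.01  # "1% of the time or less", from the task description


def no_suggestions_rate(arrivals: int, no_suggestion_events: int) -> float:
    """Fraction of article arrivals that ended with no suggestions available."""
    if arrivals == 0:
        return 0.0
    return no_suggestion_events / arrivals


# Hypothetical counts for one day on one wiki (not real data).
rate = no_suggestions_rate(arrivals=400, no_suggestion_events=60)
exceeds_target = rate > TARGET_RATE
```

Charting this value per wiki per day would show whether a fix actually moves the rate toward the target.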

Current search index sizes: ar 16K, bn 9K, cs 20K, vi 14K, testwiki 750.
Current DB sizes: ar 23K, bn 16K, cs 20K, vi 21K, testwiki 500.

Probably most of the difference is due to T261407#7088905, not this issue, though.
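For reference, the per-wiki gap between the two sets of numbers above can be worked out as follows. The figures are the rounded values quoted in the comment, so the results are approximate.

```python
# Approximate sizes quoted in the thread.
search_index = {"ar": 16_000, "bn": 9_000, "cs": 20_000, "vi": 14_000, "testwiki": 750}
db = {"ar": 23_000, "bn": 16_000, "cs": 20_000, "vi": 21_000, "testwiki": 500}

# Positive: the DB holds more entries than the index.
# Negative: the index holds more entries than the DB.
gap = {wiki: db[wiki] - search_index[wiki] for wiki in db}
```

These are net differences only. A gap of zero does not show that the two stores agree, because entries missing on one side can be offset by entries missing on the other.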

Change 695478 had a related patch set uploaded (by MewOphaswongse; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.6] Always delete from search index in AddLinkSubmissionHandler

https://gerrit.wikimedia.org/r/695478

Change 695479 had a related patch set uploaded (by MewOphaswongse; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.7] Always delete from search index in AddLinkSubmissionHandler

https://gerrit.wikimedia.org/r/695479

Mentioned in SAL (#wikimedia-operations) [2021-05-26T21:22:21Z] <tgr> T283606: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki={ar,bn,cs,vi}wiki --verbose --search-index

Well, this didn't really work.

$ for wiki in arwiki bnwiki cswiki viwiki; do echo "== $wiki"; mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=$wiki --verbose --search-index | grep 'Fixing'; done
== arwiki
    Fixing بيني دريدفول (مسلسل)
== bnwiki
    Fixing ক্যাপ্টেন মার্ভেল (চলচ্চিত্র)
    Fixing প্রাচ্যতত্ত্ব
    Fixing এরোমাঙ্গা সেনসেই
== cswiki
    Fixing Slabá interakce
    Fixing Souhvězdí Malého psa
    Fixing Phönix C.I
    Fixing Convair XP-81
    Fixing Accelerated Processing Unit
    Fixing Dolarová aukce
== viwiki
    Fixing Công nghệ môi trường
    Fixing Đoạn thẳng
    Fixing Sân bay quốc tế Cape Town
    Fixing Windows Preinstallation Environment
    Fixing Samba (phần mềm)
    Fixing Phân tích quang phổ
    Fixing Chi Lan hoa sâm
    Fixing Republic F-84F Thunderstreak
    Fixing Ung thư lưỡi
    Fixing Đại lượng vô hướng
    Fixing Bất đẳng thức Ky Fan
    Fixing Subaru

We'll need https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/695292 I guess.

Also for two of the four wikis the script aborted with an error about how iterating through search results is capped at 10K results. That's a tougher problem. I guess we can split by topics / groups of topics?

Change 695292 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] fixLinkRecommendationData.php: also fix search index for old DB entries

https://gerrit.wikimedia.org/r/695292

Change 695837 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.7] fixLinkRecommendationData.php: also fix search index for old DB entries

https://gerrit.wikimedia.org/r/695837

Change 695838 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.6] fixLinkRecommendationData.php: also fix search index for old DB entries

https://gerrit.wikimedia.org/r/695838

Change 696307 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] Fix Ie9a1018c198 for external cluster

https://gerrit.wikimedia.org/r/696307

Change 695478 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.6] Always delete from search index in AddLinkSubmissionHandler

https://gerrit.wikimedia.org/r/695478

Change 696307 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Fix Ie9a1018c198 for external cluster

https://gerrit.wikimedia.org/r/696307

Change 695843 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.6] Fix Ie9a1018c198 for external cluster

https://gerrit.wikimedia.org/r/695843

Change 695844 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.7] Fix Ie9a1018c198 for external cluster

https://gerrit.wikimedia.org/r/695844

Change 695479 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.7] Always delete from search index in AddLinkSubmissionHandler

https://gerrit.wikimedia.org/r/695479

Mentioned in SAL (#wikimedia-operations) [2021-05-27T12:40:41Z] <tgr@deploy1002> Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/: Backport: [[gerrit:695437|Add Link: Prevent double-opening of the post-edit dialog (T283120)]] [[gerrit:695479|Always delete from search index in AddLinkSubmissionHandler (T283606)]] (duration: 01m 06s)

Mentioned in SAL (#wikimedia-operations) [2021-05-27T12:55:43Z] <tgr> T283606: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki={ar,bn,cs,vi}wiki --verbose --search-index

Mentioned in SAL (#wikimedia-operations) [2021-05-27T12:57:22Z] <tgr> T283606: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki={ar,bn,cs,vi}wiki --verbose --search-index with gerrit:696307 applied

Fixed items: ar 1416, bn 1128, cs 2291, vi 1396. This is out of the first 10K items - except for bn, all have significantly more than that, so this doesn't fully solve the problem. We'll have to break the search up into multiple queries somehow.
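One way to break the search up, following the earlier suggestion to split by topics, is to run one query per topic, check that each stays under the cap, and de-duplicate by page ID. This is a minimal sketch under stated assumptions: `search`, the topic names, and the toy index are hypothetical stand-ins, not CirrusSearch or GrowthExperiments APIs.

```python
from __future__ import annotations

from typing import Callable, Iterable, Iterator

RESULT_CAP = 10_000  # the cap mentioned in the thread


def iterate_partitioned(
    search: Callable[[str], list[int]],
    topics: Iterable[str],
    cap: int = RESULT_CAP,
) -> Iterator[int]:
    """Yield each page ID once, querying one topic at a time."""
    seen: set[int] = set()
    for topic in topics:
        page_ids = search(topic)
        if len(page_ids) >= cap:
            # The partition itself hit the cap, so it may be truncated.
            raise RuntimeError(f"topic {topic!r} reached the cap of {cap}; split it further")
        for page_id in page_ids:
            if page_id not in seen:  # a page can belong to several topics
                seen.add(page_id)
                yield page_id


# Toy index standing in for real search results (hypothetical data).
fake_index = {"science": [1, 2, 3], "history": [3, 4], "sports": [5]}
all_ids = list(iterate_partitioned(lambda topic: fake_index[topic], fake_index))
```

A split like this only covers pages that match at least one topic, so pages without any topic would need a separate query.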

Change 697580 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] Add Link: Fix refreshLinkRecommendations.php counting logic

https://gerrit.wikimedia.org/r/697580

Change 697580 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Add Link: Fix refreshLinkRecommendations.php counting logic

https://gerrit.wikimedia.org/r/697580

Change 695844 abandoned by Gergő Tisza:

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.7] Fix Ie9a1018c198 for external cluster

Reason:

relatively unimportant, we ended up not backporting it

https://gerrit.wikimedia.org/r/695844

Change 695843 abandoned by Gergő Tisza:

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.6] Fix Ie9a1018c198 for external cluster

Reason:

relatively unimportant, we ended up not backporting it

https://gerrit.wikimedia.org/r/695843

Change 695838 abandoned by Gergő Tisza:

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.6] fixLinkRecommendationData.php: also fix search index for old DB entries

Reason:

relatively unimportant, we ended up not backporting it

https://gerrit.wikimedia.org/r/695838

Change 695837 abandoned by Gergő Tisza:

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.7] fixLinkRecommendationData.php: also fix search index for old DB entries

Reason:

relatively unimportant, we ended up not backporting it

https://gerrit.wikimedia.org/r/695837

Tasks with missing index entries from the first 10K tasks:

  • arwiki: 356
  • bnwiki: 601
  • cswiki: 882
  • viwiki: 468

That's still way too high.

Change 697879 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] Fix mw.errorLogger.logError calls

https://gerrit.wikimedia.org/r/697879

@Tgr shall we try to make this task a bit more specific, either about identifying the root cause(s) or documenting specific fixes we want to implement? In either case, this seems like more of an epic or parent task.

Change 697879 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Fix mw.errorLogger.logError calls

https://gerrit.wikimedia.org/r/697879

$ for wiki in arwiki bnwiki cswiki viwiki; do echo -n "$wiki: "; mwscript `pwd`/fixLinkRecommendationData.php --wiki=$wiki --verbose --dry-run --search-index 2>/dev/null | ack 'Would fix' | wc -l; done
arwiki: 622
bnwiki: 783
cswiki: 1963
viwiki: 710

This was 5 days after T283606#7129169, which is six days after the last manual cleanup.

> @Tgr shall we try to make this task a bit more specific, either about identifying the root cause(s) or documenting specific fixes we want to implement? In either case, this seems like more of an epic or parent task.

There are three avenues to pursue:

  • identify and fix the root cause(s) - obviously this would be the ideal solution
  • run the fixer script periodically as a cron job
  • do a DB query to filter out missing entries in TaskSuggester::filter

There's also T283814 (CirrusSearch::resetWeightedTags() is slow), which would speed up the fixer script, plus the issue of search results being capped at 10K (filed as T284531).
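The third avenue above amounts to intersecting the search results with the set of pages that still have a stored recommendation. The sketch below illustrates the idea only: `filter_tasks` and the page IDs are hypothetical, and in the extension the lookup would presumably be a batched DB query rather than an in-memory set.

```python
from __future__ import annotations

from typing import Iterable


def filter_tasks(candidate_page_ids: Iterable[int], stored_page_ids: set[int]) -> list[int]:
    """Keep only candidates that still have a stored link recommendation."""
    return [page_id for page_id in candidate_page_ids if page_id in stored_page_ids]


# The search index returned four candidates; the DB has no entry for page 102.
search_results = [101, 102, 103, 104]
db_entries = {101, 103, 104, 200}
usable = filter_tasks(search_results, db_entries)
```

This hides stale index entries from users without repairing the index, so it complements the fixer script rather than replacing it.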

Mentioned in SAL (#wikimedia-operations) [2021-06-08T06:52:41Z] <tgr> T283606: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki={ar,bn,cs,vi}wiki --verbose --search-index with gerrit:696307 applied

@Tgr in T283109: Add link: Edits on Add link articles are automatically removed by subsequent Add link edits, I was looking more closely at the onSearchDataForIndex method in HomepageHooks.php. If a user makes a regular VisualEditor/wikitext edit to an article that has link suggestions, onSearchDataForIndex calls $linkRecommendation = $this->linkRecommendationStore->getByLinkTarget( $page->getTitle() );. getByLinkTarget() uses the latest revision ID associated with the title (it tries the replica first, then the source DB). Since this hook is triggered during job execution, with either the replica or the source DB, it's possible that this revision ID is newer than the one stored in the database for the link suggestion. In that case $linkRecommendation is null and the delete never happens. Maybe we should obtain the link recommendation by page ID, then check whether its revision ID is lower than the one passed to us in onSearchDataForIndex?
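The difference between the two lookups can be illustrated with a small model. This is a sketch of the proposal, not the extension's code: the class, the dictionaries, and both function names are hypothetical, and the real store is a DB table queried from PHP.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredRecommendation:
    page_id: int
    revision_id: int  # revision the recommendation was generated for


def lookup_by_latest_revision(
    by_revision: dict[int, StoredRecommendation], latest_revision_id: int
) -> Optional[StoredRecommendation]:
    """Lookup as described in the comment: keyed on the latest revision ID."""
    return by_revision.get(latest_revision_id)


def should_delete(stored: Optional[StoredRecommendation], indexed_revision_id: int) -> bool:
    """Proposed check: delete when the stored entry predates the indexed revision."""
    if stored is None:
        return False
    return stored.revision_id < indexed_revision_id


rec = StoredRecommendation(page_id=7, revision_id=10)
by_revision = {rec.revision_id: rec}
by_page = {rec.page_id: rec}

# A regular edit created revision 12, which is now being indexed.
missed = lookup_by_latest_revision(by_revision, 12)  # nothing found, so no delete
stale = should_delete(by_page.get(7), 12)  # found by page ID and older, so delete
```

When the stored revision ID equals the indexed one, the recommendation still matches the page content and the check leaves the index entry in place.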