High number of dangling search index results at fr.wikipedia or it.wikipedia
Closed, ResolvedPublic

Description

While investigating T372333: de.wikipedia: Add Link unavailable due to a high-number of dangling records, I noticed a high number of dangling search index results (for Add-Link-Structured-Task) at fr.wikipedia:

image.png (712×1 px, 84 KB)

With ~140k recommendations in total, this means ~14% of the task pool is invalid, which is very high. Similarly high numbers are reported at it.wikipedia:

image.png (713×1 px, 85 KB)

So far, Add Link appears to work normally on those wikis. However, numbers this high are definitely not normal, and based on the charts above, the increase started sometime in May. More newcomers than usual might be getting the "no recommendation available" error screen (which happens when the two stores don't agree with each other).
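To make "the two stores don't agree" concrete, here is a minimal sketch. It assumes each store (search index and database table) can be reduced to a set of page IDs, and that a record is "dangling" when it exists in one store but not in the other. The function name and the IDs are hypothetical and are not taken from GrowthExperiments.

```python
def find_dangling(search_index_ids, db_table_ids):
    """Return (dangling_search, dangling_db) as sets of page IDs.

    Assumption: a record is dangling when only one of the two stores has it.
    """
    search_index_ids = set(search_index_ids)
    db_table_ids = set(db_table_ids)
    # In the search index, but no recommendation stored in the database.
    dangling_search = search_index_ids - db_table_ids
    # In the database, but not findable through the search index.
    dangling_db = db_table_ids - search_index_ids
    return dangling_search, dangling_db


dangling_search, dangling_db = find_dangling({1, 2, 3, 4}, {3, 4, 5})
print(sorted(dangling_search))  # [1, 2]
print(sorted(dangling_db))      # [5]
```

Under this reading, a newcomer who is served page 1 or 2 from the search index would hit the error screen, because no recommendation can be loaded for it.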

Event Timeline

Change #1078653 had a related patch set uploaded (by Michael Große; author: Michael Große):

[mediawiki/extensions/GrowthExperiments@master] Clear LinkRecommendation suggestions on page save

https://gerrit.wikimedia.org/r/1078653

Michael moved this task from Inbox to Current Sprint on the Growth-Team board.
Michael edited projects, added Growth-Team (Current Sprint); removed Growth-Team.
Michael moved this task from Incoming to Code Review on the Growth-Team (Current Sprint) board.

Change #1078653 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Clear LinkRecommendation suggestions on page save

https://gerrit.wikimedia.org/r/1078653

The changes have been merged. I wonder if it would make sense to backport them on Monday (together with the fix to the maint script from T373176), and then to enable it on a single wiki (which?) and see how it goes? Otherwise, the earliest we could do this would be after the train reaches Group 2 on Thursday, and that feels like losing yet another week for making progress with this.

Backporting seems like a good idea to me. As to which wiki to experiment with, I'd go with frwiki and/or eswiki, for three reasons:

  • both have an excessive number of dangling results,
  • both accumulated a substantial number of dangling results in the last week (both went up by ~1k),
  • we can communicate with both in their language (Benoit / Isa) should something go terribly wrong

I just noticed one thing. The dangling records have grown suspiciously fast. Most of them were accumulated on September 25. The theory behind the fix in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/1078653 implies that dangling records would accumulate approximately as quickly as Add Link edits are saved, but that doesn't seem to be the case. See the dangling records chart for frwiki over the last month:

image.png (798×1 px, 64 KB)

For comparison, here is the edit volume (Add Link edits only; blue is frwiki, yellow is eswiki):

image.png (770×1 px, 70 KB)

But that doesn't mean we can't try. The fix might explain at least some of the growth in dangling records.

Change #1079915 had a related patch set uploaded (by Michael Große; author: Michael Große):

[mediawiki/extensions/GrowthExperiments@wmf/1.43.0-wmf.26] Clear LinkRecommendation suggestions on page save

https://gerrit.wikimedia.org/r/1079915

I noticed that spike earlier as well. Still not sure how this could have happened.

One cause could be that a large number of db entries suddenly got removed. That doesn't sound likely to me.

On the other hand, looking at the script, there is possibly some double counting of dangling search records going on, because pages can have multiple topics and so pages with a dangling recommendation and multiple topics might be counted multiple times. So maybe, for some reason, many pages got their topics reassigned? That sounds more plausible to me. Maybe the search team rolled out an update to their article-topic-model or something?
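The double-counting theory can be illustrated with a small sketch. It assumes the script tallies dangling records once per topic, so a page with several topics contributes several times. The function name, page titles, and topic assignments are made up for illustration.

```python
def count_dangling(topics_by_page, dangling_pages):
    """Return (per_topic_total, unique_total) for the given dangling pages.

    Assumption: a per-topic scan sees a dangling page once for every
    topic the page is assigned to.
    """
    per_topic_total = sum(len(topics_by_page.get(page, ())) for page in dangling_pages)
    unique_total = len(set(dangling_pages))
    return per_topic_total, unique_total


topics_by_page = {
    "Page A": ["biography", "women", "europe"],
    "Page B": ["stem"],
}
dangling = {"Page A", "Page B"}
print(count_dangling(topics_by_page, dangling))  # (4, 2)

# If "Page B" is assigned one more topic, the per-topic total grows
# even though no new page became dangling.
topics_by_page["Page B"].append("biology")
print(count_dangling(topics_by_page, dangling))  # (5, 2)
```

If topics were reassigned in bulk, the reported total could jump in a single day without any change in the number of affected pages, which would be consistent with a sudden spike.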

That said, failing to remove the search index tags when Add Link edits are saved is not the only theory at play here. I'm also skeptical about whether

$fields[WeightedTagsHooks::FIELD_NAME][] = LinkRecommendationTaskTypeHandler::WEIGHTED_TAG_PREFIX . '/' . CirrusIndexField::MULTILIST_DELETE_GROUPING;

actually works reliably in the first place, even in the circumstances when it is called.

Change #1079915 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.43.0-wmf.26] Clear LinkRecommendation suggestions on page save

https://gerrit.wikimedia.org/r/1079915

Mentioned in SAL (#wikimedia-operations) [2024-10-14T14:02:42Z] <lucaswerkmeister-wmde@deploy2002> Started scap sync-world: Backport for [[gerrit:1079923|refactor(tests): don't use per-method coverage annotation]], [[gerrit:1079894|refactor(HomepageHooks): extract method for simpler modifyability]], [[gerrit:1079915|Clear LinkRecommendation suggestions on page save (T364341 T372337)]], [[gerrit:1079925|Run fixLinkRecommendationData even when disabled in CC (T373176)]]

Mentioned in SAL (#wikimedia-operations) [2024-10-14T14:04:49Z] <lucaswerkmeister-wmde@deploy2002> migr, lucaswerkmeister-wmde: Backport for [[gerrit:1079923|refactor(tests): don't use per-method coverage annotation]], [[gerrit:1079894|refactor(HomepageHooks): extract method for simpler modifyability]], [[gerrit:1079915|Clear LinkRecommendation suggestions on page save (T364341 T372337)]], [[gerrit:1079925|Run fixLinkRecommendationData even when disabled in CC (T373176)]] synced to

Change #1080035 had a related patch set uploaded (by Michael Große; author: Michael Große):

[operations/mediawiki-config@master] eswiki: switch clearing link recommendations to PageSaveComplete hook

https://gerrit.wikimedia.org/r/1080035

Mentioned in SAL (#wikimedia-operations) [2024-10-14T14:09:31Z] <lucaswerkmeister-wmde@deploy2002> Finished scap sync-world: Backport for [[gerrit:1079923|refactor(tests): don't use per-method coverage annotation]], [[gerrit:1079894|refactor(HomepageHooks): extract method for simpler modifyability]], [[gerrit:1079915|Clear LinkRecommendation suggestions on page save (T364341 T372337)]], [[gerrit:1079925|Run fixLinkRecommendationData even when disabled in CC (T373176)]] (duration: 0

Just now, I ran the updated maintenance script against eswiki (and testwiki) to get baseline numbers before we flip the config switch:

migr@mwmaint2002:/srv/mediawiki/php-1.43.0-wmf.26$ mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=testwiki --dry-run --search-index --db-table
DEPRECATION WARNING: Maintenance scripts are moving to Kubernetes. See
https://wikitech.wikimedia.org/wiki/Maintenance_scripts for the new process.
Maintenance hosts will be going away; please submit feedback promptly if
maintenance scripts on Kubernetes don't work for you. (T341553)
Total number of OK search index entries: 2631
 (results in multiple topics counted multiple times)Total number of dangling search-index entries: 181
Total number of OK db-table entries: 999
Total number of dangling db-table entries: 28
migr@mwmaint2002:/srv/mediawiki/php-1.43.0-wmf.26$ mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=eswiki --dry-run --search-index --db-table
DEPRECATION WARNING: Maintenance scripts are moving to Kubernetes. See
https://wikitech.wikimedia.org/wiki/Maintenance_scripts for the new process.
Maintenance hosts will be going away; please submit feedback promptly if
maintenance scripts on Kubernetes don't work for you. (T341553)
  topic biography had more than 10K tasks
  topic women had more than 10K tasks
  topic films had more than 10K tasks
  topic media had more than 10K tasks
  topic music had more than 10K tasks
  topic television had more than 10K tasks
  topic sports had more than 10K tasks
  topic asia had more than 10K tasks
  topic europe had more than 10K tasks
  topic stem had more than 10K tasks
Total number of OK search index entries: 144452
 (results in multiple topics counted multiple times)Total number of dangling search-index entries: 49868
Total number of OK db-table entries: 105322
Total number of dangling db-table entries: 2378
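For comparing runs, the summary lines can be pulled out of the script output with a small parser. This is a sketch that only assumes the line format shown in the output above; the function name is hypothetical. It searches anywhere in the text rather than at line starts, because the "(results in multiple topics counted multiple times)" note is printed on the same line as the dangling search-index total.

```python
import re

# Matches both "search index" (OK line) and "search-index" (dangling line).
PATTERN = re.compile(
    r"Total number of (OK|dangling) (search[- ]index|db-table) entries: (\d+)"
)


def parse_counts(output):
    """Map (store, status) to the count reported in the script output."""
    counts = {}
    for status, store, number in PATTERN.findall(output):
        counts[(store.replace(" ", "-"), status)] = int(number)
    return counts


# The four eswiki baseline summary lines, copied from the run above.
SAMPLE = """Total number of OK search index entries: 144452
 (results in multiple topics counted multiple times)Total number of dangling search-index entries: 49868
Total number of OK db-table entries: 105322
Total number of dangling db-table entries: 2378
"""

counts = parse_counts(SAMPLE)
ok = counts[("search-index", "OK")]
dangling = counts[("search-index", "dangling")]
print(f"dangling share of search-index entries: {dangling / (ok + dangling):.1%}")
```

Because results in multiple topics are counted multiple times, the share computed this way is only indicative and should not be read as a per-page figure.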

Change #1080270 had a related patch set uploaded (by Michael Große; author: Michael Große):

[operations/puppet@production] growthexperiments.pp: track dangling records for fr+eswiki hourly

https://gerrit.wikimedia.org/r/1080270

Change #1080035 merged by jenkins-bot:

[operations/mediawiki-config@master] eswiki: switch clearing link recommendations to PageSaveComplete hook

https://gerrit.wikimedia.org/r/1080035

Mentioned in SAL (#wikimedia-operations) [2024-10-15T13:24:23Z] <urbanecm@deploy2002> Started scap sync-world: Backport for [[gerrit:1080035|eswiki: switch clearing link recommendations to PageSaveComplete hook (T372337)]], [[gerrit:1079520|s7: Reduce revision-slots cache expiry to 60 seconds (T183490)]]

Mentioned in SAL (#wikimedia-operations) [2024-10-15T13:26:38Z] <urbanecm@deploy2002> migr, urbanecm, zabe: Backport for [[gerrit:1080035|eswiki: switch clearing link recommendations to PageSaveComplete hook (T372337)]], [[gerrit:1079520|s7: Reduce revision-slots cache expiry to 60 seconds (T183490)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-10-15T13:32:07Z] <urbanecm@deploy2002> Finished scap sync-world: Backport for [[gerrit:1080035|eswiki: switch clearing link recommendations to PageSaveComplete hook (T372337)]], [[gerrit:1079520|s7: Reduce revision-slots cache expiry to 60 seconds (T183490)]] (duration: 07m 44s)

For reference: the script output for eswiki and frwiki directly after merging the above change:

migr@mwmaint2002:~$ date
Tue 15 Oct 2024 01:38:14 PM UTC
migr@mwmaint2002:~$ mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=eswiki --dry-run --search-index --db-table
DEPRECATION WARNING: Maintenance scripts are moving to Kubernetes. See
https://wikitech.wikimedia.org/wiki/Maintenance_scripts for the new process.
Maintenance hosts will be going away; please submit feedback promptly if
maintenance scripts on Kubernetes don't work for you. (T341553)
  topic biography had more than 10K tasks
  topic women had more than 10K tasks
  topic films had more than 10K tasks
  topic media had more than 10K tasks
  topic music had more than 10K tasks
  topic television had more than 10K tasks
  topic sports had more than 10K tasks
  topic asia had more than 10K tasks
  topic europe had more than 10K tasks
  topic stem had more than 10K tasks
Total number of OK search index entries: 144104
 (results in multiple topics counted multiple times)Total number of dangling search-index entries: 49962
Total number of OK db-table entries: 105189
Total number of dangling db-table entries: 2378
migr@mwmaint2002:~$ mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=frwiki --dry-run --search-index --db-table
DEPRECATION WARNING: Maintenance scripts are moving to Kubernetes. See
https://wikitech.wikimedia.org/wiki/Maintenance_scripts for the new process.
Maintenance hosts will be going away; please submit feedback promptly if
maintenance scripts on Kubernetes don't work for you. (T341553)
  topic biography had more than 10K tasks
  topic films had more than 10K tasks
  topic media had more than 10K tasks
  topic music had more than 10K tasks
  topic sports had more than 10K tasks
  topic europe had more than 10K tasks
  topic biology had more than 10K tasks
  topic stem had more than 10K tasks
Total number of OK search index entries: 110410
 (results in multiple topics counted multiple times)Total number of dangling search-index entries: 53880
Total number of OK db-table entries: 122536
Total number of dangling db-table entries: 12022

Change #1080270 merged by RLazarus:

[operations/puppet@production] growthexperiments.pp: track dangling records for fr+eswiki hourly

https://gerrit.wikimedia.org/r/1080270

I just checked the logs and I see only errors:

Oct 15 17:10:00 mwmaint2002 mediawiki_job_growthexperiments-fixLinkRecommendationData-dryrun-eswiki[1387]: no version entry for `extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php`.
Oct 15 17:10:00 mwmaint2002 mediawiki_job_growthexperiments-fixLinkRecommendationData-dryrun-eswiki[1387]: Fatal error: no version entry for `extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php`.
Oct 15 17:10:00 mwmaint2002 mediawiki_job_growthexperiments-fixLinkRecommendationData-dryrun-eswiki[1387]:  in /srv/mediawiki/multiversion/MWMultiVersion.php on line 682
Oct 15 17:10:00 mwmaint2002 mediawiki_job_growthexperiments-fixLinkRecommendationData-dryrun-eswiki[1387]: Traceback (most recent call last):
Oct 15 17:10:00 mwmaint2002 mediawiki_job_growthexperiments-fixLinkRecommendationData-dryrun-eswiki[1387]:   File "/usr/local/bin/mw-cli-wrapper", line 61, in <module>
Oct 15 17:10:00 mwmaint2002 mediawiki_job_growthexperiments-fixLinkRecommendationData-dryrun-eswiki[1387]:     subprocess.run(cmd, check=True, shell=True)
Oct 15 17:10:00 mwmaint2002 mediawiki_job_growthexperiments-fixLinkRecommendationData-dryrun-eswiki[1387]:   File "/usr/lib/python3.7/subprocess.py", line 487, in run
Oct 15 17:10:00 mwmaint2002 mediawiki_job_growthexperiments-fixLinkRecommendationData-dryrun-eswiki[1387]:     output=stdout, stderr=stderr)
Oct 15 17:10:00 mwmaint2002 mediawiki_job_growthexperiments-fixLinkRecommendationData-dryrun-eswiki[1387]: subprocess.CalledProcessError: Command '/usr/local/bin/mwscript --wiki=eswiki extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --search-index --db-table --dry-run --statsd' returned non-zero exit status 255.

@Urbanecm_WMF or @RLazarus: Any idea what I did wrong?

Instead of

mwscript --wiki=eswiki extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php

you want

mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=eswiki

Sorry for not noticing on the Puppet patch! Feel free to send me a followup and I can merge it any time, doesn't have to wait for the Puppet request window.
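One way to avoid this class of mistake is to build the command from parts so that the script path always directly follows mwscript. The helper below is a hypothetical sketch, not part of Puppet or mw-cli-wrapper; the flags are the ones from the failing command in the log above.

```python
def build_mwscript_command(script, wiki, flags=()):
    """Return the mwscript argument list with the script path first.

    Per the correction above, mwscript expects the script path before
    --wiki and any other options.
    """
    return ["mwscript", script, f"--wiki={wiki}", *flags]


cmd = build_mwscript_command(
    "extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php",
    "eswiki",
    ["--search-index", "--db-table", "--dry-run", "--statsd"],
)
print(" ".join(cmd))
```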

As @RLazarus says... We might want to make mwscript POSIX-compliant in this respect. Sorry for missing this in CR!

Change #1080453 had a related patch set uploaded (by Michael Große; author: Michael Große):

[operations/puppet@production] fix(growthexperiments.pp): correct order of arguments for mwscript

https://gerrit.wikimedia.org/r/1080453

Change #1080453 merged by RLazarus:

[operations/puppet@production] fix(growthexperiments.pp): correct order of arguments for mwscript

https://gerrit.wikimedia.org/r/1080453

Change #1082057 had a related patch set uploaded (by Michael Große; author: Michael Große):

[operations/mediawiki-config@master] eswiki: switch clearing link recommendations to PageSaveComplete hook

https://gerrit.wikimedia.org/r/1082057

Change #1082057 merged by jenkins-bot:

[operations/mediawiki-config@master] frwiki: switch clearing link recommendations to PageSaveComplete hook

https://gerrit.wikimedia.org/r/1082057

Mentioned in SAL (#wikimedia-operations) [2024-10-21T20:24:17Z] <tgr@deploy2002> Started scap sync-world: Backport for [[gerrit:1082057|frwiki: switch clearing link recommendations to PageSaveComplete hook (T372337)]]

Mentioned in SAL (#wikimedia-operations) [2024-10-21T20:26:31Z] <tgr@deploy2002> migr, tgr: Backport for [[gerrit:1082057|frwiki: switch clearing link recommendations to PageSaveComplete hook (T372337)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-10-21T20:32:36Z] <tgr@deploy2002> Finished scap sync-world: Backport for [[gerrit:1082057|frwiki: switch clearing link recommendations to PageSaveComplete hook (T372337)]] (duration: 08m 19s)

Mentioned in SAL (#wikimedia-operations) [2024-10-28T14:51:50Z] <MichaelG_WMF> T372337 - run mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=eswiki --search-index to fix the remaining ca. 10K dangling search index records

Running the fixLinkRecommendationData.php maintenance script again on eswiki has reduced the number of dangling recommendations by another order of magnitude, down to about 1,000:

migr@mwmaint2002:/srv/mediawiki/php-1.43.0-wmf.28$ mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=eswiki --search-index --dry-run
DEPRECATION WARNING: Maintenance scripts are moving to Kubernetes. See
https://wikitech.wikimedia.org/wiki/Maintenance_scripts for the new process.
Maintenance hosts will be going away; please submit feedback promptly if
maintenance scripts on Kubernetes don't work for you. (T341553)
  topic biography had more than 10K tasks
  topic media had more than 10K tasks
  topic europe had more than 10K tasks
  topic stem had more than 10K tasks
Total number of OK search index entries: 134126
 (results in multiple topics counted multiple times)Total number of dangling search-index entries: 10673
migr@mwmaint2002:/srv/mediawiki/php-1.43.0-wmf.28$ mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=eswiki --search-index
DEPRECATION WARNING: Maintenance scripts are moving to Kubernetes. See
https://wikitech.wikimedia.org/wiki/Maintenance_scripts for the new process.
Maintenance hosts will be going away; please submit feedback promptly if
maintenance scripts on Kubernetes don't work for you. (T341553)
  topic biography had more than 10K tasks
  topic media had more than 10K tasks
  topic europe had more than 10K tasks
  topic stem had more than 10K tasks
Total number of OK search index entries: 126453
 (results in multiple topics counted multiple times)Total number of dangling search-index entries: 9920
migr@mwmaint2002:/srv/mediawiki/php-1.43.0-wmf.28$ mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=eswiki --search-index --dry-run
DEPRECATION WARNING: Maintenance scripts are moving to Kubernetes. See
https://wikitech.wikimedia.org/wiki/Maintenance_scripts for the new process.
Maintenance hosts will be going away; please submit feedback promptly if
maintenance scripts on Kubernetes don't work for you. (T341553)
  topic biography had more than 10K tasks
  topic media had more than 10K tasks
  topic europe had more than 10K tasks
  topic stem had more than 10K tasks
Total number of OK search index entries: 133896
 (results in multiple topics counted multiple times)Total number of dangling search-index entries: 1016
migr@mwmaint2002:/srv/mediawiki/php-1.43.0-wmf.28$

The overall problem to keep in mind here is that the script stops going through the recommendations after 10,000 search results have been evaluated for a topic. That means there might be additional dangling records in biography/media/europe/stem. Running the script again with --random might improve the situation faster.
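The effect of the per-topic cap can be shown with a toy model. It assumes results for a topic come back in a fixed order and that only the first 10,000 are examined. The real iteration order of the script is not shown in this task, so treat this purely as an illustration with a hypothetical function name.

```python
def scan_topic(entries, cap=10_000):
    """Count dangling entries among the first `cap` results of a topic.

    `entries` is a list of booleans, True meaning the result is dangling.
    Returns (dangling_found, truncated).
    """
    examined = entries[:cap]
    return sum(examined), len(entries) > cap


# 10,000 healthy results followed by 500 dangling ones.
entries = [False] * 10_000 + [True] * 500
found, truncated = scan_topic(entries)
print(found, truncated)  # 0 True
```

In this model a truncated scan that reports zero dangling records says nothing about the entries beyond the cap, which is why a "more than 10K tasks" notice for a topic means the reported totals are a lower bound for that topic.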

Mentioned in SAL (#wikimedia-operations) [2024-10-29T14:25:49Z] <MichaelG_WMF> T372337 clearing dangling database-records for link suggestions by running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=eswiki --db-table --force

Change #1089761 had a related patch set uploaded (by Michael Große; author: Michael Große):

[mediawiki/extensions/GrowthExperiments@master] maint: fix stats-collection flakyness by migrating to statslib

https://gerrit.wikimedia.org/r/1089761

Change #1090449 had a related patch set uploaded (by Michael Große; author: Michael Große):

[operations/puppet@production] growthexperiments.pp: track dangling records for cswiki hourly

https://gerrit.wikimedia.org/r/1090449

Change #1090449 merged by RLazarus:

[operations/puppet@production] growthexperiments.pp: track dangling records for cswiki hourly

https://gerrit.wikimedia.org/r/1090449

Change #1089761 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] maint: fix stats-collection flakyness by migrating to statslib

https://gerrit.wikimedia.org/r/1089761

I'd say this is now safe to call resolved: the number of dangling records is down to nearly zero for both wikis.

Change #1120556 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[operations/puppet@production] growthexperiments.pp: Mark unnecessary jobs as absent

https://gerrit.wikimedia.org/r/1120556

Change #1120557 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[operations/puppet@production] growthexperiments.pp: Drop absented jobs

https://gerrit.wikimedia.org/r/1120557

Change #1120556 merged by RLazarus:

[operations/puppet@production] growthexperiments.pp: Mark unnecessary jobs as absent

https://gerrit.wikimedia.org/r/1120556

Change #1120557 merged by RLazarus:

[operations/puppet@production] growthexperiments.pp: Drop absented jobs

https://gerrit.wikimedia.org/r/1120557