
CirrusSearch metadata stores DEFAULTSORT overrides even after they've been removed
Closed, Resolved · Public · BUG REPORT

Description

Steps to replicate the issue:

What should have happened instead?:
The "defaultsort" parameter should have been removed or restored to its default value.

Other information (browser name/version, screenshots, etc.):
See https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Why_does_%22Monkey%22_appear_in_the_dropdown_search_results_for_%22Obama%22? for further context. Interestingly, https://test.wikipedia.org/w/api.php?action=query&prop=cirrusbuilddoc&titles=Test&cbbuilders=content does not seem to be affected by this.
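The document CirrusSearch would build for a page can be inspected the same way as the test.wikipedia.org link above. Here is a minimal Python sketch that builds that query URL from the same parameters. The helper name `cirrus_builddoc_url` is hypothetical, and `format=json` is added only to get machine-readable output from the Action API.

```python
from urllib.parse import urlencode


def cirrus_builddoc_url(wiki_host: str, title: str) -> str:
    """Build an Action API URL asking CirrusSearch to build the document
    for `title` (same parameters as the test.wikipedia.org link)."""
    params = {
        "action": "query",
        "prop": "cirrusbuilddoc",
        "titles": title,
        "cbbuilders": "content",
        "format": "json",  # standard Action API output format parameter
    }
    # urlencode keeps insertion order and encodes spaces as '+'
    return f"https://{wiki_host}/w/api.php?{urlencode(params)}"


print(cirrus_builddoc_url("test.wikipedia.org", "Test"))
```

The response can then be checked for a `defaultsort` value that no longer appears in the page's wikitext.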

Event Timeline

Restricted Application added a subscriber: Aklapper.

Thanks for reporting this. I think two different issues allowed such suggestions to appear:

  • defaultsort is indeed not properly removed from the search index when it's erased: a null value unfortunately tells the update process to leave the existing value untouched rather than clear it. This needs to be fixed for this field.
  • defaultsort values are allowed to help completion only if they match a particular pattern. This pattern seems too permissive and should be tightened to limit the impact of similar vandalism on search suggestions in the future.
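The first point can be illustrated with a toy model of partial document updates. This is a hypothetical sketch, not CirrusSearch's update code, which the ticket doesn't show. In `apply_update_buggy`, a `None` value means "leave this field alone", so an erased defaultsort never reaches the stored document. `apply_update_fixed` treats `None` as "clear" for an assumed set of fields, mirroring the per-field fix described above.

```python
from typing import Any, Dict, FrozenSet, Optional


def apply_update_buggy(stored: Dict[str, Any],
                       update: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    """Partial update where None always means 'leave the field unchanged'."""
    result = dict(stored)
    for field, value in update.items():
        if value is None:
            continue  # null is ignored: the stale value stays in the index
        result[field] = value
    return result


def apply_update_fixed(stored: Dict[str, Any],
                       update: Dict[str, Optional[Any]],
                       clear_on_null: FrozenSet[str] = frozenset({"defaultsort"})
                       ) -> Dict[str, Any]:
    """Same as above, except that None clears the fields listed in
    `clear_on_null` (a hypothetical per-field opt-in)."""
    result = dict(stored)
    for field, value in update.items():
        if value is None:
            if field in clear_on_null:
                result.pop(field, None)  # erased on the page: drop it here too
            continue
        result[field] = value
    return result


stored = {"title": "Barack Obama", "defaultsort": "Monkey"}
update = {"defaultsort": None}  # DEFAULTSORT was removed from the page
print(apply_update_buggy(stored, update))  # defaultsort still present
print(apply_update_fixed(stored, update))  # defaultsort removed
```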
dcausse triaged this task as High priority.

Change #1207806 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@master] Fix filtering of relevant default sort suggestions

https://gerrit.wikimedia.org/r/1207806

Change #1207812 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@wmf/1.46.0-wmf.3] Fix filtering of relevant default sort suggestions

https://gerrit.wikimedia.org/r/1207812

Change #1207813 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@wmf/1.46.0-wmf.2] Fix filtering of relevant default sort suggestions

https://gerrit.wikimedia.org/r/1207813

Change #1207806 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Fix filtering of relevant default sort suggestions

https://gerrit.wikimedia.org/r/1207806

Change #1207813 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@wmf/1.46.0-wmf.2] Fix filtering of relevant default sort suggestions

https://gerrit.wikimedia.org/r/1207813

Change #1207812 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@wmf/1.46.0-wmf.3] Fix filtering of relevant default sort suggestions

https://gerrit.wikimedia.org/r/1207812
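The patches above tighten which defaultsort values may feed completion suggestions, but the ticket states neither the old pattern nor the new one. As a purely hypothetical illustration of a stricter rule, the sketch below keeps a defaultsort only when it is a reordering of the title's own words (for example "Obama, Barack" for "Barack Obama"). Under this rule an unrelated value such as "Monkey" is rejected. The function names and the rule itself are assumptions.

```python
import re


def _words(text: str) -> list[str]:
    # Lowercase and keep only word characters: "Obama, Barack" -> ["obama", "barack"]
    return re.findall(r"\w+", text.lower())


def is_relevant_defaultsort(title: str, defaultsort: str) -> bool:
    """Hypothetical filter: accept a defaultsort for completion only if it
    uses exactly the same words as the title, in a different order."""
    title_words = _words(title)
    sort_words = _words(defaultsort)
    if not sort_words or sort_words == title_words:
        return False  # empty, or adds nothing beyond the title itself
    return sorted(sort_words) == sorted(title_words)


print(is_relevant_defaultsort("Barack Obama", "Obama, Barack"))  # accepted
print(is_relevant_defaultsort("Barack Obama", "Monkey"))         # rejected
```

A word-set comparison like this is deliberately conservative: it gives up legitimate sort keys that add or drop words, in exchange for making unrelated values unusable as suggestions.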

Mentioned in SAL (#wikimedia-operations) [2025-11-20T14:34:53Z] <ladsgroup@deploy2002> Started scap sync-world: Backport for [[gerrit:1207813|Fix filtering of relevant default sort suggestions (T410602)]], [[gerrit:1207812|Fix filtering of relevant default sort suggestions (T410602)]]

Mentioned in SAL (#wikimedia-operations) [2025-11-20T14:40:14Z] <ladsgroup@deploy2002> ladsgroup, dcausse: Backport for [[gerrit:1207813|Fix filtering of relevant default sort suggestions (T410602)]], [[gerrit:1207812|Fix filtering of relevant default sort suggestions (T410602)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-11-20T14:45:19Z] <ladsgroup@deploy2002> Finished scap sync-world: Backport for [[gerrit:1207813|Fix filtering of relevant default sort suggestions (T410602)]], [[gerrit:1207812|Fix filtering of relevant default sort suggestions (T410602)]] (duration: 10m 25s)

Mentioned in SAL (#wikimedia-operations) [2025-11-20T14:49:22Z] <dcausse@deploy2002> mwscript-k8s job started: extensions/CirrusSearch/maintenance/UpdateSuggesterIndex.php enwiki --masterTimeout=10m --replicationTimeout=5400 --indexChunkSize=3000 --cluster=eqiad --optimize # T410602 reindexing search suggestions on enwiki

Mentioned in SAL (#wikimedia-operations) [2025-11-20T14:49:26Z] <dcausse@deploy2002> mwscript-k8s job started: extensions/CirrusSearch/maintenance/UpdateSuggesterIndex.php enwiki --masterTimeout=10m --replicationTimeout=5400 --indexChunkSize=3000 --cluster=eqiad --optimize # T410602 reindexing search suggestions on enwiki

Mentioned in SAL (#wikimedia-operations) [2025-11-20T14:53:37Z] <dcausse@deploy2002> mwscript-k8s job started: extensions/CirrusSearch/maintenance/UpdateSuggesterIndex.php frwiki --masterTimeout=10m --replicationTimeout=5400 --indexChunkSize=3000 --cluster=eqiad --optimize # T410602 reindexing search suggestions on frwiki

Mentioned in SAL (#wikimedia-operations) [2025-11-20T15:14:17Z] <dcausse@deploy2002> mwscript-k8s job started: extensions/CirrusSearch/maintenance/UpdateSuggesterIndex.php hewiki --masterTimeout=10m --replicationTimeout=5400 --indexChunkSize=3000 --cluster=eqiad --optimize # T410602 reindexing search suggestions on hewiki

Mentioned in SAL (#wikimedia-operations) [2025-11-20T15:39:18Z] <dcausse@deploy2002> mwscript-k8s job started: extensions/CirrusSearch/maintenance/UpdateSuggesterIndex.php frwiki --masterTimeout=10m --replicationTimeout=5400 --indexChunkSize=3000 --cluster=eqiad --optimize # T410602 reindexing search suggestions on frwiki

Mentioned in SAL (#wikimedia-operations) [2025-11-20T16:00:26Z] <dcausse@deploy2002> mwscript-k8s job started: extensions/CirrusSearch/maintenance/UpdateSuggesterIndex.php hewiki --masterTimeout=10m --replicationTimeout=5400 --indexChunkSize=3000 --cluster=eqiad --optimize # T410602 reindexing search suggestions on hewiki

Change #1212092 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/deployment-charts@master] cirrus: bump job image version

https://gerrit.wikimedia.org/r/1212092

Change #1212092 merged by jenkins-bot:

[operations/deployment-charts@master] cirrus: bump job image version

https://gerrit.wikimedia.org/r/1212092

The update process has been fixed.
Existing stale data in the search index will be fixed when any of the following happens:

  • a new revision of the page is created
  • a template change propagates
  • the continuous cleanup mechanism processes a page with stale data

In the worst case, the latter can take up to 2 months.
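That worst case likely comes from how a continuous sweep works. A page that becomes stale just after the cleanup cursor has passed it waits almost a full cycle before being checked again. Below is a toy Python model of such a sweep. `CleanupSweep`, the batch size and the tick pacing are hypothetical, and the ticket gives only the 2-month bound, not CirrusSearch's actual parameters.

```python
from dataclasses import dataclass


@dataclass
class CleanupSweep:
    """Toy model of a continuous cleanup loop over page IDs 1..max_page_id."""
    max_page_id: int
    batch_size: int  # pages checked per tick (hypothetical)
    cursor: int = 1

    def tick(self, stale: set[int]) -> set[int]:
        """Check one batch, repair stale pages in it (removing them from
        `stale` in place) and return the IDs repaired during this tick."""
        end = min(self.cursor + self.batch_size - 1, self.max_page_id)
        repaired = {pid for pid in range(self.cursor, end + 1) if pid in stale}
        stale -= repaired
        # Wrap around to the first page once the end of the ID range is reached
        self.cursor = 1 if end == self.max_page_id else end + 1
        return repaired

    def ticks_per_cycle(self) -> int:
        """Worst-case wait, in ticks, for a page that just became stale."""
        return -(-self.max_page_id // self.batch_size)  # ceiling division


sweep = CleanupSweep(max_page_id=10, batch_size=3)
stale = {2, 9}
for _ in range(sweep.ticks_per_cycle()):
    print(sweep.tick(stale))
```

With real numbers, the cycle length is roughly the number of pages divided by the rate at which pages are checked, which is what bounds how long stale data can survive without an edit or template change.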

@ChildrenWillListen many thanks for debugging the problem and filing this ticket so rapidly!

We have created a couple of follow-up tickets to help mitigate similar situations in the future:

  • T410899: I believe this was the main problem; stale data lingering in the search index for so long is a ticking time bomb.
  • T411169: I think it's important that we have tools to debug and explain why search behaves the way it does in situations like this.