Assess cleaning up Wikibase code around CirrusSearch sanitizer job
Closed, Resolved (Public)

Description

The sanitizer (or saneitizer) job was disabled during the terms migration. Assess whether we can enable it again, and possibly also make it more efficient as outlined in T239931 (for instance, by unsetting the 'generate-html' flag).

Event Timeline

Claiming the task since I started looking into it.

Change 632751 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/Jade@master] Override getParserOutputForIndexing to ensure HTML

https://gerrit.wikimedia.org/r/632751

Change 632752 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/core@master] Don’t generate HTML in getParserOutputForIndexing

https://gerrit.wikimedia.org/r/632752

Change 633170 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/Wikibase@master] Override EntityHandler::getParserOutputForIndexing

https://gerrit.wikimedia.org/r/633170

Change 632751 abandoned by Lucas Werkmeister (WMDE):
[mediawiki/extensions/Jade@master] Override getParserOutputForIndexing to ensure HTML

Reason:
No longer needed with PS2 of I50f3a530f2, where generating HTML is no longer skipped by default.

https://gerrit.wikimedia.org/r/632751

Change 632752 merged by jenkins-bot:
[mediawiki/core@master] Clarify HTML generation for indexing in ContentHandler

https://gerrit.wikimedia.org/r/632752

Change 633170 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Override EntityHandler::getParserOutputForIndexing

https://gerrit.wikimedia.org/r/633170

This task can be closed. We’ve done the assessment, and will turn the sanitizer back on once the above changes are deployed. Progress will be tracked in T239931: Reduce the impact of the sanitizer on wikidata.