Because we reindex infrequently, we should have a ticket that collects the updates that will take effect at the next reindex.
This is a recurring ticket: each time we reindex, we note it here, then reuse the ticket to list the new updates that will take effect at the following reindex.
We should consider announcing an upcoming reindex to the communities it will affect, especially when there is a long gap between the discussion of the reindex and the reindex itself.
Items not yet done:
- add your ticket here
- T377226: Remove LabelCountField from WikibaseCirrusSearch
Items done:
- T375557: Reindex all wikis to enable folding harmonization and new functionality
- T363734: Reindex all wikis to enable dotted I fix, Yiddish ligatures, maybe Arabic normalization
- T342444: Reindex all wikis to enable apostrophe normalization, camelCase handling, acronym handling, word_break_helper, and icu_tokenizer/_repair
- T337064: Reindex Turkish wikis to enable improved apostrophe handling
- T335704: Reindex Estonian wikis to enable new unpacked analyzer
- T333398: Reindex brwikimedia to use new unpacked Brazilian Portuguese analysis chain
- T330783: Reindex Romanian, Sorani wikis to enable unpacked analyzers
- T328315: Reindex Bulgarian, Lithuanian, Persian wikis to enable unpacked analyzers
- T327720: Reindex Japanese-language wikis to use unpacked CJK analyzer
- T327801: Reindex Armenian, Latvian, Hungarian wikis to enable unpacked analyzers
- T323927: Reindex Ukrainian-language wikis to enable unpacked analysis
- T322044: Reindex Egyptian Arabic and Moroccan Arabic wikis to enable Arabic language analysis
- T317200: Reindex all wikis to fix nnbsp regression - after 7.10 rollout is complete and seems stable and we are ready to test reindexing
- T317546: Add new elasticsearch field to index the number of outgoing links
- T315907: Reindex Nias Wikis to enable better apostrophe handling
- T315265: Reindex Bengali wikis to enable new analyzer
- T294257: Reindex Hindi, Irish, Norwegian wikis to enable unpacked versions
- T290079: Reindex Czech, Finnish, Galician wikis to enable unpacked versions
- T284691: Reindex Basque, Catalan, Danish wikis to enable unpacked versions
- T273508: CirrusSearch should expose a function to reset its weighted_tags for a particular tag category — Adds weighted_tags field (add --fieldsToDelete ores_articletopics,ores_articletopic to the UpdateSearchIndexConfig script)
- T280601: Reindex Commons and Wikidata on eqiad and cloudelastic
- T284185: Reindex German, Dutch, and Portuguese Wikis to Enable Unpacked Versions - after T281379 is deployed
- T282808: Reindex Spanish-language wikis to enable unpacked version of Spanish analysis
- T274200: Reindex English and Italian wikis to enable homoglyph plugin — after T268730 is deployed
- T274205: Reindex Khmer wikis to enable Khmer syllable reordering — after parent tasks are complete
- T222669: Normalize homoglyphs in mixed-script tokens when possible - reindex French to test homoglyph plugin in production
- T235654: Re-index Slovak Wikis to enable folding of Slovak diacritics after stemming — after T235561 is deployed
- T206613: Search of wikidata string property values using haswbstatement is case sensitive - wikidata & commonswiki
- T215967: Add keyword for filtering based on captions in specific language - Reindex wikidatawiki (and commonswiki?) for inlabel
- T217806: Reindex Greek, Turkish, and Irish wikis to keep lang-specific lowercasing & enable empty-token filtering (Greek)
- T221691: haswbstatement: P180 not working on production Commons - looks like commonswiki needs mappings update
- T216738: Reindex Korean-language wikis to enable Nori analyzer - after ES6 upgrade and turning off LTR for Korean
- T195071: Add chronological sorting by-page-creation-timestamp for search results - needs full reindex (from wikitext) to populate new page creation date field
- T209156: Re-index Chinese Wikis to fix Surrogate Split - after T168427 and T209155 are deployed
- T193407: Store wikibase statement qualifiers in cirrus search index (wikidata reindex)
- T199884: Support haswbstatement in other properties (wikidata reindex)
- T163642: Index Wikidata strings in statements for fulltext search (wikidata reindex)
- T203005: Re-index Esperanto Wikis
- T200037: Re-index Polish Wikis to patch Stempel stems
- T200204: Re-index Malay and Indonesian Wikis to use new unpacked analysis chain
- T197890: Re-index Mirandese Wikis
- T196404: Re-Re-Index Serbian Wikis after refactored plugins are deployed
- T196658: Re-index Croatian, Serbo-Croatian, and Bosnian Wikis
- T195912: Add Lexeme data to source text index
- T191545: Re-index Slovak Wikis after analysis chain is deployed (After T191543 and T191544 are complete)
- T189265: Re-index Serbian Wikis
- T188452: In-place reindex all wikis to pick up new trigram index for title and redirect.title field
- T182293: Tune wikidata fulltext search similarity parameters
- T163642: Index Wikidata strings in statements for fulltext search - needs full reindex of Wikidata
- T181426: Reindex wikidata to enable description index
- T175199: Index certain statements for Wikidata items (this needs full reindex of Wikidata)
- T177871: Re-index un-fallbacked languages
- T176397: Reindex default namespaces that were moved from general to content indices
- T167058: Re-index Hebrew-language wikis
- T173464: Re-index Chinese Wikis
- T162302: Add archive index to wikis
- T162292: Reindex wikidata to pick up labels/descriptions mappings
- T144429: Commit changes to implement ascii-folding for French
- T142721: Deploy ascii-folding / stemming re-ordering changes
- T142620: Test effect of adding ascii-folding on French Wikipedia
- T141216: ÿ in Spécial:IndexPages search
- T142037: Test effect of re-ordering kstem and asciifolding on English Wikipedia
- T124592: Cyrillic 'Е' and 'Ё' equivalence not found by search
- T102298: Add accent squashing to Russian/Cyrillic analyser
- T146804: Map modifier letter apostrophes to straight or curly quotes in the French Elasticsearch analysis chain
- T146358: Improve processing of the apostrophe by the search engine in Ukrainian
- T148052: Enable Latvian and Lithuanian analyzers
- T41501: Merging Unicode similar-looking characters in internal search (apostrophes, "x" and "×", etc)
- T145023: Searching for insource:tag finds <tag> but not {{#tag:tag}}
- T137830: Use the icu_folding filter if available instead of asciifolding
- T146402: Add ICU_folding filter for EN, FR and EL wiki projects
- T145561: Reindex all image files to include metadata index fields
- T146907: Adding ability to search by metadata: document and announce
- T132637: Lack of diacritic folding in e.g. Ancient Greek
- T156371: ContentHandler should expose the content-model to search engines.
- T158682: Deploy new Polish language analyser
- T161139: Reindex Swedish language projects once analysis update is deployed
- T162055: Deploy New Ukrainian Analyzer & Re-index Ukrainian Wikis
- T163832: Reindex Chinese wikis
- T162741: Test and analyze new Hebrew language analyzers
- T179945: Re-index English-language wikis to pick up kana mapping
- T223046: Lack of case sensitivity with hastemplate:
- T240550: Add mapping for ORES topic field in ElasticSearch
- T246882: commonswiki shard size grew more than 50G in eqiad and codfw - deployed
- gerrit: 526621 : new page_id field for all wikis
- T323628: Optimize the WikibaseCirrusSearch elasticsearch mapping and filter query for non-english users - add top level indexed field for labels in stemmed languages within wikibase instances