|mediawiki/extensions/CirrusSearch : master||Enable ICU folding for en, fr and greek by default|
Mentioned In
- T41501: Merging Unicode similar-looking characters in internal search (apostrophes, "x" and "×", etc)
- T147505: [EPIC][Recurring task] CirrusSearch: what is updated during re-indexing
- T146804: Map modifier letter apostrophes to straight or curly quotes in the French Elasticsearch analysis chain
Mentioned Here
- T146804: Map modifier letter apostrophes to straight or curly quotes in the French Elasticsearch analysis chain
- T41501: Merging Unicode similar-looking characters in internal search (apostrophes, "x" and "×", etc)
- T102298: Add accent squashing to Russian/Cyrillic analyser
- T137830: Use the icu_folding filter if available instead of asciifolding
Looks like we should turn this on! My analysis is here.
A lot of pronunciations for words get mapped onto those words now—that's cool!
A problem has been exposed: rarely, a "modifier letter apostrophe" (U+02BC) is used instead of a straight quote or curly right quote. Before ICU folding these were being indexed incorrectly; with this patch they will still be indexed incorrectly, just in a different way. I've opened T146804 to map them, but that may not get done before the re-index happens.
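To make the apostrophe issue concrete, here is a rough Python sketch of the kind of mapping T146804 asks for. It is not the Elasticsearch configuration itself; the table `APOSTROPHE_MAP`, the helper `normalize_apostrophes`, and the choice of the straight quote as the target are all assumptions made for illustration.

```python
import unicodedata

# Hypothetical mapping: rewrite the modifier letter apostrophe (U+02BC)
# as the straight apostrophe (U+0027). Choosing the straight quote as the
# target is an assumption; the task leaves straight vs. curly open.
APOSTROPHE_MAP = str.maketrans({"\u02bc": "'"})


def normalize_apostrophes(text: str) -> str:
    """Return text with modifier letter apostrophes mapped to straight quotes."""
    return text.translate(APOSTROPHE_MAP)


# The two forms look nearly identical but are different code points.
straight = "l'homme"
modifier = "l\u02bchomme"

for ch in ("'", "\u2019", "\u02bc"):
    # Print each code point with its official Unicode name.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

print(straight == modifier)                         # the raw strings differ
print(straight == normalize_apostrophes(modifier))  # equal after mapping
```

In the real analysis chain the mapping would need to happen before tokenization so that both spellings produce the same tokens; where exactly it belongs is for whoever configures the deployment to decide.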
I think this was and should be assigned to @dcausse. I probably should have created a separate task, but all I did was evaluate the effect of turning it on locally and analyze big blobs of text. David knows how to configure it properly for deployment. (Unless someone wants me to do it.)