Details
Project | Branch | Lines +/- | Subject |
---|---|---|---|
mediawiki/extensions/CirrusSearch | master | +15 -1 | Enable ICU folding for en, fr and greek by default |
Related Objects
- Mentioned In
  - T41501: Merging Unicode similar-looking characters in internal search (apostrophes, "x" and "×", etc)
  - T147505: [tracking] CirrusSearch: what is updated during re-indexing
  - T146804: Map modifier letter apostrophes to straight or curly quotes in the French Elasticsearch analysis chain
- Mentioned Here
  - T146804: Map modifier letter apostrophes to straight or curly quotes in the French Elasticsearch analysis chain
  - T41501: Merging Unicode similar-looking characters in internal search (apostrophes, "x" and "×", etc)
  - T102298: Add accent squashing to Russian/Cyrillic analyser
  - T137830: Use the icu_folding filter if available instead of asciifolding
Event Timeline
Looks like we should turn this on! My analysis is here.
A lot of pronunciations for words get mapped onto those words now—that's cool!
A problem has been exposed: rarely, "modifier letter apostrophe" is used instead of a straight quote or curly right quote. Before ICU folding they were being indexed wrong; with this patch they will be indexed wrong, but in a different way. I've opened T146804 to map them, but it may not get done before the re-index happens.
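To make the apostrophe issue concrete, here is a minimal Python sketch of the kind of mapping T146804 asks for. It is only an illustration: the function and constant names are hypothetical, and it maps U+02BC to a straight quote only. In an Elasticsearch analysis chain this sort of substitution would normally be done with a `mapping` character filter rather than in application code.

```python
import unicodedata

# U+02BC MODIFIER LETTER APOSTROPHE is classed as a letter, not punctuation,
# so analyzers can treat it differently from ' (U+0027) or ’ (U+2019).
MODIFIER_LETTER_APOSTROPHE = "\u02bc"

# Hypothetical mapping table: rewrite the modifier letter as a straight quote.
APOSTROPHE_MAP = str.maketrans({MODIFIER_LETTER_APOSTROPHE: "'"})


def normalize_apostrophes(text: str) -> str:
    """Replace modifier letter apostrophes with straight apostrophes."""
    return text.translate(APOSTROPHE_MAP)


if __name__ == "__main__":
    sample = "l\u02bcarbre"
    print(unicodedata.name(sample[1]))
    print(normalize_apostrophes(sample))
```

Curly right quotes (U+2019) are left untouched here, on the assumption that the existing chain already handles them.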
I think this was and should be assigned to @dcausse. I probably should have created a separate task, but all I did was evaluate the effect of turning it on locally and analyze big blobs of text. David knows how to configure it properly for deployment. (Unless someone wants me to do it.)
Change 313838 had a related patch set uploaded (by DCausse):
Enable ICU folding for en, fr and greek by default
I added Greek to the list; I thought we had agreed to enable it for full-text search as well. Currently it is only enabled on Greek Wikipedia, for the completion suggester.
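As a rough sketch of what "enabled by default for en, fr and Greek" could look like, the Python below builds an Elasticsearch-style analysis settings dictionary. `icu_folding` and `asciifolding` are real Elasticsearch token filters (the former comes from the ICU analysis plugin), but everything else is an assumption for illustration: the function names, the analyzer name `text`, and the fallback to `asciifolding` are hypothetical, and the actual CirrusSearch analysis chains are more involved.

```python
# Languages that get ICU folding by default in this sketch (el = Greek).
ICU_FOLDING_LANGUAGES = {"en", "fr", "el"}


def folding_filter(language: str, icu_available: bool) -> str:
    """Pick the folding token filter for a language.

    Assumption: fall back to asciifolding when the ICU plugin is missing
    or the language is not in the enabled list.
    """
    if icu_available and language in ICU_FOLDING_LANGUAGES:
        return "icu_folding"
    return "asciifolding"


def build_analysis_settings(language: str, icu_available: bool) -> dict:
    """Return a minimal index 'analysis' settings block (hypothetical layout)."""
    return {
        "analysis": {
            "analyzer": {
                "text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", folding_filter(language, icu_available)],
                }
            }
        }
    }


if __name__ == "__main__":
    for lang in ("en", "fr", "el", "ru"):
        settings = build_analysis_settings(lang, icu_available=True)
        print(lang, settings["analysis"]["analyzer"]["text"]["filter"])
```

Because analysis settings are applied at index time, a change like this only takes effect after the affected wikis are re-indexed.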
Change 313838 merged by jenkins-bot:
Enable ICU folding for en, fr and greek by default