User Story: As a multi-lingual searcher, I would like more consistency and predictability in how character folding works across wikis.
Currently, some languages have ASCII folding disabled, some have it enabled, and some have it enabled with the option to preserve the unfolded original; some also upgrade ASCII folding (with or without preserving the original) to ICU folding.
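For reference, the folding states described above can be sketched as Elasticsearch-style token filter definitions. This is a minimal illustration, not the actual AnalysisConfigBuilder output: the variant names and the `folding_filters` helper are hypothetical, while `asciifolding`, its `preserve_original` parameter, and `icu_folding` are standard Elasticsearch filter settings. ICU folding that preserves the original is omitted because it needs extra filters beyond `icu_folding` itself.

```python
from typing import Dict, List

# Hypothetical variant names. The filter types and the preserve_original
# parameter are standard Elasticsearch token filter settings.
FOLDING_VARIANTS: Dict[str, List[dict]] = {
    "none": [],  # folding disabled
    "ascii": [{"type": "asciifolding", "preserve_original": False}],
    "ascii_preserve": [{"type": "asciifolding", "preserve_original": True}],
    "icu": [{"type": "icu_folding"}],  # requires the analysis-icu plugin
}


def folding_filters(variant: str) -> List[dict]:
    """Return copies of the filter definitions for a folding variant."""
    if variant not in FOLDING_VARIANTS:
        raise ValueError(f"unknown folding variant: {variant!r}")
    # Copy each dict so callers cannot mutate the shared table.
    return [dict(f) for f in FOLDING_VARIANTS[variant]]
```

Listing the variants in one table like this makes it easier to see which state each language is in and to compare them side by side.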
Acceptance Criteria:
- Either an update to AnalysisConfigBuilder so that ASCII folding and ASCII folding with preserve are applied more consistently, or a better understanding of why folding should differ across languages.
- Bonus: An easy mechanism to enable custom ICU folding for a given language code without having to create a full analysis config for that language. (This may already exist.)
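One possible shape for the bonus item is a per-language lookup table of ICU folding exceptions, sketched below. The table and the `icu_folding_filter` helper are hypothetical and are not taken from AnalysisConfigBuilder; whether an equivalent mechanism already exists there is not confirmed here. The `unicode_set_filter` parameter is the `icu_folding` option for restricting which characters are folded, and the Swedish set follows the example in the Elasticsearch ICU plugin documentation.

```python
from typing import Dict

# Hypothetical table: language code -> ICU UnicodeSet of characters that
# may be folded. The Swedish entry exempts å, ä, ö from folding.
ICU_FOLDING_SETS: Dict[str, str] = {
    "sv": "[^åäöÅÄÖ]",
}


def icu_folding_filter(lang: str) -> dict:
    """Build an icu_folding filter definition for a language code.

    Languages without an entry in ICU_FOLDING_SETS get unrestricted folding.
    """
    filt = {"type": "icu_folding"}
    unicode_set = ICU_FOLDING_SETS.get(lang)
    if unicode_set is not None:
        filt["unicode_set_filter"] = unicode_set
    return filt
```

With a table like this, enabling custom ICU folding for a new language would mean adding one entry rather than writing a full analysis config for that language.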
Summary list of affected languages: Assamese (as), Azerbaijani (az), Crimean Tatar (crh), Greek (el), French (fr), Gagauz (gag), Gujarati (gu), Indonesian (id), Igbo (ig), Italian (it), Georgian (ka), Kazakh (kk), Khmer (km), Kannada (kn), Korean (ko), Malayalam (ml), Marathi (mr), Malay (ms), Mirandese (mwl), Burmese (my), Nepali (ne), Odia (or), Punjabi (pa), Polish (pl), Sinhala (si), Slovenian (sl), Albanian (sq), Swedish (sv), Swahili (sw), Tamil (ta), Telugu (te), Tagalog (tl), Tatar (tt), Uzbek (uz), Vietnamese (vi), Chinese (zh)