As a follow-up to T332342 (Standardize ASCII-folding/ICU-folding across analyzers), apply ICU folding appropriately to more languages.
Likely next candidates include those that remain in the top 90 languages in my list (by unique query volume), grouped here by script:
- (Latin script) Afrikaans/af, Icelandic/is, Latin/la, Welsh/cy, Asturian/ast, Scots/sco, Luxembourgish/lb, Alemannic/als, Breton/br
- (Cyrillic) Mongolian/mn, Macedonian/mk, Kyrgyz/ky, Belarusian/be, Belarusian-Taraškievica/be-tarask, Tajik/tg (Cyrillic/Latin)
- (Arabic script) Urdu/ur, Kurdish/ku (Arabic/Latin)
- (CJK) Cantonese/zh-yue
To finish off the languages whose Wikipedias have 100,000 or more articles, we'd also need to cover these:
- (Latin script) Cebuano/ceb, Waray/war, Min Nan/zh-min-nan, Ladin/lld, Minangkabau/min
- (Cyrillic) Chechen/ce
- (Arabic script) South Azerbaijani/azb
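For illustration only, here is a rough sketch of what applying folding "appropriately" per language could mean: strip diacritics in general, but exempt characters that are distinct letters in a given language. This uses Python's standard `unicodedata` module as a stand-in, not the real ICU folding filter, which covers far more (case folding and many other mappings). The function name `fold_diacritics` and the `preserve` parameter are hypothetical, and the exempted characters in the demo are assumptions, not decisions made for any language listed above.

```python
import unicodedata


def fold_diacritics(text: str, preserve: str = "") -> str:
    """Strip combining marks from text, except for characters in `preserve`.

    A simplified approximation of diacritic folding. `preserve` is a
    hypothetical per-language exemption list (an assumption for this sketch).
    """
    # Normalize to NFC first so that exempted characters can be matched
    # as single precomposed code points.
    text = unicodedata.normalize("NFC", text)
    keep = set(unicodedata.normalize("NFC", preserve))

    out = []
    for ch in text:
        if ch in keep:
            # Exempted character: leave it untouched.
            out.append(ch)
            continue
        # Decompose, then drop combining marks (accents, breves, etc.).
        decomposed = unicodedata.normalize("NFKD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


if __name__ == "__main__":
    # Latin script: the acute accent is stripped.
    print(fold_diacritics("Ísland"))
    # Cyrillic: unrestricted folding turns й into и, which is often unwanted.
    print(fold_diacritics("край"))
    # With an exemption (assumed here), й survives.
    print(fold_diacritics("край", preserve="й"))
```

The Cyrillic case shows why a per-language review is needed before enabling folding: a character that is "letter plus diacritic" at the Unicode level may be an independent letter for speakers of the language.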