User story: As a searcher, I want words like Istanbul (common non-Turkish spelling) and İstanbul (Turkish spelling) to match when I search for one or the other.
Our unpacked custom analyzers generally use icu_normalizer instead of lowercase for lowercasing and other normalization. It converts İstanbul to i̇stanbul: a normal lowercase i followed by a combining dot above (U+0307). Depending on your fonts, browser, and OS, the extra dot can be rendered above, next to, or invisibly on top of the regular dot on the i. The character filter dotted_I_fix is used to fix this in analyzers converted as part of the unpacking project, but some older unpacked analyzers do not use it. The default analysis chain also uses icu_normalizer without dotted_I_fix.
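The mismatch can be reproduced outside the search stack. The sketch below is only an approximation: it assumes Python's NFKC normalization plus casefold() behaves like icu_normalizer for these two characters, and it assumes dotted_I_fix amounts to mapping İ to I before normalization. The function names are made up for illustration and are not the actual filter implementations.

```python
import unicodedata

DOTTED_CAPITAL_I = "\u0130"  # İ, LATIN CAPITAL LETTER I WITH DOT ABOVE


def normalize(text: str) -> str:
    """Rough stand-in for icu_normalizer (assumption): NFKC plus case folding."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return unicodedata.normalize("NFKC", folded)


def dotted_i_fix(text: str) -> str:
    """Hypothetical equivalent of the dotted_I_fix character filter: map İ to I."""
    return text.replace(DOTTED_CAPITAL_I, "I")


# Without the fix, the two spellings normalize to different strings:
# "İstanbul" keeps a combining dot above (U+0307) after the i.
without_fix = (normalize("İstanbul"), normalize("Istanbul"))

# With the fix applied first, both spellings normalize to the same string.
with_fix = (normalize(dotted_i_fix("İstanbul")), normalize(dotted_i_fix("Istanbul")))
```

Because the fix runs before normalization, the combining dot never gets created, so nothing has to strip it afterwards.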
There are a small number of languages (mostly Turkic, it looks like) that distinguish I/ı and İ/i. They should probably not use dotted_I_fix, and should instead use Turkish lowercasing (which is the same as lowercase except for the İ/i and I/ı pairs) before icu_normalizer, as the Turkish analyzer does.
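As a rough illustration of what "the same as lowercase except for the İ/i and I/ı pairs" means, here is a minimal sketch. The turkish_lower function is hypothetical and is not the filter the analyzers actually use; it only handles the two precomposed capital letters and leaves everything else to ordinary lowercasing.

```python
# Map the two Turkish-specific capitals first, then lowercase the rest normally.
TURKISH_I_MAP = str.maketrans({
    "\u0130": "i",   # İ (dotted capital I)  -> i (dotted lowercase i)
    "I": "\u0131",   # I (dotless capital I) -> ı (dotless lowercase i)
})


def turkish_lower(text: str) -> str:
    """Hypothetical Turkish-style lowercasing for languages that distinguish I/ı and İ/i."""
    return text.translate(TURKISH_I_MAP).lower()


dotted = turkish_lower("İstanbul")     # "istanbul"
dotless = turkish_lower("DIYARBAKIR")  # "dıyarbakır", both I's become dotless ı
```

Unlike the İ-to-I mapping, this keeps the dotted and dotless letters distinct, which is why it suits these languages and not the general case.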
It might make sense to also see whether there is an appreciable difference in speed between using Turkish lowercasing and a simple character filter that maps İ/i and I/ı before letting icu_normalizer do the rest. (In the past, we just turned on the Turkish variant of lowercase because it existed and it was easy, even though icu_normalizer still has to run to handle all the more interesting basic normalization.)
Acceptance Criteria: Either dotted_I_fix or some form of İ/i and I/ı lowercasing is enabled everywhere icu_normalizer is enabled (with a few possible exceptions for language-specific analyzer components).
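One way to audit the acceptance criteria is to scan the generated analysis settings for analyzers that use icu_normalizer with neither fix. The sketch below assumes an Elasticsearch-style settings dict in which filters are referenced by name, icu_normalizer and dotted_I_fix appear under exactly those names, and Turkish lowercasing is a lowercase filter with language set to turkish. The analyzer names in the example are invented, and the real configuration may be laid out differently.

```python
def analyzers_missing_i_handling(analysis: dict) -> list[str]:
    """Return names of analyzers that use icu_normalizer without any İ/I handling."""
    filter_defs = analysis.get("filter", {})
    missing = []
    for name, analyzer in analysis.get("analyzer", {}).items():
        token_filters = analyzer.get("filter", [])
        if "icu_normalizer" not in token_filters:
            continue  # criteria only apply where icu_normalizer is enabled
        has_char_fix = "dotted_I_fix" in analyzer.get("char_filter", [])
        # Turkish lowercasing only helps if it runs before icu_normalizer.
        before = token_filters[: token_filters.index("icu_normalizer")]
        has_turkish_lowercase = any(
            filter_defs.get(f, {}).get("type") == "lowercase"
            and filter_defs.get(f, {}).get("language") == "turkish"
            for f in before
        )
        if not (has_char_fix or has_turkish_lowercase):
            missing.append(name)
    return sorted(missing)


# Hypothetical settings, for illustration only.
EXAMPLE_ANALYSIS = {
    "filter": {"turkish_lowercase": {"type": "lowercase", "language": "turkish"}},
    "analyzer": {
        "text_fixed": {"char_filter": ["dotted_I_fix"], "filter": ["icu_normalizer"]},
        "text_turkish": {"filter": ["turkish_lowercase", "icu_normalizer"]},
        "text_unfixed": {"filter": ["icu_normalizer"]},
        "text_plain": {"filter": ["lowercase"]},
    },
}

unfixed = analyzers_missing_i_handling(EXAMPLE_ANALYSIS)  # ["text_unfixed"]
```

Anything the check reports would still need a manual look, since the criteria allow a few exceptions for language-specific analyzer components.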





