
LanguagePicker's handling of script suffixes is broken
Open, Needs Triage, Public

Description

Comments in LanguagePicker.js say that it tries to "latinize" some content based on script suffixes such as _rm or _pinyin if the requested language uses the same script. The way it actually works is currently problematic in the following respects:

  1. It picks a Latin-suffix name for non-Latin languages (T208927).
  2. It picks codes like "sr-Latn", which is often associated with a name variant that doesn't apply to any other Latin-script language. It does so even if the local name (the one under the OSM "name" key) is already in Latin script (T195318, T229516).
  3. It doesn't pick the "zh_pinyin" name for Latin-script languages. For example, a "zh_pinyin" name is currently provided for the Tongliao label node, but the Dutch-language tile still displays the non-Latin name.

These script suffixes should probably be ignored for latinization purposes outside the relevant region. For example, there are currently 476 uses of "sr-Latn" outside Serbia and 356 uses of "zh_pinyin" outside China/Taiwan, which are probably relevant only to Serbian and Chinese themselves.

If it were possible for LanguagePicker to actually differentiate between languages by script and also by the region of a given name, then picking certain codes like "zh_pinyin", "ja_rm", or "ko-Latn" for Latin-script languages would probably be appropriate. If that isn't easy to achieve, then as a start it might be more appropriate to ignore script suffixes.
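
To make the script-and-region idea concrete, here is a rough sketch of what such a selection could look like. It is not the current LanguagePicker.js code: `pickName`, `ROMANIZED_KEYS_BY_REGION`, and `LANGUAGE_SCRIPT` are hypothetical names, the region codes for "ja_rm" and "ko-Latn" are assumed for illustration, and it assumes the caller already knows the region a feature lies in.

```javascript
// Hypothetical table: romanization keys and the regions where they are
// meaningful. Entries are illustrative assumptions, not data from the task.
const ROMANIZED_KEYS_BY_REGION = {
  CN: ["name:zh_pinyin"],
  TW: ["name:zh_pinyin"],
  JP: ["name:ja_rm"],
  KR: ["name:ko-Latn"],
  RS: ["name:sr-Latn"],
};

// Hypothetical, deliberately tiny script lookup for requested languages.
const LANGUAGE_SCRIPT = { en: "Latn", nl: "Latn", ru: "Cyrl" };

// True if the text contains any character outside the Latin, Common
// (digits, spaces, punctuation) and Inherited (combining marks) scripts.
function hasNonLatin(text) {
  return /[^\p{Script=Latin}\p{Script=Common}\p{Script=Inherited}]/u.test(text);
}

// tags: OSM-style object such as { name: "...", "name:nl": "..." }
// lang: requested language code; region: country code of the feature.
function pickName(tags, lang, region) {
  // An exact match for the requested language always wins.
  const exact = tags[`name:${lang}`];
  if (exact) return exact;

  const local = tags.name;
  const wantsLatin = LANGUAGE_SCRIPT[lang] === "Latn";

  // Only consider romanized names when the reader uses Latin script, the
  // local name is missing or not Latin, and the key belongs to this region.
  if (wantsLatin && (local === undefined || hasNonLatin(local))) {
    for (const key of ROMANIZED_KEYS_BY_REGION[region] || []) {
      if (tags[key]) return tags[key];
    }
  }
  return local;
}
```

Under these assumptions, a Dutch request for a node in China with a "zh_pinyin" name gets the pinyin name, a Russian request for the same node keeps the local name, and a "sr-Latn" name on a feature in the United States is never chosen over a Latin-script local name.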

The tasks mentioned above cover particular cases where the wrong name is displayed. This task is intended to summarize the underlying issue.

Event Timeline

This problem is getting worse over time. Major place names in Manhattan are almost entirely Serbian at this point because name:sr-Latn is prioritized higher than the default name (which is English). The latinization code should probably not do anything if the default text already consists entirely of Latin characters.
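
A guard along those lines might look roughly like the following. This is only a sketch: `isEntirelyLatin` and `shouldSkipLatinization` are hypothetical helpers, not existing LanguagePicker.js functions, and treating digits, spaces, punctuation, and combining marks as acceptable is an assumption about what "entirely Latin" should mean.

```javascript
// True when every character is Latin script, or script-neutral: Common
// (digits, spaces, punctuation) or Inherited (combining marks).
function isEntirelyLatin(text) {
  return /^[\p{Script=Latin}\p{Script=Common}\p{Script=Inherited}]*$/u.test(text);
}

// Hypothetical guard: skip the latinization fallback entirely when the
// default name is a non-empty string that is already in Latin script.
function shouldSkipLatinization(defaultName) {
  return typeof defaultName === "string" &&
    defaultName.length > 0 &&
    isEntirelyLatin(defaultName);
}
```

With this check in front of the fallback, a default name such as "Times Square" would be kept as-is, while "Београд" or "通辽市" would still be eligible for a romanized alternative.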

[Attached screenshot: Screenshot 2024-05-07 at 17.36.32.png]

Can we please get this ticket triaged and assigned?