
Allow ULS to map one language code to several languages
Open, High, Public

Description

It is sometimes useful to map one language code to several actual languages. This is particularly relevant with macro languages and with regional variants.

For example, the Mari language of Russia is listed as "chm" (macro) in the CLDR Territory-Language table, but Wikipedia uses the particular codes mhr and mrj. Another issue is Belarusian: the CLDR table uses only "be" for Belarus, but Wikipedia uses "be" and "be-tarask".

There are almost certainly more cases; I haven't gone over the whole language list. There may be similar issues with Azerbaijani, Punjabi, Chinese, Norwegian and other languages.

ULS currently doesn't support this, but we should add it, at least for some cases.

(In ContentTranslation, the split of "be" to "be" and "be-tarask" is hard-coded, and it also does weird tricks with Norwegian. This is very far from being bulletproof. We should fix it more thoroughly.)
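To make the request more concrete, here is a minimal sketch of what such a mapping could look like. It is written in Python for illustration only and is not ULS or ContentTranslation code. The names `CODE_EXPANSIONS` and `expand_language_codes` are hypothetical, and the entries are just the examples mentioned in this task, not a vetted list.

```python
# Hypothetical mapping from one code to the specific codes a wiki actually uses.
# Entries are only the examples discussed in this task.
CODE_EXPANSIONS = {
    "chm": ["mhr", "mrj"],      # Mari macrolanguage -> the two codes Wikipedia uses
    "be": ["be", "be-tarask"],  # Belarusian -> both variants
    "no": ["nb", "nn"],         # Norwegian -> Bokmål and Nynorsk
}


def expand_language_codes(codes):
    """Replace each code with its mapped codes, keeping order and dropping duplicates."""
    seen = set()
    result = []
    for code in codes:
        # Codes without an entry are passed through unchanged.
        for target in CODE_EXPANSIONS.get(code, [code]):
            if target not in seen:
                seen.add(target)
                result.append(target)
    return result
```

With this sketch, `expand_language_codes(["ru", "chm"])` yields `["ru", "mhr", "mrj"]`, while a code with no entry, such as "en", is returned as is.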

Event Timeline

Restricted Application added subscribers: Zppix, Aklapper.

This is important for users of Norwegian (no), and is probably the cause of Nynorsk (nn) being missing for some users.

I would be inclined to wontfix this. The developer should instruct ULS to use the correct set of languages that make sense for the context. The ticket does not even explain what exactly should happen when a macrolanguage code is selected.

How it is implemented internally is a separate question, but there are definitely realistic use cases. Most importantly, what @jeblad describes: if geolocation identifies "Norwegian", it is useful to show both Bokmål and Nynorsk in Compact Language Links. The same goes for Belarusian, which in the case of Wikipedia should definitely show both be and be-tarask.

Geolocation identifies countries, not languages. You can verify that we already have both nb and nn for Norway. For Belarusian we don't have be-tarask.

Arrbee triaged this task as Medium priority.
Arrbee moved this task from Backlog to Priority backlog on the Language-2018-Jan-Mar board.
Arrbee raised the priority of this task from Medium to High. Jan 18 2018, 9:19 AM

Assuming this is only about the geolocation, I think an appropriate place to add a mapping would be the script that builds the complete data file.

Another question is how we identify these mappings. Surely expanding en to en-US, en-GB, etc. in every place "en" is mentioned doesn't make sense.
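One possible way to handle both points is an explicit, hand-maintained list of splits applied once, when the territory data is built. The sketch below is in Python for illustration only and is not the actual build script. The data shape (territory code mapped to an ordered list of language codes) and the names `SPLIT_CODES` and `apply_splits` are assumptions.

```python
# Assumed shape of the territory data: territory code -> ordered list of language codes.
# Only codes listed explicitly here are split; anything else (e.g. "en") is left alone.
SPLIT_CODES = {
    "be": ["be", "be-tarask"],
    "chm": ["mhr", "mrj"],
}


def apply_splits(territories, split_codes=SPLIT_CODES):
    """Return a copy of the territory table with the listed codes split."""
    expanded = {}
    for territory, languages in territories.items():
        out = []
        for code in languages:
            for target in split_codes.get(code, [code]):
                if target not in out:  # avoid duplicates if a target is already listed
                    out.append(target)
        expanded[territory] = out
    return expanded
```

For example, `apply_splits({"BY": ["be", "ru"], "GB": ["en"]})` gives `{"BY": ["be", "be-tarask", "ru"], "GB": ["en"]}`. Because the list is an explicit allowlist, "en" is never expanded unless someone adds it deliberately.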

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the record, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)