Allow ULS mapping one language code to several languages
Open, High, Public

Assigned To

None

Authored By

	Amire80
	May 25 2016, 5:29 AM

Description

It is sometimes useful to map one language code to several actual languages. This is particularly relevant with macro languages and with regional variants.

For example, the Mari language of Russia is listed as "chm" (macro) in the CLDR Territory-Language table, but Wikipedia uses the particular codes mhr and mrj. Another issue is Belarusian: the CLDR table uses only "be" for Belarus, but Wikipedia uses "be" and "be-tarask".

There are almost certainly more cases; I haven't gone over the whole language list. Azerbaijani, Punjabi, Chinese, Norwegian and other languages may have similar issues.

ULS currently doesn't support this, but we should add it, at least for some cases.

(In ContentTranslation, the split of "be" to "be" and "be-tarask" is hard-coded, and it also does weird tricks with Norwegian. This is very far from being bulletproof. We should fix it more thoroughly.)
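To make the request concrete, here is a minimal sketch of a one-to-many lookup. It is not ULS code: the table name `CODE_EXPANSIONS` and the function `expand_codes` are hypothetical, and the entries are only the examples mentioned in this task (Mari, Belarusian, Norwegian).

```python
# Hypothetical table: one incoming code maps to the languages actually in use.
# The entries are taken from the examples in this task, not from ULS data.
CODE_EXPANSIONS = {
    "chm": ["mhr", "mrj"],       # Mari macrolanguage -> the two wiki codes
    "be": ["be", "be-tarask"],   # Belarusian -> both variants
    "no": ["nb", "nn"],          # Norwegian -> Bokmal and Nynorsk
}


def expand_codes(codes):
    """Replace each code with its mapped languages, keeping order and dropping duplicates."""
    result = []
    for code in codes:
        # Codes without an entry are passed through unchanged.
        for target in CODE_EXPANSIONS.get(code, [code]):
            if target not in result:
                result.append(target)
    return result
```

Codes that are not in the table pass through untouched, so only the explicitly listed cases change.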

Related Objects

Mentioned Here: T228575: Decrease number of open tickets with assignee field set for more than two years (aka cookie licking) (March-June 2020 edition)

Event Timeline

Amire80 created this task. May 25 2016, 5:29 AM

Restricted Application added a project: UniversalLanguageSelector. May 25 2016, 5:29 AM

Restricted Application added subscribers: Zppix, Aklapper.

FWIW, upstream bug for Mari: http://unicode.org/cldr/trac/ticket/9461 .

Amire80 moved this task from Backlog to Prioritised languages on the ULS-CompactLinks board. May 25 2016, 7:31 AM

This is important for users of Norwegian (no), and is probably the cause of Nynorsk (nn) being missing for some users.

I would be inclined to wontfix this. The developer should instruct ULS to use the correct set of languages that make sense for the context. The ticket does not even explain what exactly should happen when a macrolanguage code is selected.

How it is implemented internally is a separate question, but there are definitely realistic use cases. Most importantly, what @jeblad describes: if geolocation identifies "Norwegian", it is useful to show both Bokmål and Nynorsk in Compact Language Links. The same goes for Belarusian, which in the case of Wikipedia should definitely show both be and be-tarask.

Geolocation identifies countries, not languages. You can verify that we already have both nb and nn for Norway. For Belarusian we don't have be-tarask.

Liuxinyu970226 subscribed. Aug 29 2017, 11:51 AM

Restricted Application added a subscriber: jhsoby. Aug 29 2017, 11:51 AM

Amire80 moved this task from Backlog to Language codes issues on the UniversalLanguageSelector board. Nov 1 2017, 2:51 PM

Amire80 added a project: Language-2018-Jan-Mar. Jan 2 2018, 1:44 PM

Arrbee assigned this task to Amire80. Jan 18 2018, 8:54 AM

Arrbee triaged this task as Medium priority.

Arrbee moved this task from Backlog to Priority backlog on the Language-2018-Jan-Mar board.

Arrbee raised the priority of this task from Medium to High. Jan 18 2018, 9:19 AM

Assuming this is only about the geolocation, I think an appropriate place to add a mapping would be the script that builds the complete data file.

Another question is how we identify these mappings. Surely expanding en to en-US, en-GB, etc. in every place "en" is mentioned doesn't make sense.
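One way to picture both points (doing it in the build script, and not expanding everything) is a small explicit allowlist applied to the territory data at build time. This is only a sketch under assumptions: the real build script and its data format are not shown in this task, so `EXPANSIONS`, `expand_territories`, and the territory-to-languages dict shape are all hypothetical.

```python
# Hypothetical allowlist: only codes listed here are expanded.
# "en" is deliberately absent, so it is never split into en-US, en-GB, and so on.
EXPANSIONS = {
    "chm": ("mhr", "mrj"),
    "be": ("be", "be-tarask"),
}


def expand_territories(territories, expansions=EXPANSIONS):
    """Return a copy of the territory -> languages data with allowlisted codes expanded."""
    expanded = {}
    for territory, languages in territories.items():
        out = []
        for code in languages:
            # Codes outside the allowlist are kept exactly as they are.
            for target in expansions.get(code, (code,)):
                if target not in out:
                    out.append(target)
        expanded[territory] = out
    return expanded


# Assumed input shape: territory code -> ordered list of language codes.
sample = {"BY": ["be", "ru"], "GB": ["en"]}
print(expand_territories(sample))
```

Because the expansion happens once when the data file is generated, nothing in ULS itself would need to know about macrolanguages, and the allowlist stays the single place where these decisions are made.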

Amire80 moved this task from Priority backlog to In Progress on the Language-2018-Jan-Mar board. Feb 14 2018, 9:27 AM

Arrbee removed a project: Language-2018-Jan-Mar. Feb 19 2018, 7:06 AM

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the record, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)

Aklapper added a project: Language codes. Aug 14 2023, 9:09 AM
