T151269 Add English names for languages which don't yet have one

	Subject	Repo	Branch	Lines +/-
	Add 8 languages to be used by structured data but not in CLDR	mediawiki/extensions/cldr	master	+8 -0
	Add English names for languages which don't yet have one	mediawiki/extensions/cldr	master	+43 -2

Status	Assigned	Task
Open	None	T124286 [Epic] Wikidata language support
Resolved	None	T172222 Many languages names in UploadWizard display in their native script (autonyms) rather than English
Resolved	Raymond	T151269 Add English names for languages which don't yet have one
Open	None	T168799 Integrate IANA language registry with language-data and MediaWiki (let MediaWiki "knows" all languages with ISO 639-1/2/3 codes)
Resolved	Raymond	T134348 English names of azb: and be-x-old: in SiteMatrix are not in English

Nikki created this task. Nov 21 2016, 11:38 PM

Restricted Application added subscribers: revi, Aklapper. Nov 21 2016, 11:38 PM

Lea_Lacroix_WMDE subscribed. Nov 22 2016, 9:50 AM

matej_suchanek added a project: I18n. Nov 23 2016, 1:26 PM

Liuxinyu970226 subscribed. Nov 24 2016, 6:28 AM

Language names are managed in a project called CLDR (see http://cldr.unicode.org). MediaWiki, UniversalLanguageSelector, Wikibase and others use this data via a small extension (https://www.mediawiki.org/wiki/Extension:CLDR). The preferred way to add missing language names is to file a ticket at http://unicode.org/cldr/trac. I did that once with no problems, and would encourage you to do the same.

The second way is to change the file https://phabricator.wikimedia.org/diffusion/ECLD/browse/master/LocalNames/LocalNamesEn.php via Gerrit patches. This can serve as a temporary workaround until a CLDR version containing the requested changes is released. (Don't forget to report the missing names to CLDR, and to link to the ticket in your Gerrit patch or in an inline comment.)

Additionally, MediaWiki has a setting called "…ExtraLanguageNames", which we use to add a few language names especially for Wikidata. See https://phabricator.wikimedia.org/diffusion/OMWC/browse/master/wmf-config/InitialiseSettings.php;1fd6734383b393eedf0004cc59f33f388ca89c5a$16360. I will paste the relevant snippet here:

'abe' => 'wôbanakiôdwawôgan', // T150633
'din' => 'dinka',           // T75563
'kea' => 'Kabuverdianu',    // T127435
'nod' => 'ᨣᩴᩤᨾᩮᩥᩬᨦ',            // T93880
'ota' => 'لسان توركى',      // T59342
'rwr' => 'मारवाड़ी',           // T61905
'sje' => 'bidumsámegiella', // T146707
'smj' => 'julevsámegiella', // T146707

As you can see, this does not add the English names of these languages, but their names in the languages themselves (autonyms).
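To make the effect of these three sources concrete, here is a minimal Python sketch of a name lookup. It is not MediaWiki's actual code: the dictionaries `CLDR_EN`, `LOCAL_NAMES_EN` and `AUTONYMS`, their tiny sample contents, the function `english_name`, and the precedence (local override, then CLDR, then autonym) are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical sample data, not copied from CLDR or LocalNamesEn.php.
CLDR_EN = {"de": "German", "fr": "French"}       # names shipped with CLDR
LOCAL_NAMES_EN = {"sje": "Pite Sami"}            # local English overrides
AUTONYMS = {"sje": "bidumsámegiella", "kea": "Kabuverdianu"}  # native names


def english_name(code: str) -> Optional[str]:
    """Return the first name found, trying the sources in an assumed order."""
    for source in (LOCAL_NAMES_EN, CLDR_EN, AUTONYMS):
        if code in source:
            return source[code]
    return None  # the code is unknown to every source


print(english_name("sje"))  # found in the local overrides
print(english_name("kea"))  # no English name anywhere, so the autonym is used
```

With this sample data, `kea` has no English entry in either English source, so the reader ends up seeing the autonym, which is the situation this task is about.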

I had a look at that tracker and found http://unicode.org/cldr/trac/ticket/9137 where two of the codes here (fkv and sje) were already requested but rejected. Given the following comment, it seems like it would be a waste of time to request the addition of more languages there:

We agreed to document there is no intent for CLDR to have the English names of all languages (there are over 7,000) of them, and point to http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry as a source for any extra ones that people need.

More detailed comments:

nl-informal and roa-tara are Wikimedia inventions, so they would definitely need to go into LocalNamesEn.php.

bxr and mo also seem to have been rejected a few years ago in http://unicode.org/cldr/trac/ticket/6763

All of the language-only codes I listed are in the subtag registry mentioned in the comment, but the registry only provides English names, so adding support for it as they suggest might not be worth the effort (compared with just adding the names we need to LocalNamesEn.php).
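For a sense of what supporting the registry would involve, here is a rough Python sketch that reads language records from text in the registry's record format (records separated by `%%` lines, with `Type`, `Subtag` and `Description` fields). The sample text is a hand-abbreviated excerpt with most fields omitted, and `parse_language_names` is a hypothetical helper, not an existing MediaWiki or CLDR function.

```python
# Hand-abbreviated sample in the registry's record format (fields omitted).
SAMPLE_REGISTRY = """\
%%
Type: language
Subtag: fkv
Description: Kven Finnish
%%
Type: language
Subtag: sje
Description: Pite Sami
%%
Type: region
Subtag: MO
Description: Macao
"""


def parse_language_names(text):
    """Map each language subtag to its first Description field."""
    names = {}
    record = {}
    # A trailing "%%" is appended so the last record is flushed as well.
    for line in text.splitlines() + ["%%"]:
        if line.strip() == "%%":
            if (record.get("Type") == "language"
                    and "Subtag" in record and "Description" in record):
                names[record["Subtag"]] = record["Description"]
            record = {}
        elif not line.startswith(" "):  # skip folded continuation lines
            key, sep, value = line.partition(":")
            if sep:
                # setdefault keeps only the first of any repeated field
                record.setdefault(key.strip(), value.strip())
    return names


print(parse_language_names(SAMPLE_REGISTRY))
```

The region record is skipped, so only the two language subtags are returned. As noted above, the result is English-only, so it would not help with translated names.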

The country variants (ko-kp, zh-mo, zh-my) could be generated from the language name we already have plus the country name from CLDR. The same applies to kk-cn, kk-kz, kk-tr, zh-cn, zh-hk, zh-sg and zh-tw, which are already in LocalNamesEn.php. Generating them seems like a good idea, because the names would then be automatically translated into lots of languages instead of most languages falling back to English or the native name.
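A minimal Python sketch of that idea, assuming per-language tables of language and territory names are available. The tables below are small hand-written samples rather than real CLDR extracts, and `country_variant_name` and the "Language (Country)" pattern are hypothetical choices for illustration.

```python
# Hand-written sample tables keyed by UI language; not real CLDR extracts.
LANGUAGE_NAMES = {
    "en": {"ko": "Korean", "zh": "Chinese"},
    "de": {"ko": "Koreanisch", "zh": "Chinesisch"},
}
TERRITORY_NAMES = {
    "en": {"KP": "North Korea", "MY": "Malaysia"},
    "de": {"KP": "Nordkorea", "MY": "Malaysia"},
}


def country_variant_name(code, ui_lang="en"):
    """Build e.g. 'Korean (North Korea)' from a code such as 'ko-kp'."""
    lang, _, region = code.partition("-")
    language = LANGUAGE_NAMES.get(ui_lang, {}).get(lang)
    territory = TERRITORY_NAMES.get(ui_lang, {}).get(region.upper())
    if language is None or territory is None:
        return None  # no data: the caller would need its own fallback
    return f"{language} ({territory})"


print(country_variant_name("ko-kp"))        # English UI
print(country_variant_name("zh-my", "de"))  # German UI, no extra translation work
```

The second call shows the benefit described above: once both parts are translated, the combined name comes for free in that language.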

Theoretically, the script variants (ady-cyrl, aeb-arab, aeb-latn, kbd-cyrl, ku-arab, shi-latn, shi-tfng, plus 34 others already in LocalNamesEn.php) could also be generated, but the script names in CLDR do not include the word "script". That's not ideal because some of the scripts share the same name as a language (e.g. ku-arab would become "Kurdish (Arabic)", the meaning of which is not very clear).
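The ambiguity is easy to see in a small Python sketch. The name tables are hand-written samples, and `script_variant_name` with its `add_script_word` flag is a hypothetical helper, not something CLDR or MediaWiki provides.

```python
# Hand-written sample tables; not real CLDR extracts.
LANGUAGE_NAMES = {"ku": "Kurdish", "shi": "Tachelhit"}
SCRIPT_NAMES = {"Arab": "Arabic", "Latn": "Latin", "Tfng": "Tifinagh"}


def script_variant_name(code, add_script_word=False):
    """Build a name such as 'Kurdish (Arabic)' from a code such as 'ku-arab'."""
    lang, _, script = code.partition("-")
    language = LANGUAGE_NAMES.get(lang)
    script_name = SCRIPT_NAMES.get(script.title())  # 'arab' -> 'Arab'
    if language is None or script_name is None:
        return None
    suffix = f"{script_name} script" if add_script_word else script_name
    return f"{language} ({suffix})"


print(script_variant_name("ku-arab"))                        # ambiguous form
print(script_variant_name("ku-arab", add_script_word=True))  # clearer form
```

Appending the English word "script" only helps for English output; other languages would presumably need their own translated pattern, which is part of why this is less straightforward than the country variants.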

Nikerabbit subscribed. Nov 25 2016, 6:56 AM

jhsoby subscribed. Nov 27 2016, 9:03 AM

Thanks for adding zh-mo, since there are some differences between Hong Kong and Macau vocabulary, as pointed out on zhwiki.

I doubt that "Chinese (Malaysia)" (zh-my) is still useful, since there is unlikely to be any difference between it and Singaporean Chinese (zh-sg) (if someone can point one out, I would be grateful). Maybe it is worth dropping zh-my? I don't have enough time to look into it.

Nikki mentioned this in T153850: Include "special" languages in language selector for monolingual text. Dec 21 2016, 3:13 PM

Nikki updated the task description. Dec 21 2016, 3:17 PM

Shizhao added a project: Chinese-Sites. Dec 27 2016, 6:50 AM

@Lydia_Pintscher Is Shizhao's action above valid? The main topic of this task appears to be the missing English names of WD language tags (which is why it fits MediaWiki-extensions-CLDR).

Nemo_bis mentioned this in T124283: Replace tracking bug T125033 by new project tag "Chinese-Sites". Dec 28 2016, 8:42 AM

It doesn't hurt I guess :)

Add English names for languages which don't yet have one
Closed, Resolved · Public