Add English names for languages which don't yet have one
Open, Needs Triage, Public

Description

There are a number of languages which currently don't display an English name when used on Wikidata, e.g. many of the examples on this page.

Could the following English names be added?

  • abe: "Western Abenaki"
  • ady-cyrl: "Adyghe (Cyrillic script)"
  • aeb-arab: "Tunisian Arabic (Arabic script)"
  • aeb-latn: "Tunisian Arabic (Latin script)"
  • azb: "South Azerbaijani"
  • bxr: "Buryat"
  • dty: "Doteli"
  • ett: "Etruscan"
  • fkv: "Kven"
  • lbe: "Lak"
  • kbd-cyrl: "Kabardian (Cyrillic script)"
  • ko-kp: "Korean (North Korea)"
  • koy: "Koyukon"
  • ku-arab: "Kurdish (Arabic script)"
  • lld: "Ladin"
  • mo: "Moldovan"
  • moe: "Montagnais"
  • nl-informal: "Dutch (informal address)"
  • nys: "Noongar"
  • nod: "Northern Thai"
  • otk: "Old Turkish"
  • roa-tara: "Tarantino"
  • rwr: "Marwari (India)"
  • shi-latn: "Tachelhit (Latin script)"
  • shi-tfng: "Tachelhit (Tifinagh script)"
  • sje: "Pite Sami"
  • tzl: "Talossan"
  • zh-mo: "Chinese (Macau)"
  • zh-my: "Chinese (Malaysia)"

The special code "mis" is also missing an English name. http://www-01.sil.org/iso639-3/documentation.asp?id=mis calls it "Uncoded languages", but perhaps something like "other language" or "unsupported language" would be better for the way it's used in Wikidata.

Also, while I'm requesting updates, I think the following three should be changed:

  • bbc-latn: Change "Batak Toba" to "Batak Toba (Latin script)"
  • gan-hans: Change "Simplified Gan script" to "Gan (Simplified)"
  • gan-hant: Change "Traditional Gan script" to "Gan (Traditional)"

"Batak Toba" is currently used as the name for both bbc and bbc-latn so they aren't distinguishable. All other -latn codes include "(Latin script)" in the name.
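To illustrate the distinguishability problem, here is a minimal sketch that groups codes by display name and reports any name shared by more than one code. The function name `findDuplicateNames` and the sample arrays are hypothetical; the entries are illustrative and not copied from LocalNamesEn.php.

```php
<?php
// Hypothetical helper: list display names shared by more than one code.
// Codes that share a name cannot be told apart in a language selector.
function findDuplicateNames(array $names): array
{
    $byName = [];
    foreach ($names as $code => $name) {
        $byName[$name][] = $code;
    }
    // Keep only names used by two or more codes.
    return array_filter($byName, function (array $codes): bool {
        return count($codes) > 1;
    });
}

// Illustrative entries only (not the real file contents).
$current = ['bbc' => 'Batak Toba', 'bbc-latn' => 'Batak Toba', 'fkv' => 'Kven'];
$proposed = ['bbc' => 'Batak Toba', 'bbc-latn' => 'Batak Toba (Latin script)', 'fkv' => 'Kven'];

echo count(findDuplicateNames($current)), "\n";  // 1
echo count(findDuplicateNames($proposed)), "\n"; // 0
```

With the proposed "(Latin script)" suffix, the collision between bbc and bbc-latn disappears.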

For Gan, the current names sound really odd (the phrasing "... script" is used for script names, not languages). We normally put script information in brackets after the language name, so that's what I've suggested here. It would also match the way cjy-hans/cjy-hant are named in this file.

Nikki created this task. Nov 21 2016, 11:38 PM
Restricted Application added subscribers: revi, Aklapper. Nov 21 2016, 11:38 PM
thiemowmde added a subscriber: thiemowmde.

Language names are managed in a project called CLDR, see http://cldr.unicode.org. MediaWiki, UniversalLanguageSelector, Wikibase and so on use this data via a tiny extension (https://www.mediawiki.org/wiki/Extension:CLDR). The preferred way to add missing language names is to file a ticket at http://unicode.org/cldr/trac. I did that once with no problems, and would like to encourage you to do so as well.

The second way is to make changes to the file https://phabricator.wikimedia.org/diffusion/ECLD/browse/master/LocalNames/LocalNamesEn.php via Gerrit patches. This can serve as a temporary workaround until a new CLDR version with the requested changes is released. (Don't forget to report the issue at CLDR, and link to the ticket in your Gerrit patch or in an inline comment.)
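A minimal sketch of what such entries might look like, using a few names requested in this task. It assumes the file defines a `$languageNames` array keyed by language code; that variable name is an assumption about the file's layout, and the `englishName` lookup helper is hypothetical, added only to show how the entries would be used.

```php
<?php
// Assumed layout of LocalNames/LocalNamesEn.php: code => English name.
$languageNames = [
    // Wikimedia-specific codes that CLDR is unlikely ever to cover.
    'nl-informal' => 'Dutch (informal address)',
    'roa-tara' => 'Tarantino',
    // Other names requested in this task. Link the relevant CLDR ticket
    // in an inline comment like this one.
    'fkv' => 'Kven',
    'sje' => 'Pite Sami',
];

// Hypothetical helper: English name for a code, falling back to the code itself.
function englishName(string $code, array $names): string
{
    return $names[strtolower($code)] ?? $code;
}

echo englishName('sje', $languageNames), "\n"; // Pite Sami
```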

Additionally, MediaWiki does have a setting called "…ExtraLanguageNames". We are adding a few language names especially for Wikidata. See https://phabricator.wikimedia.org/diffusion/OMWC/browse/master/wmf-config/InitialiseSettings.php;1fd6734383b393eedf0004cc59f33f388ca89c5a$16360. I will paste the relevant snippet here:

'abe' => 'wôbanakiôdwawôgan', // T150633
'din' => 'dinka',           // T75563
'kea' => 'Kabuverdianu',    // T127435
'nod' => 'ᨣᩴᩤᨾᩮᩥᩬᨦ',            // T93880
'ota' => 'لسان توركى',      // T59342
'rwr' => 'मारवाड़ी',           // T61905
'sje' => 'bidumsámegiella', // T146707
'smj' => 'julevsámegiella', // T146707

As you can see, this does not add the English names of these languages, but the names of the languages in the languages themselves.
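The effect can be shown with a simplified model of name lookup for an English interface. This is a hypothetical sketch, not MediaWiki's actual code: it assumes the order "CLDR English name, then LocalNamesEn.php, then the ExtraLanguageNames autonym, then the bare code", and the `englishDisplayName` function and sample arrays are made up for illustration.

```php
<?php
// Simplified, hypothetical lookup order (not MediaWiki's real implementation).
function englishDisplayName(string $code, array $cldrEn, array $localEn, array $extra): string
{
    foreach ([$cldrEn, $localEn, $extra] as $source) {
        if (isset($source[$code])) {
            return $source[$code];
        }
    }
    return $code; // Nothing known: show the raw code.
}

// Sample data only.
$cldrEn = ['smj' => 'Lule Sami'];
$localEn = [];
$extra = ['sje' => 'bidumsámegiella', 'smj' => 'julevsámegiella'];

echo englishDisplayName('smj', $cldrEn, $localEn, $extra), "\n"; // Lule Sami
echo englishDisplayName('sje', $cldrEn, $localEn, $extra), "\n"; // bidumsámegiella
```

Under this model, a code that only appears in ExtraLanguageNames shows its autonym even in an English interface.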

Nikki added a comment. Nov 24 2016, 8:26 PM

I had a look at that tracker and found http://unicode.org/cldr/trac/ticket/9137 where two of the codes here (fkv and sje) were already requested but rejected. Given the following comment, it seems like it would be a waste of time to request the addition of more languages there:

We agreed to document there is no intent for CLDR to have the English names of all languages (there are over 7,000 of them), and point to http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry as a source for any extra ones that people need.

Nikki added a comment. Nov 24 2016, 8:56 PM

More detailed comments:

nl-informal and roa-tara are Wikimedia inventions, so they would definitely need to go into LocalNamesEn.php.

bxr and mo also seem to have been rejected a few years ago in http://unicode.org/cldr/trac/ticket/6763

All of the language-only codes I listed are in the subtag registry mentioned in the comment, but the registry only provides English names, so adding support for it as they suggest might not be worth the effort compared with just adding the ones we need to LocalNamesEn.php.
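For a sense of the effort involved, here is a rough sketch of extracting English names from the registry file. It relies on the record format described in RFC 5646, section 3.1 (records separated by "%%" lines, "Field: value" lines, folded continuation lines starting with whitespace) and keeps only the first Description of each "Type: language" record. The function name and the sample text are hypothetical; the sample is an abbreviated illustration, not an excerpt from the real registry, and its folding is artificial.

```php
<?php
// Hypothetical parser: language subtag => first English Description.
function parseLanguageSubtags(string $registry): array
{
    $names = [];
    foreach (preg_split('/^%%\h*$/m', $registry) as $record) {
        $fields = [];
        $last = null;
        foreach (preg_split('/\R/', $record) as $line) {
            if ($line === '') {
                continue;
            }
            if ($last !== null && ($line[0] === ' ' || $line[0] === "\t")) {
                // Folded continuation line: append to the previous value.
                $fields[$last][count($fields[$last]) - 1] .= ' ' . trim($line);
                continue;
            }
            $parts = explode(':', $line, 2);
            if (count($parts) !== 2) {
                continue;
            }
            $last = trim($parts[0]);
            $fields[$last][] = trim($parts[1]);
        }
        if (($fields['Type'][0] ?? '') === 'language'
            && isset($fields['Subtag'][0], $fields['Description'][0])) {
            $names[$fields['Subtag'][0]] = $fields['Description'][0];
        }
    }
    return $names;
}

// Illustrative input in the registry's format.
$sample = "File-Date: 2016-11-21\n"
    . "%%\n"
    . "Type: language\nSubtag: abe\nDescription: Western\n  Abenaki\n"
    . "%%\n"
    . "Type: script\nSubtag: Latn\nDescription: Latin\n"
    . "%%\n"
    . "Type: language\nSubtag: sje\nDescription: Pite Sami\n";

print_r(parseLanguageSubtags($sample));
```

Parsing is the easy part; since the registry only has English descriptions, it would not help with names in other interface languages.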

The country variants (ko-kp, zh-mo, zh-my) could be generated from the language name we already have plus the country name from CLDR. The same applies to kk-cn, kk-kz, kk-tr, zh-cn, zh-hk, zh-sg and zh-tw (which are already in LocalNamesEn.php). That seems like a good idea, because then they would be automatically translated into many languages instead of most languages falling back to English or the native name.
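A minimal sketch of that generation step, assuming simple "language-REGION" codes with a two-letter region. The function `composeRegionalName` and both arrays are hypothetical stand-ins for the language and country name data available in one interface language.

```php
<?php
// Hypothetical helper: "Language (Country)" from existing name tables.
function composeRegionalName(string $code, array $languageNames, array $countryNames): ?string
{
    $parts = explode('-', $code);
    if (count($parts) !== 2 || strlen($parts[1]) !== 2) {
        return null; // Not a simple language-REGION code.
    }
    $language = $languageNames[strtolower($parts[0])] ?? null;
    $country = $countryNames[strtoupper($parts[1])] ?? null;
    if ($language === null || $country === null) {
        return null; // Missing data: leave the name to other sources.
    }
    return "$language ($country)";
}

// Sample English data only.
$languageNames = ['ko' => 'Korean', 'zh' => 'Chinese'];
$countryNames = ['KP' => 'North Korea', 'MO' => 'Macau', 'MY' => 'Malaysia'];

echo composeRegionalName('ko-kp', $languageNames, $countryNames), "\n"; // Korean (North Korea)
```

With per-language name tables, the same function would yield translated names automatically. CLDR's own country names may not match the wording requested in this task exactly, so generated names could differ from the ones listed above.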

Theoretically, the script variants (ady-cyrl, aeb-arab, aeb-latn, kbd-cyrl, ku-arab, shi-latn, shi-tfng, plus 34 others already in LocalNamesEn.php) could also be generated, but the script names in CLDR do not include the word "script". That's not ideal because some of the scripts share the same name as a language (e.g. ku-arab would become "Kurdish (Arabic)", the meaning of which is not very clear).
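A naive sketch of generating script-variant names that works around the missing word by appending " script" to the CLDR script name. The function `composeScriptName` and the sample arrays are hypothetical, and the data is illustrative only.

```php
<?php
// Hypothetical helper: "Language (Script script)" for language-Script codes.
function composeScriptName(string $code, array $languageNames, array $scriptNames): ?string
{
    $parts = explode('-', $code);
    if (count($parts) !== 2 || strlen($parts[1]) !== 4) {
        return null; // Not a simple language-Script code.
    }
    $language = $languageNames[strtolower($parts[0])] ?? null;
    // Script subtags are conventionally title-cased, e.g. "Arab".
    $script = $scriptNames[ucfirst(strtolower($parts[1]))] ?? null;
    if ($language === null || $script === null) {
        return null;
    }
    // Appending "script" avoids the ambiguous "Kurdish (Arabic)".
    return "$language ($script script)";
}

// Sample data only.
$languageNames = ['ku' => 'Kurdish', 'shi' => 'Tachelhit'];
$scriptNames = ['Arab' => 'Arabic', 'Latn' => 'Latin', 'Tfng' => 'Tifinagh'];

echo composeScriptName('ku-arab', $languageNames, $scriptNames), "\n"; // Kurdish (Arabic script)
```

Appending a fixed English word only works for English. It may also not suit every script, for example the simplified and traditional variants, which this task names "Gan (Simplified)" and "Gan (Traditional)" without the word "script", so a per-script exception list would probably be needed.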

jhsoby added a subscriber: jhsoby. Nov 27 2016, 9:03 AM
Liuxinyu970226 added a comment. Edited Dec 4 2016, 4:50 AM

Thanks for adding zh-mo, since there are some differences in vocabulary between Hong Kong and Macau, as pointed out on zhwiki.

I doubt whether "Chinese (Malaysia)" (zh-my) is still useful, since it is unlikely to differ from Singaporean Chinese (zh-sg) (if someone can point out a difference, I will thank them too). Maybe zh-my is not worth keeping? I don't have enough time to look into it.

Nikki updated the task description. Dec 21 2016, 3:17 PM

@Lydia_Pintscher Is Shizhao's action above valid? The main topic of this task seems to be missing English names for Wikidata language tags (so it fits MediaWiki-extensions-CLDR).

It doesn't hurt, I guess :)

Nikki updated the task description. Feb 25 2017, 8:56 PM
abian added a subscriber: abian. Apr 12 2017, 2:50 PM