Harmonize language codes and autonyms between language-data and Wikimedia wmgExtraLanguageNames
Closed, ResolvedPublic

Description

Some language codes appear with autonyms in wmgExtraLanguageNames in Wikimedia's InitialiseSettings.php.

They should be in harmony with the definitions in language-data on GitHub and Names.php.

Here's what I see at the moment (they all appear twice, for wikidata and for commons):

  • 'dag' => 'Dagbanli', // T260037, T272242 - Appears in Names.php and can be removed. For InitialiseSettings.php patch, see T283168.
  • 'fkv' => 'kvääni', // T167259 - Equal to language-data and doesn't appear in Names.php, probably no action needed
  • 'kea' => 'Kabuverdianu', // T127435 - Equal to language-data and doesn't appear in Names.php, but the capitalization should be fixed in both of them. language-data patch, InitialiseSettings.php WIP patch.
  • 'nod' => 'ᨣᩴᩤᨾᩮᩥᩬᨦ', // T93880 - The autonym in wmgExtraLanguageNames is in the Tai Tham script, and the autonym in language-data is in the Thai script. Both are probably useful, but for clarification it would be great to discuss this with a speaker. Discussed, I'll add nod-thai in a separate task.
  • 'ota' => 'لسان توركى', // T59342 - This says "Turkish language" and language-data says "لسان عثمانى", which is "Ottoman language". They should be the same, probably "لسان عثمانى". InitialiseSettings.php WIP patch.
  • 'rmf' => 'kaalengo tšimb', // T226701 - Equal to language-data and doesn't appear in Names.php, probably no action needed
  • 'rwr' => 'मारवाड़ी', // T61905 - Equal to language-data and doesn't appear in Names.php, probably no action needed
  • 'sjd' => 'Кӣллт са̄мь кӣлл', // T226701 - Capitalized in wmgExtraLanguageNames, but not in language-data. One of them should be changed. My guess is that it should not be capitalized, but I'll be fine with whatever @Yupik, @Susannaanas, or @jhsoby say. InitialiseSettings.php WIP patch.
  • 'sje' => 'bidumsámegiella', // T146707 - Equal to language-data and doesn't appear in Names.php, probably no action needed
  • 'sju' => 'ubmejesámiengiälla', // T226701 - Equal to language-data and doesn't appear in Names.php, probably no action needed
  • 'smj' => 'julevsámegiella', // T146707 - Equal to language-data and doesn't appear in Names.php, probably no action needed
  • 'sms' => 'nuõrttsääʹmǩiõll', // T220118, T223544 - Equal to language-data and doesn't appear in Names.php, probably no action needed
  • 'srq' => 'mbia cheë', // T113408 - Doesn't appear in language-data, should be added. language-data patch.
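The comparison behind the list above could be sketched roughly as follows. This is only an illustration, not the script actually used for this task: the function name `compare_autonyms`, the status labels, and the two small dictionaries are hypothetical, and the sample values are copied from a few of the entries above rather than loaded from InitialiseSettings.php or language-data.

```python
import unicodedata


def _norm(text):
    # Normalize to NFC so precomposed and combining forms compare equal.
    return unicodedata.normalize("NFC", text)


def compare_autonyms(extra_names, language_data):
    """Classify each code in extra_names against language_data.

    Returns a dict mapping code -> one of:
    "missing", "equal", "case-differs", "differs".
    """
    report = {}
    for code, autonym in extra_names.items():
        other = language_data.get(code)
        if other is None:
            report[code] = "missing"
        elif _norm(autonym) == _norm(other):
            report[code] = "equal"
        elif _norm(autonym).casefold() == _norm(other).casefold():
            report[code] = "case-differs"
        else:
            report[code] = "differs"
    return report


# Hypothetical sample: a few wmgExtraLanguageNames entries from the list above.
EXTRA_NAMES = {
    "fkv": "kvääni",
    "ota": "لسان توركى",
    "sjd": "Кӣллт са̄мь кӣлл",
    "srq": "mbia cheë",
}

# Hypothetical sample: the language-data values as described in the list above.
LANGUAGE_DATA = {
    "fkv": "kvääni",
    "ota": "لسان عثمانى",
    "sjd": "кӣллт са̄мь кӣлл",
}

report = compare_autonyms(EXTRA_NAMES, LANGUAGE_DATA)
for code, status in sorted(report.items()):
    print(code, status)
```

A check like this only flags mismatches. Deciding which side is right (for example, whether sjd should be capitalized) still needs input from speakers.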

See also T190129: Consolidate language metadata into a 'language-data' library and use in MediaWiki for a long-term solution

Event Timeline

I noticed that dag is already present in Names.php. None of the others are there, which is to be expected given the comment in InitialiseSettings.php: "Some languages aren't currently supported by MediaWiki but available to encode information on Wikidata." I suppose that means that dag can actually be removed from InitialiseSettings.php.

Edit: oops, I somehow missed Nikki's comment above mine, which amounts to the same thing! 😅

By the way, the label for kea ("Kabuverdianu") is the same in InitialiseSettings.php and in language-data, but it should actually be in lowercase. I'm not sure how it ended up capitalized in language-data, since CLDR has it correctly in lowercase.

(It might be worth investigating if other languages in language-data have had the same issue.)

The contributors who wrote the article on sjd in the incubator have used lowercase for the other eastern Saami languages (а̄нар са̄мь кӣлл (smn), колтта са̄мь кӣлл (sms), ахькэль са̄мь кӣлл (sia), and та̄ррьй са̄мь кӣлл (sjt)). Based on this too, I would say it should be lowercase, but I'll ask around if you would like.

I think it's good enough, thanks!

Amire80 added a project: I18n.
Amire80 updated the task description.

Change 699692 had a related patch set uploaded (by Amire80; author: Amire80):

[operations/mediawiki-config@master] [WIP] Update autonyms for kea, ota, sjd in wmgExtraLanguageNames

https://gerrit.wikimedia.org/r/699692

Change 699692 merged by jenkins-bot:

[operations/mediawiki-config@master] Update autonyms in wmgExtraLanguageNames

https://gerrit.wikimedia.org/r/699692

Mentioned in SAL (#wikimedia-operations) [2021-11-08T12:19:36Z] <lucaswerkmeister-wmde@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:699692|Update autonyms in wmgExtraLanguageNames (T284870)]] (duration: 00m 56s)

Amire80 updated the task description.