Some terms are transliterated in Bengali instead of Meetei Mayek in Wikidata.
Open, Needs Triage | Public | Bug Report

Description

Steps to replicate the issue (include links if applicable):

What happens?:
Some terms are transliterated in the Bengali script instead of the Meetei Mayek script in Wikidata. (ইংলিস)

What should have happened instead?:
ꯏꯪꯂꯤꯁ

Software version (skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

Screenshot_2022-07-27-11-03-41-847_com.android.chrome.jpg (2×1 px, 327 KB)

Event Timeline

$languageNames = [
'de' => 'ꯖꯔꯃꯟ',
'de-at' => 'ꯑꯁꯇ꯭ꯔ꯭ꯤꯌꯥꯟ ꯖꯔꯃꯟ',
'de-ch' => 'ꯁ꯭ꯋꯤꯁ ꯍꯥꯏ ꯖꯔꯃꯟ',
'en' => 'ꯏꯪꯂꯤꯁ',
'en-au' => 'ꯑꯁꯇ꯭ꯔꯦꯂꯤꯌꯥꯟ ꯏꯪꯂꯤꯁ',
'en-ca' => 'ꯀꯅꯥꯗꯤꯌꯥꯟ ꯏꯪꯂꯤꯁ',
'en-gb' => 'ꯕ꯭ꯔꯤꯇꯤꯁ ꯏꯪꯂꯤꯁ',
'en-us' => 'ꯑꯃꯦꯔꯤꯀꯥꯟ ꯏꯪꯂꯤꯁ',
'es' => 'ꯁ꯭ꯄꯦꯅꯤꯁ',
'es-419' => 'ꯂꯦꯇꯤꯟ ꯑꯃꯦꯔꯤꯀꯥꯟ ꯏꯪꯂꯤꯁ',
'es-es' => 'ꯌꯨꯔꯣꯄꯤꯌꯥꯟ ꯁ꯭ꯄꯦꯅꯤꯁ',
'es-mx' => 'ꯃꯦꯛꯁꯤꯀꯥꯟ ꯏꯪꯂꯤꯁ',
'fr' => 'ꯐ꯭ꯔꯥꯟꯆ',
'fr-ca' => 'ꯀꯅꯥꯗꯤꯌꯥꯟ ꯐ꯭ꯔꯥꯟꯆ',
'fr-ch' => 'ꯁ꯭ꯋꯤꯁ ꯐ꯭ꯔꯥꯟꯆ',
'it' => 'ꯏꯇꯥꯂꯤꯌꯥꯟ',
'ja' => 'ꯖꯄꯥꯅꯤꯁ',
'mni' => 'ꯃꯤꯇꯩ ꯂꯣꯟ',
'pt' => 'ꯄ꯭ꯔꯣꯇꯨꯒꯤꯁ',
'pt-br' => 'ꯕ꯭ꯔꯥꯓꯤꯂꯤꯌꯥꯟ ꯄ꯭ꯔꯣꯇꯨꯒꯤꯁ',
'pt-pt' => 'ꯌꯨꯔꯣꯄꯤꯌꯥꯟ ꯄ꯭ꯔꯣꯇꯨꯒꯤꯁ',
'ru' => 'ꯔꯁꯤꯌꯥꯟ',
'und' => 'ꯃꯁꯛꯈꯪꯗꯕ ꯂꯣꯟ',
'zh' => 'ꯆꯥꯏꯅꯤꯁ',
'zh-hans' => 'ꯑꯔꯥꯏꯕ ꯆꯥꯏꯅꯤꯁ',
'zh-hant' => 'ꯑꯔꯤꯕ ꯆꯥꯏꯅꯤꯁ',
];

$currencyNames = [
'BRL' => 'ꯕ꯭ꯔꯥꯓꯤꯂꯤꯌꯥꯟ ꯔꯤꯌꯦꯜ',
'CNY' => 'ꯆꯥꯏꯅꯤꯁ ꯌꯨꯑꯥꯟ',
'EUR' => 'ꯌꯨꯔꯣ',
'GBP' => 'ꯕ꯭ꯔꯤꯇꯤꯁ ꯄꯥꯎꯟ',
'INR' => 'ꯏꯟꯗꯤꯌꯥꯟ ꯔꯨꯄꯦ',
'JPY' => 'ꯖꯄꯥꯅꯤꯁ ꯌꯦꯟ',
'RUB' => 'ꯔꯁꯤꯌꯥꯟ ꯔꯨꯕꯜ',
'USD' => 'ꯌꯨ ꯑꯦꯁ ꯗꯤ',
'XXX' => 'ꯃꯁꯛ ꯈꯪꯗꯕ ꯁꯦꯟꯌꯦꯛ',
];

$countryNames = [
'BR' => 'ꯕ꯭ꯔꯥꯓꯤꯜ',
'CN' => 'ꯆꯥꯏꯅꯥ',
'DE' => 'ꯖꯔꯃꯅꯤ',
'FR' => 'ꯐ꯭ꯔꯥꯟꯆ',
'GB' => 'ꯌꯨꯅꯥꯏꯇꯦꯗ ꯀꯤꯡꯗꯝ',
'IN' => 'ꯏꯟꯗꯤꯌꯥ',
'IT' => 'ꯏꯇꯥꯂꯤ',
'JP' => 'ꯖꯄꯥꯟ',
'RU' => 'ꯔꯁꯤꯌꯥ',
'US' => 'ꯌꯨꯅꯥꯏꯇꯦꯗ ꯁ꯭ꯇꯦꯠꯁ',
];

Can someone help me understand what the problem is? Is it something the Wikidata team needs to fix or is the problem elsewhere?

The problem is in the CLDR extension.

Wikidata shows translated language names and those translations come from the CLDR extension, which provides data from CLDR. In MediaWiki, the default script for mni is Meetei Mayek, but in CLDR, the default script is Bengali. That means that when Wikidata displays a language name in mni, the name it gets from the CLDR extension is in the wrong script.
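
To confirm which script a given name is actually in, its code points can be checked against the relevant Unicode blocks. The following is only a minimal sketch, not code from MediaWiki or the CLDR extension: the function name `detectScript` is hypothetical, and it only distinguishes the Bengali block (U+0980 to U+09FF) from the Meetei Mayek blocks (U+ABC0 to U+ABFF, plus the extensions at U+AAE0 to U+AAFF).

```php
<?php
/**
 * Hypothetical helper: report which of the two scripts a string uses.
 * Returns 'Beng', 'Mtei', 'mixed', or 'other'.
 */
function detectScript( string $text ): string {
	// The /u modifier makes PCRE treat pattern and subject as UTF-8.
	$hasBengali = preg_match( '/[\x{0980}-\x{09FF}]/u', $text ) === 1;
	$hasMeetei = preg_match( '/[\x{ABC0}-\x{ABFF}\x{AAE0}-\x{AAFF}]/u', $text ) === 1;

	if ( $hasBengali && $hasMeetei ) {
		return 'mixed';
	}
	if ( $hasBengali ) {
		return 'Beng';
	}
	return $hasMeetei ? 'Mtei' : 'other';
}

// The two strings from the task description.
echo detectScript( 'ইংলিস' ), "\n"; // Beng (what Wikidata shows now)
echo detectScript( 'ꯏꯪꯂꯤꯁ' ), "\n"; // Mtei (what it should show)
```

Running a check like this over the names returned for mni would show which entries still come from the Bengali-script CLDR data.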

The fix is to add the translations in the right script (which were provided above) to LocalNamesMni.php in the CLDR extension, so that they override the CLDR data.
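
As a rough illustration of what such an override does, the sketch below merges a table of local names over a table of CLDR-derived names. This is not the extension's actual loading code: `mergeNames` and the two sample arrays are hypothetical, and the strings in them are taken from this task.

```php
<?php
/**
 * Hypothetical helper: local names take precedence over CLDR names.
 * With PHP's array union operator (+), keys in the left operand win.
 */
function mergeNames( array $cldrNames, array $localNames ): array {
	return $localNames + $cldrNames;
}

// What the CLDR data for "mni" currently yields (Bengali script).
$cldrNames = [ 'en' => 'ইংলিস' ];
// What LocalNamesMni.php would provide (Meetei Mayek script).
$localNames = [ 'en' => 'ꯏꯪꯂꯤꯁ', 'mni' => 'ꯃꯤꯇꯩ ꯂꯣꯟ' ];

$names = mergeNames( $cldrNames, $localNames );
echo $names['en'], "\n"; // ꯏꯪꯂꯤꯁ
```

Any code that has no local entry would still fall through to whatever the CLDR data provides, so the local list needs to cover every name that should appear in Meetei Mayek.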

I think the following changes in rebuild.php (GitHub) would make the CLDR extension use CLDR's mni-mtei for MediaWiki's mni:

After line 142:

} elseif ( $code === 'mni' ) {
	$realCode = 'mni-beng';
} elseif ( $code === 'mni-mtei' ) {
	$realCode = 'mni';

It might also need the following after line 62:

$languages['mni-beng'] = 'Foo';

I think those changes will make it create files for mni-beng too, but T357853 asks for both scripts to be supported, so that sounds like something we would want anyway.

CLDR's mni-mtei doesn't contain many names though, so we would probably still need to create LocalNamesMni.php with the list from the first comment. I've put a properly formatted version in P61865.

(Also: I think getRealCode() would be a lot clearer if $code were changed to $cldrCode and $realCode to $mwCode. Neither code is less "real" (if anything, the MediaWiki codes would be, since MediaWiki has some non-standard codes), and without reading the description, it's not clear whether the function is turning the CLDR code into the corresponding MediaWiki code or vice versa.)
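
To make the rename concrete, here is a sketch of how the function might read with those variable names and with the mni branches proposed above. It is standalone and not the actual getRealCode() body: the function name `cldrCodeToMwCode` is made up for the example, and any other special cases the real function handles are left out.

```php
<?php
/**
 * Hypothetical sketch: map a CLDR locale code to the MediaWiki
 * language code its data should be stored under.
 */
function cldrCodeToMwCode( string $cldrCode ): string {
	// Default: both systems use the same code.
	$mwCode = $cldrCode;
	if ( $cldrCode === 'mni' ) {
		// CLDR's plain "mni" is in Bengali script.
		$mwCode = 'mni-beng';
	} elseif ( $cldrCode === 'mni-mtei' ) {
		// MediaWiki's plain "mni" is in Meetei Mayek.
		$mwCode = 'mni';
	}
	return $mwCode;
}

foreach ( [ 'mni', 'mni-mtei', 'fr' ] as $cldrCode ) {
	echo $cldrCode, ' => ', cldrCodeToMwCode( $cldrCode ), "\n";
}
```

With the direction spelled out in the names, it is easier to see that the two mni branches swap the default script rather than simply renaming one code.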

Change #1027083 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/cldr@master] Fix script for mni language

https://gerrit.wikimedia.org/r/1027083

Change #1027084 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/cldr@master] Add LocalNamesMni.php

https://gerrit.wikimedia.org/r/1027084