
Remove wmgExtraLanguageNames from Wikimedia production
Open, Needs Triage, Public

Description

Time was, extra language codes for Wikidata labels were added by adding them to wmgExtraLanguageNames in wmf-config/InitialiseSettings.php (for the wikidata dblist and later also commonswiki). This was bad for several reasons, so we eventually moved that configuration to Wikibase (T260118: Move content of $wgExtraLanguageNames on Wikidata to default Terms languages). However, soon afterwards we added wmgExtraLanguageNames back to the production config (T264295: Reinstate $wgExtraLanguageCodes in production). We would like to remove it again to avoid confusion and issues. For instance, T272242: Language code "dag" for Dagbani does not work for lexemes happened because the language code 'dag' was added to the Wikibase list but not to wmgExtraLanguageNames.

Removing the production wmgExtraLanguageNames is blocked on at least two tasks:

See also: T277836: Recent additions to term languages have not been added to InitialiseSettings.php
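The kind of drift behind T272242 and T277836 can be caught with a plain set comparison. The following is only a minimal sketch: it assumes both lists have been exported as simple collections of codes, the function name `find_config_drift` is hypothetical, and the sample data is illustrative rather than a copy of the production config.

```python
def find_config_drift(wikibase_term_languages, extra_language_names):
    """Return the codes that appear in one list but not in the other."""
    wikibase = set(wikibase_term_languages)
    extra = set(extra_language_names)  # for a dict, this takes the keys
    return {
        "missing_from_extra_language_names": sorted(wikibase - extra),
        "missing_from_wikibase": sorted(extra - wikibase),
    }


# Illustrative data only: 'dag' mirrors the situation described in T272242,
# 'fr-ca' is a code mentioned in this task as defined only in Wikibase.
wikibase_codes = ["dag", "nan-hani", "fr-ca"]
extra_names = {"nan-hani": "閩南語"}

drift = find_config_drift(wikibase_codes, extra_names)
print(drift["missing_from_extra_language_names"])
```

As long as both lists exist, a check like this could run in CI for the config repository; it becomes unnecessary once only one list remains.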


Event Timeline

Languages which aren't in wmgExtraLanguageNames also don't sort properly on Special:NewItem (see e.g. T272346)

Change 734722 had a related patch set uploaded (by Mbch331; author: Mbch331):

[operations/mediawiki-config@master] Add missing termbox codes from Wikibase

https://gerrit.wikimedia.org/r/734722

Some other places which appear to depend on wmgExtraLanguageNames:

The language function - {{#language:nan-hani}} used to display nan-Hani before it was added to wmgExtraLanguageNames, it now displays 閩南語.

wbcontentlanguages in the API - it used to have null for the autonym of nan-hani, it now has 閩南語.

The ULS language selector - adding ?uselang=nan-hani to the URL used to cause the icon (next to the username) to be shown without any language name, it now displays 閩南語.

Babel - adding nan-hani-0 to a babel box used to show Template:nan-hani-0, it now shows a normal babel box with 閩南語 as the language name.

Special:Translate - it didn't use to accept nan-hani, it now does and displays 閩南語 as the tooltip for the language name.

The termbox - adding ?uselang=nan-hani to the URL used to show the language name in English, it now shows 閩南語.

I’m not sure what this is doing in the review column, or what we’re supposed to review…

I hoped that the "Patch-For-Review" tag meant that there is a solution to review. If this is not the case, what would be the next step here?

The Gerrit patch mentioned above was merged, but had been reassigned to T277836 in the meantime, so Gerritbot didn’t leave a comment here.


Moving this back to incoming (as this is not worked on right now)… not sure we even want this in the hearth currently.

Hi @Mahir256, you asked about this task in today's Wikidata office hour. This task is currently not a priority for me on its own. Instead, I plan to work on it in the context of improving the general process around new language codes (see T297350). Is there a specific reason why you brought it up? Cheers!

Another place which appears to depend on wmgExtraLanguageNames:

meta=languageinfo in the API. In https://www.wikidata.org/w/api.php?action=query&meta=languageinfo&liprop=autonym&formatversion=2, languages in wmgExtraLanguageNames (e.g. nan-hani) have autonyms, those which are only defined in Wikibase (e.g. fr-ca) don't.
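For anyone wanting to reproduce this, here is a rough sketch of building the query and listing the codes that come back without an autonym. The helper names are hypothetical, the sample response is hand-written (not real API output), and treating an empty or absent `autonym` as "no autonym" is an assumption about the response shape.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def build_languageinfo_url(codes=None):
    """Build a meta=languageinfo query URL, optionally limited to some codes."""
    params = {
        "action": "query",
        "meta": "languageinfo",
        "liprop": "autonym",
        "format": "json",
        "formatversion": "2",
    }
    if codes:
        # licode takes a pipe-separated list of language codes.
        params["licode"] = "|".join(codes)
    return f"{API_ENDPOINT}?{urlencode(params)}"


def codes_without_autonym(response):
    """Return codes whose autonym is empty or absent in a parsed response."""
    info = response.get("query", {}).get("languageinfo", {})
    return sorted(code for code, data in info.items() if not data.get("autonym"))


# Hand-written sample in the assumed formatversion=2 shape.
sample = {
    "query": {
        "languageinfo": {
            "nan-hani": {"autonym": "閩南語"},
            "fr-ca": {"autonym": ""},
        }
    }
}
print(build_languageinfo_url(["nan-hani", "fr-ca"]))
print(codes_without_autonym(sample))
```

The fetching step is deliberately left out so that the sketch does not depend on the network; any HTTP client can fetch the URL and pass the decoded JSON to `codes_without_autonym`.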

@Michael pointed out one thing $wmgExtraLanguageNames doesn’t provide: language fallbacks. For example, nan-hani on Wikidata only falls back to en, not to nan (or cdo, zh-hant, zh, zh-hans). IMHO this might be worth tackling as part of this task (e.g., if we end up adding some place in the Wikibase config where autonyms of non-MediaWiki languages can be defined, perhaps that place should also support defining language fallbacks for them).
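To make the idea concrete, here is a minimal sketch of what one combined entry (autonym plus fallbacks) could look like and how a label lookup would use it. Everything here is hypothetical: `ExtraLanguage`, `EXTRA_LANGUAGES`, `fallback_chain` and `pick_label` are not existing Wikibase or MediaWiki names, and the fallback order for nan-hani simply follows the order the codes are listed in above.

```python
from dataclasses import dataclass, field


@dataclass
class ExtraLanguage:
    """Hypothetical config entry for a language MediaWiki does not know."""
    autonym: str
    fallbacks: list = field(default_factory=list)


# Hypothetical registry; not an existing Wikibase setting.
EXTRA_LANGUAGES = {
    "nan-hani": ExtraLanguage("閩南語", ["nan", "cdo", "zh-hant", "zh", "zh-hans"]),
}


def fallback_chain(code, registry=EXTRA_LANGUAGES, final="en"):
    """Return the code, its configured fallbacks, then the final fallback."""
    chain = [code]
    entry = registry.get(code)
    if entry:
        chain.extend(entry.fallbacks)
    if final not in chain:
        chain.append(final)
    return chain


def pick_label(labels, code):
    """Return (language, label) for the first chain entry that has a label."""
    for candidate in fallback_chain(code):
        if candidate in labels:
            return candidate, labels[candidate]
    return None


print(fallback_chain("nan-hani"))
print(pick_label({"en": "Taiwan", "zh": "臺灣"}, "nan-hani"))
```

A code without an entry in the registry keeps today's behaviour in this sketch and falls back straight to en.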

Interesting, Lucas and Michael! How does T341409 relate to this?

It would probably make sense to fix this task before we do T341409: [TECH] Use LanguageNameUtils::ALL for monolingual text and lexemes, otherwise all the new language codes added there will have the various inconsistencies that come with not being in wmgExtraLanguageNames.


The (probably long-term) solution is to complete step 7 of T190129: Consolidate language metadata into a 'language-data' library and use in MediaWiki.