Remove wmgExtraLanguageNames from Wikimedia production
Open, Needs Triage, Public

Description

Time was, extra language codes for Wikidata labels were made available by adding them to wmgExtraLanguageNames in wmf-config/InitialiseSettings.php (for the wikidata dblist and later also commonswiki). This was bad for several reasons, and so we eventually moved that configuration to Wikibase (T260118: Move content of $wgExtraLanguageNames on Wikidata to default Terms languages). However, soon afterwards we added wmgExtraLanguageNames back to the production config (T264295: Reinstate $wgExtraLanguageCodes in production). We would like to remove it again to avoid confusion and issues – for instance, T272242: Language code "dag" for Dagbani does not work for lexemes happened because the language code 'dag' was added to the Wikibase list but not to wmgExtraLanguageNames.

Removing the production wmgExtraLanguageNames is blocked on at least two tasks:

See also: T277836: Recent additions to term languages have not been added to InitialiseSettings.php

Event Timeline

Languages which aren't in wmgExtraLanguageNames also don't sort properly on Special:NewItem (see e.g. T272346)

Change 734722 had a related patch set uploaded (by Mbch331; author: Mbch331):

[operations/mediawiki-config@master] Add missing termbox codes from Wikibase

https://gerrit.wikimedia.org/r/734722

Some other places which appear to depend on wmgExtraLanguageNames:

The language function - {{#language:nan-hani}} used to display nan-Hani before it was added to wmgExtraLanguageNames, it now displays 閩南語.

wbcontentlanguages in the API - it used to have null for the autonym of nan-hani, it now has 閩南語.

The ULS language selector - adding ?uselang=nan-hani to the URL used to cause the icon (next to the username) to be shown without any language name, it now displays 閩南語.

Babel - adding nan-hani-0 to a babel box used to show Template:nan-hani-0, it now shows a normal babel box with 閩南語 as the language name.

Special:Translate - it didn't use to accept nan-hani, it now does and displays 閩南語 as the tooltip for the language name.

The termbox - adding ?uselang=nan-hani to the URL used to show the language name in English, it now shows 閩南語.

I’m not sure what this is doing in the review column, or what we’re supposed to review…

I hoped that the "Patch-For-Review" tag meant that there is a solution to review. If this is not the case, what would be the next step here?

The Gerrit patch mentioned above was merged, but had been reassigned to T277836 in the meantime, so Gerritbot didn’t leave a comment here.

hoo subscribed.

Moving this back to incoming (as this is not worked on right now)… not sure we even want this in the hearth currently.

Hi @Mahir256, you asked about this task in today's Wikidata office hour: This task is currently not a priority for me on its own. Instead, I plan to work on it in the context of improving the general process around new language codes (see T297350). Is there a specific reason why you brought it up? Cheers!

Another place which appears to depend on wmgExtraLanguageNames:

meta=languageinfo in the API. In https://www.wikidata.org/w/api.php?action=query&meta=languageinfo&liprop=autonym&formatversion=2, languages in wmgExtraLanguageNames (e.g. nan-hani) have autonyms, those which are only defined in Wikibase (e.g. fr-ca) don't.
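To make that check repeatable, here is a minimal sketch that lists the codes lacking an autonym in a decoded languageinfo response. The function name codesWithoutAutonym is made up for illustration, the sample data is hand-written rather than a real API response, and the assumed shape (query.languageinfo keyed by language code, with an empty or absent autonym for unknown names) has not been checked against the live output.

```php
<?php

/**
 * Return the language codes whose autonym is missing or empty.
 * Hypothetical helper, not part of MediaWiki or Wikibase.
 *
 * @param array $response Decoded JSON of action=query&meta=languageinfo&liprop=autonym
 * @return string[]
 */
function codesWithoutAutonym( array $response ): array {
	$missing = [];
	foreach ( $response['query']['languageinfo'] ?? [] as $code => $info ) {
		// Treat an absent key and an empty string the same way (assumption).
		if ( ( $info['autonym'] ?? '' ) === '' ) {
			$missing[] = (string)$code;
		}
	}
	return $missing;
}

// Hand-written sample mirroring the two cases described above.
$sample = json_decode(
	'{"query":{"languageinfo":{"nan-hani":{"autonym":"閩南語"},"fr-ca":{"autonym":""}}}}',
	true
);
print_r( codesWithoutAutonym( $sample ) );
```

For this sample only fr-ca is reported, matching the behaviour described above for codes that are defined only in Wikibase.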

@Michael pointed out one thing $wmgExtraLanguageNames doesn’t provide: language fallbacks. For example, nan-hani on Wikidata only falls back to en, not to nan (or cdo, zh-hant, zh, zh-hans). IMHO this might be worth tackling as part of this task (e.g., if we end up adding some place in the Wikibase config where autonyms of non-MediaWiki languages can be defined, perhaps that place should also support defining language fallbacks for them).

Interesting, Lucas and Michael! How does T341409 relate to this?

It would probably make sense to fix this task before we do T341409: [TECH] Use LanguageNameUtils::ALL for monolingual text and lexemes; otherwise all the new language codes added there will have the various inconsistencies that come with not being in wmgExtraLanguageNames.

> @Michael pointed out one thing $wmgExtraLanguageNames doesn’t provide: language fallbacks. For example, nan-hani on Wikidata only falls back to en, not to nan (or cdo, zh-hant, zh, zh-hans). IMHO this might be worth tackling as part of this task (e.g., if we end up adding some place in the Wikibase config where autonyms of non-MediaWiki languages can be defined, perhaps that place should also support defining language fallbacks for them).

The (probably long term) solution is to complete the step 7 of T190129: Consolidate language metadata into a 'language-data' library and use in MediaWiki.

I recently found out that the extension.json files support the key ExtraLanguageNames (see docs/config-schema.yaml).

The documentation isn't clear enough for me to understand how I'm supposed to use it, but after some trial and error it seems that if I add the following to extension-repo.json, that value shows up as an entry in the list of languages, and I can use it to create items and lexemes and to add monolingual text statements:

"config": {
	"ExtraLanguageNames": {
		"value": {
			"en-x-extjson": "English (from extension.json)"
		}
	}
},

Would that perhaps be an option? As I understand it, that would essentially be adding it to $wgExtraLanguageNames from the extension itself instead of from Wikimedia's config. The description of this ticket says that using $wmgExtraLanguageNames "was bad for several reasons" but I'm not sure what those reasons actually are. The main thing I can think of is that adding them to Wikimedia's config doesn't make them available to third-party users, but if they're added in extension.json, they would be.
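For comparison, this is roughly what the site-level equivalent looks like when the same entry is set from a wiki's own configuration instead of from the extension. It is only a sketch: the LocalSettings.php location is an assumption for a generic third-party wiki, and Wikimedia production goes through wmgExtraLanguageNames in wmf-config/InitialiseSettings.php instead.

```php
<?php
// LocalSettings.php (site configuration), shown only for comparison.
// Same code and name as in the extension-repo.json fragment above.
$wgExtraLanguageNames['en-x-extjson'] = 'English (from extension.json)';
```

The difference is who owns the entry: set this way, it exists only on the wiki that configures it, whereas an entry shipped in extension-repo.json comes with the extension.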

> @Michael pointed out one thing $wmgExtraLanguageNames doesn’t provide: language fallbacks. For example, nan-hani on Wikidata only falls back to en, not to nan (or cdo, zh-hant, zh, zh-hans). IMHO this might be worth tackling as part of this task (e.g., if we end up adding some place in the Wikibase config where autonyms of non-MediaWiki languages can be defined, perhaps that place should also support defining language fallbacks for them).

Adding a language to Names.php doesn't provide language fallbacks either. Fallbacks (and direction) are defined separately in the languages/messages/MessagesXx.php files. I don't know why those MessagesXx.php files can't be created in core if we've determined what the fallbacks/direction should be, but I discovered via includes/Hooks.php and messages/MessagesEs_419.php in the "LandingCheck" extension that it's possible for extensions to add their own MessagesXx.php files containing fallbacks/direction.

I have no idea what I'm doing but I seem to have got my local test Wikibase to add fallbacks and direction for a custom language code by:

creating repo/messages/MessagesEn_x_extjson.php with

<?php

$fallback = "yi, he, ar";
$rtl = true;

creating repo/includes/Hooks/GetMessagesFileNameHandler.php with

<?php

namespace Wikibase\Repo\Hooks;

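class GetMessagesFileNameHandler {
	/**
	 * Handler for the Language::getMessagesFileName hook: point MediaWiki at
	 * this extension's own MessagesXx.php file for $code, if one exists.
	 */
	public static function onGetMessagesFileName( $code, &$file ) {
		// This file lives in repo/includes/Hooks/, so repo/messages/ is two levels up.
		$filename = dirname( __DIR__, 2 ) . '/messages/Messages' . str_replace( '-', '_', ucfirst( $code ) ) . '.php';
		if ( is_readable( $filename ) ) {
			$file = $filename;
		}
	}
}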

and then adding "Language::getMessagesFileName": "\\Wikibase\\Repo\\Hooks\\GetMessagesFileNameHandler::onGetMessagesFileName", to the "Hooks" section of extension-repo.json.
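To show which file name the handler above looks for, here is the code-to-filename step pulled out into a standalone function. The name messagesFileNameForCode is made up for this illustration; the transformation itself is the one used in the handler.

```php
<?php

/**
 * Build the MessagesXx.php file name for a language code, using the same
 * transformation as the handler: upper-case the first letter and replace
 * hyphens with underscores. Hypothetical helper for illustration only.
 */
function messagesFileNameForCode( string $code ): string {
	return 'Messages' . str_replace( '-', '_', ucfirst( $code ) ) . '.php';
}

echo messagesFileNameForCode( 'en-x-extjson' ), "\n"; // MessagesEn_x_extjson.php
echo messagesFileNameForCode( 'es-419' ), "\n";       // MessagesEs_419.php
```

Both results match the file names mentioned above: the file created under repo/messages, and the one found in the LandingCheck extension.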