
apparently inconsistent returns from language magic word
Open, Needs Triage, Public

Description

{{#language:sq}} → shqip – target language tag not specified; returns autonym (expected result)
{{#language:sq|sq}} → shqip – target language tag specified as Albanian; returns Albanian autonym (expected result)
{{#language:sq|sq-}} → Albanian – malformed target language tag specified; returns Albanian language name in English (unexpected result)
{{#language:sq|sq-L}} → Albanian – malformed target language tag specified; returns Albanian language name in English (unexpected result)
{{#language:sq|s}} → shqip – malformed target language tag specified; returns Albanian autonym (expected result)

Event Timeline

Perhaps related to, but certainly not a duplicate of, T244787 (a problem that still exists).

This issue is not confined to en.wiki. The following examples were written at sq.wiki; both there and at zh.wiki they produce results similar to those of the originals written at en.wiki:

  1. {{#language:zh}} → 中文 – target language tag not specified; returns autonym (expected result)
  2. {{#language:zh|zh}} → 中文 – target language tag specified as Chinese; returns Chinese language name in Chinese (expected result)
  3. {{#language:zh|zh-}} → Chinese – malformed target language tag specified; returns Chinese language name in English (unexpected result)
  4. {{#language:zh|zh-L}} → Chinese – malformed target language tag specified; returns Chinese language name in English (unexpected result)
  5. {{#language:zh|z}} → 中文 – malformed target language tag specified; returns Chinese autonym (expected result)

The last three examples of this group, like the last three of the originals, have bogus target language tags. One would expect a consistent response to bogus target language tags, but these three produce inconsistent results. The distinction is apparently due to the length and content of the target language tag. A single letter, digit, or hyphen causes {{#language}} to return the autonym; two or more letters, digits, or hyphens cause {{#language}} to return the language name in English; punctuation characters, regardless of quantity (I've tested *, . and !), cause {{#language}} to return the autonym; a mix of letters, digits, hyphens, and punctuation also causes {{#language}} to return the autonym.
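The pattern described above can be modelled in a few lines of PHP. This is only a model that fits the observed results, not a quote of MediaWiki's code; the function name observedOutcome, the regular expression, and the two outcome labels are assumptions made for illustration.

```php
<?php
// Model of the observed behaviour only; not taken from MediaWiki.
// observedOutcome() and its return labels are hypothetical.
function observedOutcome( string $target ): string {
	// Two or more letters, digits, or hyphens: the tag is treated as a language code.
	// A known code yields a localised name; an unknown one ends up in English.
	if ( preg_match( '/^[a-z0-9-]{2,}$/', strtolower( $target ) ) === 1 ) {
		return 'localised-or-english';
	}
	// Empty, single-character, or punctuation-bearing tags are ignored.
	return 'autonym';
}
```

Under this model, z, * and z! all give 'autonym', while zh, zh- and zh-L give 'localised-or-english', which matches the five numbered examples.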

Given the nature of the IETF language tags that are sometimes used here, the target tags in examples 3 and 4 above could be stripped back to the ISO 639-1 language code, in which case {{#language}} could return the language name written in that language (the autonym in these examples). That doesn't happen.

I think that {{#language}} should return the autonym when given a bogus target language. At the very least, the return should be consistent.

Nikerabbit subscribed.

z is not a valid built-in code (no single letter is), so it doesn't get any fallback, whereas zh- falls back to English.

I think a good solution would be if this code validated the language codes with isKnownLanguageTag:

CoreParserFunctions.php
	public static function language( $parser, $code = '', $inLanguage = '' ) {
		$code = strtolower( $code );
		$inLanguage = strtolower( $inLanguage );
		$lang = MediaWikiServices::getInstance()
			->getLanguageNameUtils()
			->getLanguageName( $code, $inLanguage );

		return $lang !== '' ? $lang : LanguageCode::bcp47( $code );
	}

As far as it goes, that seems sensible. What you have not said is what happens when the language tags are not known language tags. What then?

Yes, z is not a valid language tag. Nor are zh- and zh-L. The latter two can be made valid simply by removing the hyphen and anything that follows it. By trimming the target language tag back to its base, {{#language}} can return an approximation of the desired value:

{{#language:sq|zh-hatn}} → {{#language:sq|zh}} → 阿尔巴尼亚语 – an approximation of what the writer wanted.
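A minimal sketch of that trimming step, assuming the base tag is simply everything before the first hyphen. trimToBaseTag is a hypothetical helper, not an existing MediaWiki function.

```php
<?php
// Hypothetical helper, not part of MediaWiki:
// keep only the part of the tag before the first hyphen.
function trimToBaseTag( string $tag ): string {
	$tag = strtolower( trim( $tag ) );
	$hyphen = strpos( $tag, '-' );
	// No hyphen: the tag is already a bare base tag (valid or not).
	return $hyphen === false ? $tag : substr( $tag, 0, $hyphen );
}
```

The trimmed result would still have to be checked against the known tags, because a tag with no hyphen in it comes back unchanged.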

Or, perhaps, don't trim but instead discard unrecognized target language tags and return the autonym (if there is one); otherwise fall back to the designated fallback:
{{#language:sq|zhhatn}} → {{#language:sq}} → shqip
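Combined with the isKnownLanguageTag check suggested earlier, the discard option might look roughly like this. It is a sketch under stated assumptions: the two callables stand in for isKnownLanguageTag and getLanguageName so that the example runs without MediaWiki, and the function name and the toy name table are made up for illustration.

```php
<?php
// Sketch only. $isKnownTag plays the role of isKnownLanguageTag and $getName the
// role of getLanguageName; languageDiscardUnknownTarget() is a hypothetical name.
function languageDiscardUnknownTarget(
	callable $isKnownTag,
	callable $getName,
	string $code,
	string $inLanguage = ''
): string {
	$code = strtolower( $code );
	$inLanguage = strtolower( $inLanguage );
	// Assumption: an unrecognized target tag is dropped, which requests the autonym.
	if ( $inLanguage !== '' && !$isKnownTag( $inLanguage ) ) {
		$inLanguage = '';
	}
	$name = $getName( $code, $inLanguage );
	// The quoted code falls back to LanguageCode::bcp47( $code ); the bare code is used here.
	return $name !== '' ? $name : $code;
}

// Toy data for illustration; an empty target key means "autonym".
$toyNames = [ 'sq' => [ '' => 'shqip', 'sq' => 'shqip', 'zh' => '阿尔巴尼亚语' ] ];
$isKnownTag = fn ( string $tag ): bool => in_array( $tag, [ 'sq', 'zh', 'en' ], true );
$getName = fn ( string $code, string $in ): string => $toyNames[$code][$in] ?? '';

echo languageDiscardUnknownTarget( $isKnownTag, $getName, 'sq', 'zhhatn' ), "\n"; // shqip
```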

Another alternative: when either the language code or the target language is not recognized as valid, return the magic word as a text string:
{{#language:sq|zhhatn}} → {{#language:sq|zhhatn}}
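A rough sketch of this echo-back option, again with callables standing in for the MediaWiki lookups. languageOrLiteral is a hypothetical name. A real parser function would presumably also have to keep the returned text from being parsed again, which this sketch ignores.

```php
<?php
// Sketch only; languageOrLiteral() is hypothetical. $isKnownTag and $getName are
// stand-ins for isKnownLanguageTag and getLanguageName.
function languageOrLiteral(
	callable $isKnownTag,
	callable $getName,
	string $code,
	string $inLanguage = ''
): string {
	$codeOk = $isKnownTag( strtolower( $code ) );
	// An omitted target is fine: it simply requests the autonym.
	$targetOk = $inLanguage === '' || $isKnownTag( strtolower( $inLanguage ) );
	if ( !$codeOk || !$targetOk ) {
		// Rebuild the invocation as plain text so the writer can see the mistake.
		$args = $inLanguage === '' ? $code : $code . '|' . $inLanguage;
		return '{{#language:' . $args . '}}';
	}
	return $getName( strtolower( $code ), strtolower( $inLanguage ) );
}
```

The advantage of this option is that the mistyped tag is visible on the rendered page, exactly as it was written.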

Whatever is done, it should be consistent and should give the user some indication that something is wrong. Writers tend to see what they meant to write, even when their hands typed something else.