
Some language codes for sitelinks are capitalised differently from labels
Open, Medium, Public

Description

For labels, the codes crh-latn and nds-nl are lowercase (see this query). For sitelinks, they are written crh-Latn and nds-NL (see this query).

Since language codes are case-insensitive, both forms are valid and equivalent, but I think we should be consistent, because that is what most users will expect: none of the example queries uses lcase() or ucase() with lang() or schema:inLanguage, since people simply assume that a language code will always be capitalised the same way. Being inconsistent leads to unexpected behaviour (I discovered the difference through unexpected results when comparing labels with sitelinks). I think only crh-latn and nds-nl are affected for sitelinks, so lowercasing those two would affect far less data than changing the case used for labels, descriptions and aliases.
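To illustrate the pitfall, here is a minimal Python sketch of comparing a label's language code with a sitelink's. It is not taken from any Wikidata code; the helper name `same_language` and the sample variables are hypothetical, and the only assumption is that codes are plain ASCII strings.

```python
def same_language(code_a: str, code_b: str) -> bool:
    """Compare two language codes without regard to case."""
    # Language codes are ASCII, so lower() is sufficient for the comparison.
    return code_a.lower() == code_b.lower()


label_code = "crh-latn"      # casing reported for labels
sitelink_code = "crh-Latn"   # casing reported for sitelinks

print(label_code == sitelink_code)               # naive, case-sensitive comparison
print(same_language(label_code, sitelink_code))  # case-insensitive comparison
```

The naive comparison prints False and the helper prints True, which is the kind of mismatch a user hits when joining labels with sitelinks without lowercasing first.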

Event Timeline

Nikki created this task. Jun 19 2017, 1:19 PM
Restricted Application added projects: Wikidata, Discovery. Jun 19 2017, 1:19 PM
Restricted Application added a subscriber: Aklapper.

Generally, these are two different language codes: one comes from the label, the other from the site, which is probably taken from the wiki configuration, so they are just what the data says. However, we do normalize the language for sitelinks:

$lang = $this->vocabulary->getCanonicalLanguageCode( $site->getLanguageCode() );

but not for terms (labels, descriptions, etc.). Maybe we should start doing that too? I'm not sure what the accepted practice for those is.

The standard is quite clear on the recommended formatting, and also that language codes are case insensitive.
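As a rough sketch of what "recommended formatting" could look like in practice, assuming the standard meant here is BCP 47 (RFC 5646), which the comment does not name explicitly: subtags are lowercase, except that two-letter subtags are uppercase and four-letter subtags are title case when they are neither first nor after a single-character subtag. The function name `format_language_tag` is hypothetical, and the code does not validate tags.

```python
def format_language_tag(tag: str) -> str:
    """Apply the casing conventions assumed from BCP 47 to a language tag."""
    formatted = []
    after_singleton = False  # True once a one-character subtag (e.g. "x") is seen
    for index, subtag in enumerate(tag.split("-")):
        lower = subtag.lower()
        if index == 0 or after_singleton:
            # The first subtag, and everything after a singleton, stays lowercase.
            formatted.append(lower)
        elif len(lower) == 2:
            # Two-letter subtags (regions) are uppercase.
            formatted.append(lower.upper())
        elif len(lower) == 4:
            # Four-letter subtags (scripts) are title case.
            formatted.append(lower.capitalize())
        else:
            formatted.append(lower)
        if len(lower) == 1:
            after_singleton = True
    return "-".join(formatted)


print(format_language_tag("crh-latn"))
print(format_language_tag("nds-nl"))
```

Under these assumptions the two codes from the task description come out as crh-Latn and nds-NL, matching the casing currently used for sitelinks.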

Internally, MediaWiki uses lowercase language codes, and unfortunately those are exposed in user-visible places, such as subpage names for MediaWiki namespace messages or translatable pages. This makes it hard to reach consistency.

For example, when Babel categories were changed to the recommended formatting, it wasn't without controversy.

Smalyshev triaged this task as Medium priority. Dec 21 2017, 2:12 AM
Smalyshev added a project: I18n.
Amire80 moved this task from Untriaged to Wikidata labels on the I18n board. Feb 3 2018, 11:24 AM