
Some language codes for sitelinks are capitalised differently from labels
Open, Medium · Public · Bug Report

Description

For labels, the codes crh-latn and nds-nl are lowercase (see this query). For sitelinks, they are written crh-Latn and nds-NL (see this query).

Since language codes are case insensitive, both forms are valid and equivalent, but I think we should be consistent. Most users will expect consistency (e.g. none of the example queries uses lcase() or ucase() with lang() or schema:inLanguage; people just assume that a language code is always capitalised the same way), and inconsistency leads to unexpected behaviour (unexpected results when comparing labels with sitelinks is how I discovered that the capitalisation differs). I think only crh-latn and nds-nl are affected for sitelinks, so lowercasing those would touch far less data than changing the case used for labels/descriptions/aliases.
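
As a rough illustration of the mismatch (a Python sketch, not code from Wikibase or the query service; same_language is a hypothetical helper): an exact string comparison treats the label and sitelink codes as different, while lowercasing both sides first, as lcase() would in a query, treats them as equal.

```python
def same_language(tag_a: str, tag_b: str) -> bool:
    """Compare two language codes case-insensitively (hypothetical helper)."""
    return tag_a.lower() == tag_b.lower()


# Codes as reported in this task: lowercase for labels, mixed case for sitelinks.
label_codes = ["crh-latn", "nds-nl"]
sitelink_codes = ["crh-Latn", "nds-NL"]

for label_code, sitelink_code in zip(label_codes, sitelink_codes):
    exact = label_code == sitelink_code                # naive comparison
    folded = same_language(label_code, sitelink_code)  # case-insensitive
    print(f"{label_code} vs {sitelink_code}: exact={exact}, folded={folded}")
# crh-latn vs crh-Latn: exact=False, folded=True
# nds-nl vs nds-NL: exact=False, folded=True
```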


Review June 21, 2021: suggested update to Wikibase.default.php#199 (canonicalLanguageCodes) (currently, there is "crh-Latn"):

		'crh'    => 'crh-latn',
		'nds-NL' => 'nds-nl',

Event Timeline

Restricted Application added a subscriber: Aklapper.

Generally, these are two different language codes: one from the label, another from the site, which is probably taken from the wiki configuration, so they are just what the data says. However, we do normalize the language for sitelinks:

$lang = $this->vocabulary->getCanonicalLanguageCode( $site->getLanguageCode() );

but not for terms (labels, descriptions, etc.). Maybe we should start doing that too? I'm not sure what the accepted practice for those is.
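
To make the asymmetry concrete, here is a minimal Python sketch, not the actual Wikibase PHP. It assumes the normalization is a plain table lookup that falls back to the input, which is an assumption about the behaviour rather than something confirmed here. CANONICAL_CODES and all function names are hypothetical, and the single table entry is inferred from the task description.

```python
# Hypothetical stand-in for the canonicalLanguageCodes setting; the single
# entry is inferred from the task description, not copied from Wikibase.
CANONICAL_CODES = {"crh": "crh-Latn"}


def canonical_language_code(code: str) -> str:
    """Return the configured canonical code, or the input if none is set."""
    return CANONICAL_CODES.get(code, code)


def sitelink_language(site_language_code: str) -> str:
    # Sitelinks: the site's language code goes through the mapping.
    return canonical_language_code(site_language_code)


def term_language(term_language_code: str) -> str:
    # Terms (labels, descriptions, aliases): emitted exactly as stored.
    return term_language_code


print(sitelink_language("crh"))   # crh-Latn
print(term_language("crh-latn"))  # crh-latn
```

Under these assumptions the two paths can only agree if the mapping's output uses the same casing as the stored term codes, which is what the suggested configuration change in the description aims for.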

The standard is quite clear on the recommended formatting, and also that language codes are case insensitive.
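
Assuming the standard meant here is BCP 47 (RFC 5646), its section 2.1.1 describes a way to reproduce the recommended casing without consulting the registry: everything is lowercase, except that two-letter subtags are uppercase and four-letter subtags are titlecase when they are neither at the start of the tag nor after a singleton. A minimal Python sketch of that rule follows; the function name is made up for illustration.

```python
def bcp47_recommended_case(tag: str) -> str:
    """Apply the casing convention from RFC 5646, section 2.1.1."""
    formatted = []
    seen_singleton = False
    for position, subtag in enumerate(tag.lower().split("-")):
        if len(subtag) == 1:
            # Singletons (such as "x") start an extension or private-use
            # sequence; everything from here on stays lowercase.
            seen_singleton = True
        elif position > 0 and not seen_singleton:
            if len(subtag) == 2:
                subtag = subtag.upper()       # region, e.g. "NL"
            elif len(subtag) == 4:
                subtag = subtag.capitalize()  # script, e.g. "Latn"
        formatted.append(subtag)
    return "-".join(formatted)


print(bcp47_recommended_case("crh-latn"))    # crh-Latn
print(bcp47_recommended_case("nds-nl"))      # nds-NL
print(bcp47_recommended_case("EN-ca-X-CA"))  # en-CA-x-ca
```

By this convention the sitelink forms crh-Latn and nds-NL follow the recommended formatting, while the all-lowercase label forms are equally valid but not in the recommended style.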

Internally, MediaWiki uses lowercase language codes, and unfortunately those are exposed in user-visible places, such as subpage names for MediaWiki namespace messages or translatable pages. This makes it hard to reach consistency.

For example when Babel categories were changed to the recommended formatting, it wasn't without controversy.

Smalyshev triaged this task as Medium priority. Dec 21 2017, 2:12 AM
Smalyshev added a project: I18n.

Can we fix the capitalization on schema:inLanguage? These should be lowercase like everything else. Where are these defined?

Esc3300 set Due Date to Jun 28 2021, 12:00 AM. Jun 21 2021, 10:09 AM
Esc3300 updated the task description.
Esc3300 added a subscriber: Lucas_Werkmeister_WMDE.

@Lucas_Werkmeister_WMDE are there some aspects you think are still missing? Or alternate solutions?

@Esc3300: Hi, I ask you again not to set Due Dates for no reason, plus not to move things on workboards but ask if things are unclear. Thanks.

@Aklapper The due date would have been for the (trivial) patch.

Apparently some things are still unclear after four years; accordingly, I moved it to "needs discussion or investigation".

Maybe Lucas has some input on what's missing. @Lydia_Pintscher would you like to add something too?

@Esc3300: Patches themselves have no due dates - you (or someone else) write them when you have time, or you don't. Please avoid setting random Due Dates. Thanks.

@Aklapper Following T284276, we try to have them available within a week. (A patch here being one or two added or changed lines in a configuration list.)

I have no comment on this task, beyond the fact that it’s neither for @Esc3300 nor me to decide which column on the Wikidata board this task belongs in, or what the due date should be. Please refrain from moving tasks on Wikidata boards.

@Lucas_Werkmeister_WMDE If you have no view on this, why did you remove the patch-welcome tag?

I try to keep my activity consistent with what was discussed in T284276

@Lucas_Werkmeister_WMDE If you have no view on this, why did you remove the patch-welcome tag?

Because you were wrong to add it. There's been no decision on how to implement it, therefore it's not ready for someone to submit a patch.

I try to keep my activity consistent with what was discussed in T284276

That ticket is still open and @Amire80 already asked you in T284276#7160239 to stop acting like it's the new rule.

Also, this is not a language code addition, so that ticket is irrelevant here.

Because you were wrong to add it. There's been no decision on how to implement it, therefore it's not ready for someone to submit a patch.

Developers are free to provide patches; there is no requirement for you to decide on it.

Also, this is not a language code addition, so that ticket is irrelevant here.

It's language code related and we haven't really seen a constructive alternate proposal to meet the general objective of the process.

@Esc3300: This ticket is already "triaged".

Status maybe, but one should determine in which channel it should go:

  • update a simple configuration setting
  • completely redesign Wikibase

Even if the second solution is preferred, the first could be implemented in the meantime, during the months or years the redesign would take.

Esc3300 changed the subtype of this task from "Task" to "Bug Report". Jul 20 2021, 8:15 AM