Many language wiki templates (pl, it, en, cs) don't accept xx-XX style language codes
Closed, Declined · Public · 0 Estimated Story Points

Assigned To

None

Authored By

	• Elitre
	Oct 13 2015, 2:29 PM

Description

IT wiki:
You can see it here.
(I also put the same citations at en.wiki in case it's useful or interesting for you to see the different outcome.)
A solution to a similar issue is discussed in https://phabricator.wikimedia.org/T97256#1248815 .

PL wiki:
https://www.mediawiki.org/w/index.php?title=Topic:Sgikbv81nxsv7oy6&topic_showPostId=sqt0thxnx0gx69xc#flow-post-sqt0thxnx0gx69xc

EN wiki:
VE is setting the language in {{cite}} templates. It's setting it to en-US, en-GB and other flavours which are not recognized by {{cite}}. Also, these shouldn't be set if the language and wiki are the same languages. On enwiki, these errors end up in [[Category:CS1 maint: Unrecognized language]]

CS wiki:
Per community discussion here, spotted also with another problem described in T156548

Examples are:
https://en.wikipedia.org/w/index.php?title=Chris_Harris_(Automotive_Journalist)&action=edit&oldid=688324259
https://en.wikipedia.org/w/index.php?title=Adam_Waito&type=revision&diff=685374219&oldid=685373324
https://en.wikipedia.org/w/index.php?title=Aijia&type=revision&diff=681819929&oldid=681818020

Related Objects

		Status	Subtype	Assigned	Task
		Declined		None	T115326 Many language wiki templates (pl, it, en, cs) don't accept xx-XX style language codes
		Open		None	T93561 Improve language code validation

Event Timeline

• Elitre created this task. · Oct 13 2015, 2:29 PM

• Elitre raised the priority of this task from to Needs Triage.

• Elitre updated the task description.

• Elitre added a project: Citoid.

• Elitre subscribed.

Restricted Application added a subscriber: Aklapper. · Oct 13 2015, 2:29 PM

• Elitre updated the task description. · Oct 13 2015, 2:29 PM

• Elitre set Security to None.

• Elitre added a subscriber: Mvolz.

Mvolz added a subtask: T93561: Improve language code validation. · Oct 14 2015, 7:22 PM

Mvolz merged a task: T117305: Unrecognized language link. · Nov 3 2015, 2:13 PM

Mvolz added subscribers: Stryn, Krenair, NicoV and 2 others.

So, this is basically the result of us now scraping more data. Our language validator has always allowed xx-XX style language codes, we just weren't getting them as often so it wasn't as noticeable.

We're currently not entirely certain how to resolve this; each template has its own way of validating language codes, and we don't want to overfit to a particular template. We'd like to conform to a given standard but are not sure what that would be. We're currently basically using https://en.wikipedia.org/wiki/IETF_language_tag (as noted by @mobrovac in chat) but not very strictly.

Mvolz renamed this task from "Unknown language" error on it.wp for sources in Italian to Many language wiki templates (pl, it, en) don't accept xx-XX style language codes. · Nov 3 2015, 2:49 PM

Mvolz updated the task description.

@Mvolz, so who can help you to move this forward, anything I can do here? Is "not scraping more data until a fix is found" a possible solution? Any advice we can give to communities to "fix" this on their side if possible, other than the workaround linked above? Thank you!

Restricted Application added a subscriber: StudiesWorld. · Nov 5 2015, 6:51 PM

Mvolz merged a task: T118045: Citoid picks up a 'pl-pl' language for a page. · Nov 30 2015, 9:01 AM

Mvolz added a subscriber: saper.

@Elitre, re moving things forward- I think we are basically still undecided on what to do.

A possible fix, which is safe probably for most templates, is to stick to two-three letter language codes, but there have been complaints about the existing language codes being too limiting. But that's something I'm willing to do- @mobrovac?
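To make the "stick to two-three letter codes" idea concrete, here is a minimal Python sketch of the truncation. It is not Citoid's actual validator; the function name `to_base_code` is hypothetical, and it assumes that anything whose first subtag is not two or three ASCII letters should simply be dropped.

```python
def to_base_code(tag: str) -> str:
    """Reduce a tag such as 'pl-pl' or 'en-GB' to its two/three-letter base.

    Hypothetical helper for illustration only. Returns '' when the first
    subtag is not two or three ASCII letters.
    """
    # Accept both 'en-GB' and 'en_GB' spellings, keep only the first subtag.
    primary = tag.strip().replace("_", "-").split("-", 1)[0].lower()
    if primary.isascii() and primary.isalpha() and 2 <= len(primary) <= 3:
        return primary
    return ""


if __name__ == "__main__":
    for example in ("pl-pl", "en-GB", "ace", "x-klingon"):
        print(example, "->", repr(to_base_code(example)))
```

The cost of this approach is the one raised in the comment above: distinctions such as pt-BR versus pt are lost.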

Mvolz claimed this task. · Nov 30 2015, 9:03 AM

Mvolz triaged this task as Medium priority.

In T115326#1837712, @Mvolz wrote:

A possible fix, which is safe probably for most templates, is to stick to two-three letter language codes, but there have been complaints about the existing language codes being too limiting.

Let's pick a standard and enforce it?

I think we agree on that; the issue is just which standard.

nl.wp doesn't recognize xx-XX language codes like nl-NL; the language templates accept only two-letter codes (sometimes three-letter codes).

Mvolz moved this task from Backlog to IO Tasks on the Citoid board. · Jan 12 2016, 10:14 AM

Mvolz removed Mvolz as the assignee of this task. · Sep 30 2016, 2:39 PM

Restricted Application added a project: VisualEditor. · Sep 30 2016, 2:39 PM

So, the options are:

1. Ignore this.
2. Modify the citoid service to send less information, except magically when it's wanted (like pt-BR vs. pt); no idea how we'd all agree on a shared set for everyone.
3. The above, but inside the Citoid extension, so all clients of the service would have to replicate the same logic (but more flexible to adjust on a per-wiki basis).
4. Fix the templates to work with these valid codes.

Or am I missing something? Option 4 seems the obvious winner…
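For comparison, the per-wiki variant (the third option) could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the `POLICIES` table, the policy names "full", "base" and "omit", and the wiki assignments are hypothetical and are not existing Citoid configuration.

```python
from typing import Optional

# Hypothetical per-wiki policy table; not real Citoid configuration.
POLICIES = {
    "cswiki": "base",  # assumed: templates accept only two-letter codes
    "dewiki": "full",  # assumed: templates accept any code
}


def apply_policy(tag: str, wiki: str, default: str = "full") -> Optional[str]:
    """Return the language value to send to the template, or None to omit it."""
    policy = POLICIES.get(wiki, default)
    if policy == "omit":
        return None
    if policy == "base":
        # Keep only the first subtag, e.g. 'cs-CZ' -> 'cs'.
        return tag.split("-", 1)[0].lower()
    return tag


if __name__ == "__main__":
    print(apply_policy("pt-BR", "cswiki"))
    print(apply_policy("pt-BR", "dewiki"))
```

The sketch also shows the drawback named in that option: every client of the service would need its own copy of such a table.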

On fiwiki we have {{IETF-kielisymboli}}, which renders "en-EN" the same way "en" would be rendered. If the language is Finnish, the same as the site language, it is not shown at all. This uses the codeLangue3 function in Module:FrLangue.

German WP has no problems at all with any language code.

If we are told explicitly that a book is written in German we store this information, but don't show that in articles and do not bother readers, but expose it in microformats.

May I advertise the Multilingual Lua library, e.g. its getBase function? It falls back to the root language for those who cannot deal with extended codes right now. Later it may be configured to support variants unknown to CLDR. Publications written in multiple languages are supported, too.
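The fallback behaviour described here can be sketched in Python as below. This is not the Multilingual library's API or its implementation; `resolve_language`, `resolve_languages` and the `supported` set are hypothetical names, and the comma-separated input format for multi-language publications is an assumption.

```python
from typing import List, Optional, Set


def resolve_language(tag: str, supported: Set[str]) -> Optional[str]:
    """Return the most specific supported code for `tag`, or None.

    Subtags are dropped from the right until a match is found,
    e.g. 'zh-hant-tw' -> 'zh-hant' -> 'zh'. `supported` holds lowercase codes.
    """
    candidate = tag.strip().lower()
    while candidate:
        if candidate in supported:
            return candidate
        if "-" not in candidate:
            return None
        candidate = candidate.rsplit("-", 1)[0]
    return None


def resolve_languages(value: str, supported: Set[str]) -> List[str]:
    """Resolve a comma-separated list of tags, dropping unknowns and repeats."""
    resolved: List[str] = []
    for part in value.split(","):
        code = resolve_language(part, supported)
        if code and code not in resolved:
            resolved.append(code)
    return resolved


if __name__ == "__main__":
    known = {"en", "de", "pt", "pt-br"}
    print(resolve_language("pt-BR", known))
    print(resolve_languages("en-GB, de-AT, en-US", known))
```

A wiki that later learns a variant only has to add it to the supported set; the fallback then stops one step earlier for that tag.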

Soum213 mentioned this in T148320: Documenting process of writing Zotero translators through translation-servers. · Oct 17 2016, 7:26 AM

Mvolz moved this task from IO Tasks to Zotero on the Citoid board. · Oct 28 2016, 3:24 PM

Liuxinyu970226 subscribed. · Dec 19 2016, 7:48 AM

Mvolz renamed this task from Many language wiki templates (pl, it, en) don't accept xx-XX style language codes to Many language wiki templates (pl, it, en, cs) don't accept xx-XX style language codes. · Jan 28 2017, 1:12 PM

Mvolz merged a task: T156547: Allow short language shortcuts in citation templates on cswiki.

Mvolz added a subscriber: Dvorapa.

Dvorapa updated the task description. · Jan 28 2017, 1:36 PM

Really low priority?

JAnD subscribed. · Jan 30 2017, 6:38 AM

Jdforrester-WMF set the point value for this task to 0. · Feb 9 2017, 6:14 PM

Mannivu subscribed. · Apr 10 2017, 1:15 PM

Phil_Boswell subscribed. · May 20 2017, 10:00 AM

Any progress on this task?

Czech Wikipedia users complained again (details)

This can be fixed, community-side, by editing the citation templates. There are a few references to this in this ticket and in at least another one. You can contact the people who left such comments if you need further details.

@Elitre There is one problem. The Czech Wikipedia community follows the ISO 639-1 standard, which only defines two-letter language codes (also used in Wikipedia subdomains).

Text sequences on the web and in HTML are tagged with codes according to RFC 5646 – Tags for Identifying Languages.
There, the section on the Primary Language Subtag states “Three-character primary language subtags in the IANA registry were defined” etc.
The IANA Language Subtag Registry has listed more than 8100 three-letter codes (search for Subtag: aaa) since 2009.
HTML refers to “BCP 47”, which currently includes RFC 5646, with the same story on three-letter codes.
HTML is the ultimate specification for how wikitext is resolved on the client side.

Conclusion: Any limitation to two-letter codes is not appropriate and needs to be extended.

https://ace.wikipedia.org/ is the first of many Wikipedia subdomains with three letters. The two-letter-code story is nonsense today; it had a certain importance a decade ago.
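As an illustration of what accepting two- and three-letter primary subtags looks like in practice, here is a Python sketch that checks the shape of a language[-script][-region] tag. It covers only a simplified subset of the RFC 5646 grammar (no extlang, variants, extensions or private use) and checks syntax only, not whether a subtag is actually registered with IANA. The function name `parse_tag` is hypothetical.

```python
import re
from typing import Dict, Optional

# Simplified subset of the RFC 5646 shape: language[-script][-region].
_TAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"               # two- or three-letter primary subtag
    r"(?:-(?P<script>[A-Za-z]{4}))?"             # optional four-letter script
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"    # optional region: 2 letters or 3 digits
)


def parse_tag(tag: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a tag into conventionally cased subtags, or return None if malformed."""
    match = _TAG.fullmatch(tag.strip())
    if match is None:
        return None
    script = match.group("script")
    region = match.group("region")
    return {
        "language": match.group("language").lower(),
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
    }


if __name__ == "__main__":
    for example in ("ace", "pl-pl", "zh-Hant-TW", "english"):
        print(example, "->", parse_tag(example))
```

Because only the shape is checked, a tag like en-EN passes even though EN is not a registered region; a real validator would also consult the subtag registry.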

Dvorapa mentioned this in T156548: Disallow adding Czech language tag (cs-CZ) into citation templates on cswiki. · Feb 8 2018, 9:36 PM

The English Wikipedia CS1/2 modules currently support by default what MediaWiki supports (mw.language.fetchLanguageNames). The module will trim to the first 2/3 letter code in language_parameter in the module proper. You can see this by experimenting with an English Wikipedia page that it does accept e.g. nl-NL with an output of "(in Dutch)". There are some overrides listed in lang_name_remap in the configuration file, but that's not directly relevant here.

I generally agree that the correct fix is for the communities to get their modules up to date with the English modules if they are seeing errors for longer codes. (There may be other issues with that of course that come with what are likely severely out-of-date modules. [No, not a problem fixed by global modules--we'd just kill all development of the English modules that way as everyone would need to agree that certain parameters were deprecated or not deprecated or.......])

I would recommend that this task be declined entirely or at best an issue for the CommTech team to handle, and for wikis still affected to come to the English Wikipedia talk page. @Trappist_the_monk is very helpful with use-of-CS1/2-on-other-wikis kinds of questions/issues.

What the module does not do at this time is display that this is Dutch Dutch (nl-NL) (or, er, Dutch--better example is en-GB = British English). Is that what is being requested? There are some TODO comments in the module code if this is what is being requested. There's probably some work that could be done to hook into Module:Lang which would support this better. However, I think that request is a different task.

The description includes "Also, these shouldn't be set if the language and wiki are the same languages." This is not true any longer (see end comments in the linked T156548). English Wikipedia at least will take the value but do nothing with it rather than dump it into a maintenance category. This is done automatically without any need for configuration.
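The behaviour described in this comment (trim to the base code, show "(in Dutch)" for nl-NL, show nothing when the language matches the wiki) can be sketched in Python as follows. This is not the CS1/2 module code, which is written in Lua; the `NAMES` table stands in for the language names MediaWiki provides, and `render_language_note` is a hypothetical name.

```python
from typing import Optional

# Tiny stand-in for the language names MediaWiki would supply (illustrative only).
NAMES = {"en": "English", "nl": "Dutch", "pl": "Polish", "cs": "Czech"}


def render_language_note(tag: str, content_language: str) -> Optional[str]:
    """Return the text to display for a citation's language value.

    '' means the language matches the wiki, so nothing is shown.
    None means the code was not recognised (a wiki might track such cases
    in a maintenance category).
    """
    base = tag.strip().split("-", 1)[0].lower()
    if base not in NAMES:
        return None
    if base == content_language.strip().split("-", 1)[0].lower():
        return ""
    return f"(in {NAMES[base]})"


if __name__ == "__main__":
    print(repr(render_language_note("nl-NL", "en")))
    print(repr(render_language_note("en-GB", "en")))
    print(repr(render_language_note("xx-XX", "en")))
```

Handling the same-language case inside the template means the client does not have to decide whether to send the value at all.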

In T115326#4826561, @Izno wrote:

The English Wikipedia CS1/2 modules currently support by default what MediaWiki supports (mw.language.fetchLanguageNames). The module will trim to the first 2/3 letter code in language_parameter in the module proper. You can see this by experimenting with an English Wikipedia page that it does accept e.g. nl-NL with an output of "(in Dutch)". There are some overrides listed in lang_name_remap in the configuration file, but that's not directly relevant here.

I generally agree that the correct fix is for the communities to get their modules up to date with the English modules if they are seeing errors for longer codes. (There may be other issues with that of course that come with what are likely severely out-of-date modules. [No, not a problem fixed by global modules--we'd just kill all development of the English modules that way as everyone would need to agree that certain parameters were deprecated or not deprecated or.......])

I would recommend that this task be declined entirely or at best an issue for the CommTech team to handle, and for wikis still affected to come to the English Wikipedia talk page. @Trappist_the_monk is very helpful with use-of-CS1/2-on-other-wikis kinds of questions/issues.

What the module does not do at this time is display that this is Dutch Dutch (nl-NL) (or, er, Dutch--better example is en-GB = British English). Is that what is being requested? There are some TODO comments in the module code if this is what is being requested. There's probably some work that could be done to hook into Module:Lang which would support this better. However, I think that request is a different task.

My understanding was that that actually *isn't* wanted...

The description includes "Also, these shouldn't be set if the language and wiki are the same languages." This is not true any longer (see end comments in the linked T156548). English Wikipedia at least will take the value but do nothing with it rather than dump it into a maintenance category. This is done automatically without any need for configuration.

Yes, my understanding is that that is the preferred option. This ticket is so old that the problem has since been fixed on wiki. I agree that we should decline it, since it seems to be a problem that, if we let it sit long enough, encourages better practice by using more modern language codes, as per @PerfektesChaos ;).

Mvolz closed this task as Declined. · Dec 16 2018, 8:14 PM

Restricted Application removed a subscriber: Liuxinyu970226. · Dec 16 2018, 8:14 PM

Dvorapa awarded a token. · Dec 16 2018, 9:48 PM

Izno mentioned this in T212604: Citoid should not add the language parameter to the cite template when the language matches the content language. · Dec 25 2018, 3:09 PM
