Standardize invalid language codes for Babel extension
Closed, Resolved · Public

Description

For example, both User zh-classical-N and User lzh-N exist on Wikidata. This should not happen.

See also: T102533: [Bug] Disallow (or resolve) dummy language codes.

Event Timeline

Bugreporter updated the task description.
Bugreporter raised the priority of this task from to Needs Triage.
Bugreporter added a subscriber: Bugreporter.
Restricted Application added a subscriber: Aklapper. Jun 2 2015, 12:00 PM
Nikki added a subscriber: Nikki. Jul 5 2015, 5:13 AM
Liuxinyu970226 set Security to None.
Aklapper renamed this task from "Standardize invaild language codes for Babel extension" to "Standardize invalid language codes for Babel extension". Jul 13 2015, 3:56 PM

BTW, could someone please run a script to find all the incorrect Babel templates and categories (e.g. Template:User als/Category:User als, Template:User zh-yue/Category:User zh-yue; I'm afraid Wikidata missed some) and list them in a paste?
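A check like the one requested could be sketched as follows. This is only an illustration: the `LEGACY_CODES` table holds just the three legacy/standard pairs named elsewhere in this task, the function name `find_legacy_pages` is hypothetical, and the list of page titles is assumed to come from some other source (a dump or an API query, for instance).

```python
import re

# Legacy code -> standard code. Only the pairs named in this task;
# a real check would need a complete table (assumption).
LEGACY_CODES = {
    "be-x-old": "be-tarask",
    "zh-classical": "lzh",
    "zh-yue": "yue",
}

# Matches "Template:User <code>" or "Category:User <code>", with an
# optional proficiency suffix such as "-N" or "-3".
TITLE_RE = re.compile(r"^(Template|Category):User (.+?)(-[0-5N])?$")


def find_legacy_pages(titles):
    """Return (title, suggested_title) pairs for titles that use a legacy code."""
    hits = []
    for title in titles:
        match = TITLE_RE.match(title)
        if not match:
            continue  # not a Babel-style template or category title
        namespace, code = match.group(1), match.group(2)
        level = match.group(3) or ""
        if code in LEGACY_CODES:
            suggested = f"{namespace}:User {LEGACY_CODES[code]}{level}"
            hits.append((title, suggested))
    return hits
```

Feeding it the full list of template and category titles on a wiki would give a list of pages to review, each paired with the title it probably duplicates.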

daniel moved this task from incoming to monitoring on the Wikidata board. Sep 10 2015, 11:57 AM
Restricted Application added a subscriber: StudiesWorld. Nov 28 2015, 11:37 AM

I'm not sure if this is the right place to add this, but https://meta.wikimedia.org/wiki/User:Purodha has a list of languages that claim to be English. For example, the last is "wep-1", and the text says that the user has a basic knowledge of English, rather than a basic knowledge of the Westphalien language.

Nikki added a comment. Apr 21 2016, 7:23 AM

That sounds like a different issue, wep is already a valid language code.

Mnn, correct. For such "faux English" labels, the right way is to submit the codes and names to CLDR, then let users on CLDR localize them.

Does "submit codes and names to CLDR" mean "Tell @Nikerabbit"? Or is there an actual process documented somewhere?

So, let's go back to the main topic of this task. It seems this work was already done on enwiki years ago (maybe except Category:User eml and Category:User no?), but basically not on other wikis. I'm afraid fixing things "manually or via bot" on the others won't help (it will just stay in a "happening, resolving, re-happening, re-resolving..." loop), so a technical change to prevent this is needed more and more. Since T11823 is resolved, Category:User be-tarask and Category:User be-x-old can be combined first. Nearly 8 years have passed (since rEBAB9767339d4c17b1992ddf2305596bc5cfa6e1e01c), and I believe it's time to combine them.

So, let's go back to the main topic of this task

Which is? Please update the task description so that it's understandable. The description currently links two Wikidata items, which, however, do not seem to be the focus of the bug report.

The report might be about the output of {{#babel}} when giving a code which is equivalent, but not identical, to a known language code. Correct? However, we still lack examples of categories populated by the Babel extension in a "wrong" way.

For instance, of the categories linked by the items in the task description, most are empty and only one was created by Babel. "{{#babel:lzh}} {{#babel:zh-classical}}" in test gives categories "User lzh User lzh-N User zh-classical User zh-classical-N".

Mnn, correct. For such "faux English" labels, the right way is to submit the codes and names to CLDR, then let users on CLDR localize them.

Part of "the right way" to handle unknown codes is "leave it blank if you don't know what language it is, instead of filling in 'English', which is almost always going to be wrong."

Perhaps we should "standardize" unknown and invalid codes on "produce an error message". The current "standard" seems to be "call them all English".
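The behaviour proposed here (an error for unknown codes instead of a silent fallback to English) could look roughly like the sketch below. It is not the Babel extension's code: `KNOWN_NAMES` is a tiny illustrative sample, and `language_name` and `UnknownLanguageCode` are hypothetical names.

```python
# Tiny illustrative sample of code -> English name (assumption);
# a real table would come from a source such as CLDR.
KNOWN_NAMES = {
    "en": "English",
    "lzh": "Literary Chinese",
    "yue": "Cantonese",
}


class UnknownLanguageCode(ValueError):
    """Raised when a code has no known language name."""


def language_name(code):
    """Return the name for a code, or raise instead of guessing 'English'."""
    try:
        return KNOWN_NAMES[code]
    except KeyError:
        # No fallback: the caller decides whether to show an error
        # message or leave the label blank.
        raise UnknownLanguageCode(f"Unknown language code: {code!r}") from None
```

Raising keeps the decision with the caller: a user box could render an error message, or omit the language name, rather than claim the user speaks English.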

Nikki added a comment. Jul 10 2016, 8:10 PM

My understanding of the problem is that when someone uses {{#babel:zh-classical}}, the extension puts the user into "Category:User zh-classical" instead of into "Category:User lzh" even though they mean the same thing. Instead, it should understand that zh-classical is a legacy code and convert it to lzh, to avoid duplicate categories.

All the users in https://www.wikidata.org/wiki/Category:User_be-x-old, https://www.wikidata.org/wiki/Category:User_zh-classical and https://www.wikidata.org/wiki/Category:User_zh-yue are there because they used the Babel extension with an old code. Those categories duplicate https://www.wikidata.org/wiki/Category:User_be-tarask, https://www.wikidata.org/wiki/Category:User_lzh and https://www.wikidata.org/wiki/Category:User_yue. All six pages were created by the Babel AutoCreate account (before it got blocked).
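The conversion described in this comment can be sketched as a normalization step that runs before category names are built. The assumptions are: the mapping lists only the three pairs named above, `normalize_code` and `babel_categories` are hypothetical names rather than the extension's API, and the "User <code>" / "User <code>-<level>" pattern follows the test output quoted earlier in this task.

```python
# Legacy code -> standard code, limited to the pairs named in this task.
LEGACY_TO_STANDARD = {
    "be-x-old": "be-tarask",
    "zh-classical": "lzh",
    "zh-yue": "yue",
}


def normalize_code(code):
    """Map a legacy code to its standard equivalent; pass others through."""
    return LEGACY_TO_STANDARD.get(code, code)


def babel_categories(code, level="N"):
    """Build the category names for one language, after normalization."""
    standard = normalize_code(code)
    return [f"User {standard}", f"User {standard}-{level}"]


# Both spellings now land in the same categories.
print(babel_categories("zh-classical"))
print(babel_categories("lzh"))
```

Because normalization happens before the category name is formed, a legacy code never produces a category of its own, which is what prevents the duplicates.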

Change 330041 had a related patch set uploaded (by TTO):
Map MediaWiki's fake language codes to real ones

https://gerrit.wikimedia.org/r/330041

Change 330041 merged by jenkins-bot:
[mediawiki/extensions/Babel@master] Map MediaWiki's fake language codes to real ones

https://gerrit.wikimedia.org/r/330041

Nikerabbit closed this task as Resolved. May 20 2017, 2:14 PM
Nikerabbit assigned this task to TTO.
Restricted Application added a subscriber: PokestarFan. Jul 25 2017, 8:12 AM