Add monolingual language code nan-hani, cdo-hani, hak-hans, hak-hant
Open, Low, Public

Description

Please add the following language codes to the list of language codes supported for monolingual text values.

The language code: nan-Hani
Language name in the language itself or English: Min Nan (Hanji)
The used script, if not obvious: Hani
Where and when the language was or is used: Minnan-speaking area, modern era
The Wikidata item id: Q15901848

The language code: cdo-Hani
Language name in the language itself or English: Min Dong (Chinese characters)
The used script, if not obvious: Hani
Where and when the language was or is used: Min-Dong-speaking people. Modern era.
The Wikidata item id: Q5365165

The language code: hak-Hans
Language name in the language itself or English: Hakka (Chinese character, Simplified)
The used script, if not obvious: Hans
Where and when the language was or is used: mainland China, modern era
The Wikidata item id: Q22827960

The language code: hak-Hant
Language name in the language itself or English: Hakka (Chinese character, Traditional)
The used script, if not obvious: Hant
Where and when the language was or is used: Taiwan, Hong Kong, etc., modern era
The Wikidata item id: Q18165189

Usage example: Use for wikidata items like Q865
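The four requested codes each pair a language subtag with a script subtag. As a minimal sketch (not part of Wikibase; the `split_tag` helper and the simplified pattern are assumptions made for illustration, and full BCP 47 allows more subtag types than this), the codes can be split like so:

```python
import re

# Simplified shape: a 2-3 letter language subtag, optionally followed by
# a 4-letter script subtag. Real BCP 47 tags can carry more subtags.
TAG_RE = re.compile(r"^(?P<language>[A-Za-z]{2,3})(?:-(?P<script>[A-Za-z]{4}))?$")


def split_tag(tag):
    """Return (language, script) for a tag such as 'nan-Hani'.

    The script is None when the tag has no script subtag.
    """
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"unsupported tag shape: {tag!r}")
    language = match.group("language").lower()
    script = match.group("script")
    return language, script.title() if script else None


# The codes requested in this task's description.
REQUESTED = ["nan-Hani", "cdo-Hani", "hak-Hans", "hak-Hant"]
PARSED = {tag: split_tag(tag) for tag in REQUESTED}
```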

Event Timeline

Why request separate BCP 47 codes for Min Nan? Why don't we just split the TWN Min Nan translations instead (so that the Min Nan labels can also be separated)?

TWN min nan? as in nan-TW? According to my understanding, there are also POJ users in Taiwan and Hani users in mainland China so that doesn't seem to solve the situation entirely, although my understanding is not necessarily complete.

@C933103:

You can see why I'm discouraging "nan-Hant-TW" and "nan-Hans-CN" at T165882#3483545. (Certainly, Min Nan users should first be concerned with the scripts used in the interface, rather than only requesting a lot of really unnecessary monolingual codes that can also be defined locally.)

Almost all the Hani text currently discussed and used in relation to the nan.wp project is Hant. Disregarding Hans for now and using Hani instead of Hant would probably do the job in the current setting, but what about when mainland China Hans users start visiting and editing the site?

I don't think Hans should be considered for any variety of Min. For example, the pairs "只" and "隻", "嬭" and "奶", and "发" and "發" each have the same meaning in Mandarin but completely different meanings in Min, so Hans introduces a number of ambiguities in Min.

C933103 renamed this task from Add monolingual language code nan-hant-tw, nan-hans-cn, cdo-hani, hak-hans, hak-hant, zh-nshu, ja-hant to Add monolingual language code nan-hani, cdo-hani, hak-hans, hak-hant, zh-nshu, ja-hant. Nov 27 2017, 5:41 PM
C933103 updated the task description. (Show Details)

hum edited task description accordingly

Yep, pure "hans" script is a Mandarin-optimized script, and it never works on Min languages, because of too many ambiguities. Actually, the best practice is to use a mixture of "regional characters" and "hant". Therefore, "hani" is usually approved.

Moreover, if you really want to use "hans" as the main script, some "hant" characters are still required to reduce ambiguities, so it is still "hani".

C933103 renamed this task from Add monolingual language code nan-hani, cdo-hani, hak-hans, hak-hant, zh-nshu, ja-hant to Add monolingual language code nan-hani, cdo-hani, hak-hans, hak-hant, ja-hant. Dec 10 2017, 8:49 PM
C933103 updated the task description. (Show Details)

Removed Nushu, as the use case for that language and script can be covered by the monolingual code mis, given the lack of a language code for Tuhua.

C933103 renamed this task from Add monolingual language code nan-hani, cdo-hani, hak-hans, hak-hant, ja-hant to Add monolingual language code nan-hani, cdo-hani, hak-hans, hak-hant. Dec 11 2017, 3:03 AM
C933103 updated the task description. (Show Details)

Removed the Japanese Kyujitai request, as using a variant subtag instead of a script subtag might be a better idea, although using a variant subtag has problems too.

@GerardM what's your consideration on this request?

Actually, my original ticket could have been a little clearer.
For example, it should have clarified that the "example" there meant that there are articles in the cdo/nan/hak Wikipedias written in an alternative script, so there should be corresponding monolingual codes that allow those article names to be recorded in the Wikidata language field.
Thus I would like to bump the request for the monolingual language codes cdo-hani and nan-hani.
As for hak: can someone verify that "Hakka (Traditional Han script)" and "Hakka (Simplified Han Script)" are proper ways to describe how Hakka speakers would write their language in Han scripts?

@jhsoby Do you want to work on this? I ask because I have seen you usually working on these tasks.

In T180771#4544573, @C933103 wrote:

And then for hak... Can someone verify that "Hakka (Traditional Han script)" and "Hakka (Simplified Han Script)" are proper way to describe how Hakka speakers would write their language in Han scripts?

Of course it is one of the correct ways to write this language. The Ministry of Education, ROC presents the Literary Award of Taiwanese and Hakka (教育部閩客語文學獎; their website is https://www.edu.tw) every year. You can see hak-hant here.

Ab6399 added a subscriber: Ab6399.

I will work on this issue

Change 555688 had a related patch set uploaded (by Ab6399; owner: Ab6399):
[mediawiki/extensions/Wikibase@master] Add several monolingual languages

https://gerrit.wikimedia.org/r/555688

Does langcom approve this? I couldn't find any clear approval so far.

I started a discussion in the Langcom about this.

Hello, my question was NOT about whether it can be written in Chinese script (which I know it can). Instead, my question was whether there are meaningful differences between "Hakka with Simplified characters" and "Hakka with Traditional characters". Some people mentioned earlier that in certain other Chinese languages, characters used by the Simplified script have other functions in the written form of that language, making it almost impossible to write the language in Simplified script, so there is no need to distinguish Simplified from Traditional Chinese for that language. What I would like to know is whether Hakka also fits this situation.

I'm not hearing any objections from the Language Committee, so I'm probably going to start adding these codes.

Let's start with nan-hani. What will be the autonym for it?

Lydia_Pintscher changed the task status from Open to Stalled. Sep 18 2020, 7:08 PM
Lydia_Pintscher added a subscriber: Lydia_Pintscher.

Can someone answer this?
Marking as stalled until we have an answer.

Based on the other Chinese autonyms in langdb.yaml, I would suggest "閩南語(漢字)". The first three characters are the language name in Chinese characters (as given on zh-min-nan.wikipedia.org) and the two characters inside the brackets are the word for Chinese characters (again, as used on zh-min-nan.wikipedia.org, see the last two characters of the autonym for cdo-hani too).

Mbch331 changed the task status from Stalled to Open. Dec 31 2020, 10:04 AM

Shouldn't they be lowercase for consistency?

Yes. If langcom agrees on all codes, I'll submit a patch with all of them in lowercase; otherwise, only with the approved languages.
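To illustrate the casing question: the task description writes the codes with a capitalized script subtag (nan-Hani), which matches the casing BCP 47 recommends, while the task title uses all-lowercase codes. A minimal sketch of converting between the two follows. The function names are hypothetical, and the BCP 47 casing rule is simplified (it ignores extension and private-use subtags).

```python
def to_lowercase_code(tag):
    """Lowercase form, as used in this task's title (e.g. 'nan-hani')."""
    return tag.lower()


def to_bcp47_case(code):
    """Apply the casing BCP 47 recommends, in simplified form.

    Language subtag lowercase, 4-letter script subtag in title case,
    2-letter region subtag uppercase, everything else lowercase.
    """
    parts = code.split("-")
    cased = [parts[0].lower()]
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            cased.append(part.title())   # script, e.g. Hani
        elif len(part) == 2 and part.isalpha():
            cased.append(part.upper())   # region, e.g. TW
        else:
            cased.append(part.lower())
    return "-".join(cased)
```

Tags compare case-insensitively under BCP 47, so both forms identify the same language and script; the choice is only about consistency with the existing list.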

Let's do nan-hani and see how it works.

Actually no, a moment.

If the autonym for cdo-hani doesn't have parentheses, should nan-hani have parentheses? I'd really love to hear from someone who knows Chinese well.

OK, I received several comments from speakers saying that parentheses are OK (example), so let's do nan-hani with 閩南語(漢字).

Change 669822 had a related patch set uploaded (by Amire80; owner: Amire80):
[mediawiki/extensions/UniversalLanguageSelector@master] Update jquery.uls from upstream

https://gerrit.wikimedia.org/r/669822

Change 669822 merged by jenkins-bot:
[mediawiki/extensions/UniversalLanguageSelector@master] Update jquery.uls from upstream

https://gerrit.wikimedia.org/r/669822

Change 669925 had a related patch set uploaded (by Mbch331; owner: Mbch331):
[mediawiki/extensions/Wikibase@master] Add monolingual language code nan-hani

https://gerrit.wikimedia.org/r/669925

Change 669930 had a related patch set uploaded (by Mbch331; owner: Mbch331):
[mediawiki/extensions/cldr@master] Add monolingual language code nan-hani

https://gerrit.wikimedia.org/r/669930

Change 669925 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add monolingual language code nan-hani

https://gerrit.wikimedia.org/r/669925

Change 669930 merged by jenkins-bot:
[mediawiki/extensions/cldr@master] Add monolingual language code nan-hani

https://gerrit.wikimedia.org/r/669930

This comment was removed by Yejianfei.

Great job! We have added the language code nan-hani.

Now it is time to add the language code cdo-hani.

Code | English name | Autonym | Autonym (alternatives)
cdo-latn | Min Dong Chinese (Foochow Romanized) | Mìng-dĕ̤ng-ngṳ̄ (Bàng-uâ-cê) | Mìng-dĕ̤ng-ngṳ̄ Bàng-uâ-cê
cdo-hani | Min Dong Chinese (Chinese characters) | 閩東語(漢字) | 閩東語漢字

noarave added a subscriber: noarave.

nan-hani is merged, stalling this on the campsite board until the additional language codes are approved by LangCom.

@Yejianfei There is no Langcom approval yet to add those languages.

Change 672648 had a related patch set uploaded (by Aklapper; owner: Yejianfei):
[mediawiki/extensions/Wikibase@master] Add monolingual code cdo-hani and cdo-latn

https://gerrit.wikimedia.org/r/672648

To clarify, the keyword here is "yet". I'm not against cdo-hani in principle. I just wanted to make sure that when nan-hani is deployed, it works as expected. Is nan-hani now deployed? Does it work as expected? Can anyone give some examples?

@Amire80 Yes, nan-hani "Min Nan (Hanji)" was deployed and seems to be working as expected, thank you. I did not see any real live examples of usage yet.

@Manuel: Any examples yet? @Amire80: Is cdo-hani ok as well, or do you still want to wait?

Retracted in reaction to @Nikki's comment T180771#7158281

@Mbch331: nan-hani monolingual language code has 0 uses in Wikidata to date.

Retracted in reaction to @Nikki's comment T180771#7158281

Statistics about monolingual language code use in Wikidata (15 June 2021)

Our SPARQL queries timed out (e.g. https://w.wiki/3Tpz). So @Ladsgroup ended up running a dump-based query instead. We used the opportunity to get a broader look at monolingual use in general if you are interested:

https://gist.github.com/Ladsgroup/ccc7d885f8f57f32b52e969920b4a3a3
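For readers curious what a dump-based count might look like, here is a minimal sketch. It is not the script from the gist above; the function names are hypothetical, and it assumes entities have already been parsed from the Wikidata JSON dump into dicts in the standard entity format (statements under `claims`, each with a `mainsnak` and optional `qualifiers`).

```python
from collections import Counter


def iter_snaks(entity):
    """Yield the main snak and qualifier snaks of every statement."""
    for statements in entity.get("claims", {}).values():
        for statement in statements:
            yield statement.get("mainsnak", {})
            for snaks in statement.get("qualifiers", {}).values():
                yield from snaks


def count_monolingual_codes(entities):
    """Count the language codes of monolingual text values."""
    counts = Counter()
    for entity in entities:
        for snak in iter_snaks(entity):
            datavalue = snak.get("datavalue")  # absent for novalue/somevalue
            if datavalue and datavalue.get("type") == "monolingualtext":
                counts[datavalue["value"]["language"]] += 1
    return counts


# Tiny hand-written sample in the same shape as a dump entity.
SAMPLE = [{
    "id": "Q1",
    "claims": {"P1476": [{
        "mainsnak": {
            "snaktype": "value",
            "datavalue": {
                "type": "monolingualtext",
                "value": {"text": "臺灣", "language": "nan-hani"},
            },
        },
    }]},
}]
```

Calling `count_monolingual_codes(SAMPLE)` tallies one use of nan-hani. References are not covered by this sketch, and labels are stored separately from statements, so they would need their own pass.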

Autonym for nan-hani:

I have just added nan-hani labels to a few Wikidata items, based on either the Hani version of the article title on the nan Wikipedia or the Hani lang template for the title on Latin-script articles there. Examples include Q703914, Q127031, Q45190, Q660947, Q36778, Q2914034. I think it is working as expected.

P.S. It seems that the nan Wikipedia is trying to use either a namespace or a category to group articles written in Hani, but neither appears to be comprehensive, and due to a problem in Wikidata those articles are also undiscoverable from Wikidata, making them hard to find ...

P.P.S. Should someone post about this on the nan Wikipedia Village pump?

That list doesn't seem to be accurate: I can't find nod in it, but it's used twice on https://www.wikidata.org/wiki/Q565110 (added in 2016 and 2019, so not new either).

Thank you @Nikki for making me aware of this! I have now retracted my original comments.

Thanks for flagging this. I ran the code on that particular item and it recorded the nod usages. It seems the dumps on the stat machines are somewhat broken. I will look into it.