Add monolingual language code ccp for Chakma
Closed, Resolved (Public)

Description

Please add the language code ccp to the list of language codes supported for monolingual text values.

Usage: For adding Chakma lexemes to Wikidata
Language item: https://www.wikidata.org/wiki/Q32952

Event Timeline

I would recommend adding separate codes for ccp-beng and ccp-cakm since both writing systems (Eastern Nagari and Ajhapath) have been used to write the language.

Hoi, according to the Wikipedia article there is one script: the Chakma script (Cakm). Are Eastern Nagari and Ajhapath two orthographies using the same script? How are they related to the two peoples indicated to speak the language, and in what orthography is the curriculum mentioned? IMHO that could be the default and does not necessarily need an indication of the orthography.

Agree. The Chakma language is sometimes written in other scripts than the Chakma writing system, such as Bengali or Latin, but this seems to be rare. (In the future, other writing systems will probably get used more rarely than today, because support for the Chakma writing system is getting rolled out to modern computer operating systems only now). In the Unicode CLDR project, we’ve therefore made Cakm the default script for language ccp; see the line <likelySubtag from="ccp" to="ccp_Cakm_BD"/> in likelySubtags.xml. Also, in Unicode CLDR, all Chakma translations are currently kept in the Chakma writing system; we haven’t received any requests to support (in CLDR) the Chakma language ccp in other writing systems than Cakm. Just a data point; not sure if/how this matters for Wikimedia.
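To make the CLDR data point concrete, here is a minimal Python sketch of how a default script could be read from a likelySubtag entry. Only the `<likelySubtag from="ccp" to="ccp_Cakm_BD"/>` element comes from the comment above; the wrapping `<likelySubtags>` root and the helper name `likely_script` are assumptions made for illustration, and this is not how CLDR or Wikimedia code necessarily consumes the file.

```python
import xml.etree.ElementTree as ET

# One-entry stand-in for likelySubtags.xml. The wrapper element is an
# assumption for illustration; only the likelySubtag line is from the thread.
SNIPPET = '<likelySubtags><likelySubtag from="ccp" to="ccp_Cakm_BD"/></likelySubtags>'


def likely_script(xml_text, lang):
    """Return the 4-letter script subtag listed for `lang`, or None."""
    root = ET.fromstring(xml_text)
    for element in root.iter("likelySubtag"):
        if element.get("from") != lang:
            continue
        # The "to" value looks like language_Script_REGION; script subtags
        # are the four-letter alphabetic part.
        parts = element.get("to").split("_")
        return next((p for p in parts[1:] if len(p) == 4 and p.isalpha()), None)
    return None


print(likely_script(SNIPPET, "ccp"))  # Cakm
```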

I suggested separate codes for separate scripts in this instance because of a situation in another language (Meitei) which is frequently used with two different scripts. While the CLDR indicates that the language exclusively uses the Eastern Nagari script, a contributor to Wikimedia projects (User:Awangba Mangang) has been providing localization exclusively in the Meetei Mayek script.

Ethnologue does state that the Chakma script is 'no longer in use' to write ccp, and in fact another contributor to Wikimedia projects may decide to exclusively use the Eastern Nagari script to write Chakma. (A similar situation will exist if someone later decides to add things in Sylheti--requiring syl-beng and syl-sylo codes--and Rohingya--requiring rhg-latn, rhg-arab, and rhg-rohg codes.)

In language-data we usually use the 2- or 3-letter code alone for the script that is more common and default, and a code with a tag for other scripts. For example, we have this for Manipuri, where the Meitei script is the default:

mni: [Mtei, [AS], ꯃꯤꯇꯩ ꯂꯣꯟ]
mni-beng: [Beng, [AS], মেইতেই লোন্]

and this for Javanese, where the Latin script is more common, but the Javanese native script is also used:

jv: [Latn, [AS, PA], Jawa]
jv-java: [Java, [AS, PA], ꦗꦮ]

If the Chakma script is indeed the common and default one for this language, then it should be ccp and ccp-beng.
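The naming convention described above can be sketched in a few lines of Python. This is only an illustration: the function name `code_for` and the `EXAMPLES` table are hypothetical, the table copies just the code and script from the mni and jv entries quoted above, and treating Cakm as the default for ccp is the assumption under discussion, not a settled fact.

```python
def code_for(lang, script, default_script):
    """Bare language code for the default script, lang-script otherwise."""
    if script == default_script:
        return lang
    return f"{lang}-{script.lower()}"


# (language, script, default script) -> code, taken from the entries above.
EXAMPLES = {
    ("mni", "Mtei", "Mtei"): "mni",
    ("mni", "Beng", "Mtei"): "mni-beng",
    ("jv", "Latn", "Latn"): "jv",
    ("jv", "Java", "Latn"): "jv-java",
}

# The helper reproduces the existing codes.
for (lang, script, default), expected in EXAMPLES.items():
    assert code_for(lang, script, default) == expected

# Assuming Cakm is the default script for Chakma:
print(code_for("ccp", "Cakm", "Cakm"))  # ccp
print(code_for("ccp", "Beng", "Cakm"))  # ccp-beng
```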

It would be nice to improve the references in the Wikipedia article to demonstrate that the Chakma script is indeed the common one, although it looks right to me.

I'll add it to language-data, too.

Change 548955 had a related patch set uploaded (by Zoranzoki21; owner: Zoranzoki21):
[mediawiki/extensions/Wikibase@master] Add monolingual language code ccp for Chakma

https://gerrit.wikimedia.org/r/548955

Yes, per Amir's comments, adding it with the code ccp (without a script identifier) is fine.

Would enabling the language code for monolingual enable it for lexemes though, since that's what's requested? I'm not sure how those relate to each other.

Change 548955 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add monolingual language code ccp for Chakma

https://gerrit.wikimedia.org/r/548955