Add lexeme language codes kyw-beng, kyw-deva, txo-beng, cdz-beng, mjx-beng, ksy-beng
Open, Needs Triage, Public

Description

The West Bengal Wikimedians User Group is collaborating with the School of Language and Linguistics of Jadavpur University, Kolkata, to add lexicographical data about 5 endangered languages spoken in West Bengal by assigning 5 Wikimedians-in-residence.

Each requested code combines the ISO 639-3 language code with the lowercased ISO 15924 script subtag, as follows:

  • kyw-beng - Kurmali with Bangla script
  • kyw-deva - Kurmali with Devanagari script
  • txo-beng - Toto with Bangla script
  • cdz-beng - Koda with Bangla script
  • mjx-beng - Mahali with Bangla script
  • ksy-beng - Kharia Thar with Bangla script
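
For illustration, the code format used above (a three-letter ISO 639-3 language code, a hyphen, and a lowercased four-letter ISO 15924 script code) can be sketched as a small check. The helper name `make_code` and the regular expression are assumptions made for this example; they are not part of WikibaseLexeme or any other extension.

```python
import re

# Assumed shape of the requested codes: three lowercase letters (ISO 639-3),
# a hyphen, then four lowercase letters (ISO 15924 script code, lowercased).
CODE_PATTERN = re.compile(r"[a-z]{3}-[a-z]{4}")

# The six codes requested in this task, with the descriptions given above.
REQUESTED_CODES = {
    "kyw-beng": "Kurmali with Bangla script",
    "kyw-deva": "Kurmali with Devanagari script",
    "txo-beng": "Toto with Bangla script",
    "cdz-beng": "Koda with Bangla script",
    "mjx-beng": "Mahali with Bangla script",
    "ksy-beng": "Kharia Thar with Bangla script",
}


def make_code(language: str, script: str) -> str:
    """Hypothetical helper: join a language code and a script code.

    Both parts are lowercased, so the ISO 15924 form "Beng" becomes "beng".
    Raises ValueError if the result does not match the assumed pattern.
    """
    code = f"{language.lower()}-{script.lower()}"
    if not CODE_PATTERN.fullmatch(code):
        raise ValueError(f"unexpected code format: {code!r}")
    return code


if __name__ == "__main__":
    # Every requested code should match the assumed pattern.
    for code, description in REQUESTED_CODES.items():
        assert CODE_PATTERN.fullmatch(code), code
        print(code, "-", description)
```

The check only validates the shape of a code. It does not confirm that the language or script code is actually registered in ISO 639-3 or ISO 15924.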

The above-mentioned linguistics department holds lexicographical data for all of these languages in these scripts and has agreed to upload it to Wikidata.

Event Timeline

Bodhisattwa updated the task description.
Bodhisattwa renamed this task from Add lexeme language codes kyw-beng, kyw-deva, txo-beng, cdz-beng, mjx-beng, lbm-beng to Add lexeme language codes kyw-beng, kyw-deva, txo-beng, cdz-beng, mjx-beng, ksy-beng. Aug 4 2024, 2:19 PM

Change #1059860 had a related patch set uploaded (by Bodhisattwa; author: Bodhisattwa):

[mediawiki/extensions/WikibaseLexeme@master] Add lexeme language codes kyw-beng, kyw-deva, txo-beng, cdz-beng, mjx-beng, ksy-beng

https://gerrit.wikimedia.org/r/1059860

I am requesting script codes for these languages because none of them has a single standard script. The scripts used for each language are listed below:

  1. Kurmali - Devanagari, Bangla, Odia, Chisoi (newly developed)
  2. Toto - Bangla, Toto (newly developed)
  3. Koda - Nagachiki, Bangla
  4. Mahali - no script, usually uses Bangla script
  5. Kharia Thar - no script, usually uses Bangla script

@Amire80 for your consideration

Looks mostly okay, but is there any chance you could decide that certain scripts are primary for any of these languages and use just the language code without a script code? It has happened several times that we added script codes that turned out not to be useful, and they are then hard to remove, as with skr-arab.

  1. Kurmali speakers use the script of the dominant language of the region they live in, e.g. Devanagari script in Jharkhand, Odia script in Odisha, and Bangla script in West Bengal. Chisoi is a newly developed artificial script that is not widely used in the community.
  2. Toto speakers use Bangla script because they live in West Bengal. A Toto script has been developed very recently, but it is not widely used.
  3. The same goes for Koda speakers. Koda people living in West Bengal use Bangla script. They are also developing an artificial script named Nagachiki, which is not widely used in their community.
  4. Mahali does not have a primary script of its own. People who speak this language use Bangla script and sometimes Latin script.
  5. The same goes for Kharia Thar, which also has no primary script; its speakers use Bangla because it is the script available in their native region.

So, unfortunately, there is no primary script for any of these languages. Bangla is not their primary script either: it is simply the script available in their native region, and they use it for convenience. If we could add language codes without script codes, we would definitely do so, but for these languages we have no other option.

OK, I guess there is no objection from the Language committee. The patches for this are usually done by Wikidata developers.

Change #1059860 merged by jenkins-bot:

[mediawiki/extensions/WikibaseLexeme@master] Add lexeme language codes kyw-beng, kyw-deva, txo-beng, cdz-beng, mjx-beng, ksy-beng

https://gerrit.wikimedia.org/r/1059860

Nikki subscribed.

This was not done correctly. Lexeme language codes need to be added to LocalNamesEn.php in the CLDR extension, so that MediaWiki has a language name for them (the ones in the WikibaseLexeme extension are not used anywhere except Special:NewLexeme). That will automatically make them available for lexemes.

They should not be added to the WikibaseLexeme extension unless they can't be added to the CLDR one, and in such cases they should also be added to the Wikibase extension, for monolingual text.