Add lexeme language codes sat-olck, sat-latn, sat-beng, sat-orya
Closed, Resolved, Public

Description

This ticket requests language codes for representing Santali lexemes and forms, using lowercased ISO 15924 script codes for Ol Chiki, Latin, Bengali and Oriya, the four scripts in use for the Santali language.

Event Timeline

According to https://en.wikipedia.org/wiki/Santali_language Ol Chiki is the official script, so sat-olck would be redundant (sat would always mean sat-olck), although IANA doesn't say to suppress any script.
@Amire80 @jhsoby What's your opinion on adding these codes, and should sat-olck be added or not?

While Ol Chiki is the script used in India, the Santali community in Bangladesh uses the Bengali script. Earlier books were written in the Latin script by British missionaries. https://en.wikipedia.org/wiki/Santali_Latin_alphabet

With the exception of sat-Olck (for the reasons @Mbch331 mentions) these all make sense to me, as long as there is use for them (and I trust @Bodhisattwa knows that best). So that's a "go" from me.

I'll add the codes except for sat-Olck. (@Bodhisattwa: A default script only means that when the script is omitted from the language code, the default script applies, so sat = sat-Olck.)
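
To illustrate the default-script point, here is a minimal sketch in Python. It is not WikibaseLexeme code: `DEFAULT_SCRIPT` and `canonical_tag` are hypothetical names, and the table only encodes the assumption made in this thread that Santali defaults to Ol Chiki.

```python
# Hypothetical lookup table (an assumption for illustration only):
# the single entry reflects this thread's claim that sat defaults to Olck.
DEFAULT_SCRIPT = {"sat": "Olck"}


def canonical_tag(tag: str) -> str:
    """Return the tag with an explicit script subtag.

    If the tag already carries a four-letter script subtag, it is kept
    (title-cased, as ISO 15924 codes are usually written). Otherwise the
    default script for the language is filled in when one is known.
    """
    parts = tag.split("-")
    lang = parts[0].lower()
    # A second subtag of four letters is treated as a script code.
    if len(parts) > 1 and len(parts[1]) == 4 and parts[1].isalpha():
        return f"{lang}-{parts[1].title()}"
    script = DEFAULT_SCRIPT.get(lang)
    return f"{lang}-{script}" if script else lang


print(canonical_tag("sat"))       # script omitted, default applies
print(canonical_tag("sat-olck"))  # same result as the line above
print(canonical_tag("sat-latn"))  # explicit non-default script is kept
```

Under that assumption, sat and sat-olck resolve to the same tag, which is the redundancy being discussed, while sat-latn, sat-beng and sat-orya stay distinct.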

Change 633556 had a related patch set uploaded (by Mbch331; owner: Mbch331):
[mediawiki/extensions/WikibaseLexeme@master] Add lexeme language codes sat-latn, sat-beng, sat-orya (Santali in Latin, Bengali and Oriya scripts)

https://gerrit.wikimedia.org/r/633556

I must agree with Bodhi here that having a code for sat-olck is still necessary as it is not guaranteed that Santali speakers outside of India will be able to read it. "Official" in India need not mean "official" in the other countries in which it is spoken, as a closer read of the article on the language should indicate. Besides, we already have separate language codes for a particular language and the scripts in which it is written, including the "default" (such as kk and kk-arab, kk-cyrl, kk-latn, or iu and ike-cans, ike-latn, and similarly for ks, ku, tg, and ug) so I don't see a problem with continuing this trend in the interest of preventing ambiguity.

If I understand correctly, Ol Chiki is used as the default script for anything Santali (in Wikimedia projects). So if we add sat-olck we will essentially have two different language codes (sat and sat-olck) that cover the exact same thing. The rest make sense since they're different from the default, but as long as there is a default script for a language (in our context), it doesn't make sense to me to add the language code with the script specified.

Change 633556 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Add lexeme language codes sat-latn, sat-beng, sat-orya (Santali in Latin, Bengali and Oriya scripts)

https://gerrit.wikimedia.org/r/633556

Lydia_Pintscher subscribed.

Thanks everyone! This should go out next week :)