
Add bfw, gju, hoc and kgg to language names
Closed, Resolved / Public / Feature

Description

Steps to replicate the issue (include links if applicable):

  • Go to a video file on Commons (example) and click the TimedText option at the top left.
  • Search for language codes bfw, gju, hoc or kgg from dropdown

What happens?:
The language codes don't appear

What should have happened instead?:
The language codes should appear just like en - English or fr - français. The languages for which the code should be added are:

'bfw' => 'Bonda/Remosam/ବଣ୍ଡା',
'gju' => 'गुज्जरी/Gujari/Gojri',
'hoc' => 'Ho/𑢹𑣉𑣉 𑣎𑣋𑣜',
'kgg' => 'Kusunda/Gemehaq gipan'

Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

Details

Related Changes in Gerrit:
Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
Update function-schemata sub-module to HEAD (7a5b59a) | repos/abstract-wiki/wikifunctions/wikilambda-cli!59 | jforrester | sync-function-schemata | main
Update function-schemata sub-module to HEAD (7a5b59a) | repos/abstract-wiki/wikifunctions/function-evaluator!295 | jforrester | sync-function-schemata | main
Update function-schemata sub-module to HEAD (7a5b59a) | repos/abstract-wiki/wikifunctions/function-orchestrator!265 | dmartin | sync-function-schemata | main
definitions: Add Z1958/bfw, Z1959/gju-arab, Z1960/gju-deva, Z1961/hoc, Z1962/kgg, and Z1963/ljp ZNaturalLanguages | repos/abstract-wiki/wikifunctions/function-schemata!183 | jforrester | languages | main

Event Timeline

jhsoby subscribed.

This is not a request for a new language for translation though. @psubhashish1 wants to add (and presumably show) subtitles in these languages which aren't supported. He shouldn't be required to translate (or find someone to translate) the MediaWiki interface in order to do that.

FWIW, Subhashish, you can already save the TimedText in these languages – just follow the procedure to add TimedText as you normally would (but choosing another language), and when you get to the TimedText page, you replace the language code in the URL with one of these language codes. There is, however, no way (that I've found) to actually display those subtitles yet, so the usability for anything other than archiving purposes is limited.

I made a couple of tests just now:

The difference between the two is that in InitialiseSettings.php, rmf has been added to wmgExtraLanguageNames for Commons, while there's no such config for Beta Commons. So the way to solve this ticket would be to add the four requested languages to wmgExtraLanguageNames for Commons (and Wikidata, since it's better to keep the two lists in sync).

@psubhashish1 We need the autonyms to add them to that config. You submitted two or three names per language in the task description – can you clarify which ones are the autonyms that should be used?

jhsoby renamed this task from Add languages with ISO codes to Add bfw, gju, hoc and kgg to language names. Dec 11 2024, 10:12 AM

Thanks for the clear guidance, @jhsoby.

I had indeed created the Gujari subs in a way similar to what you described (creating subs in any language and editing the language code in the URL) -- see this. However, they don't appear in the video's CC option. That's the biggest problem.

Can we have hyphenated pseudonyms for names? In the list below, the first one is the common name, the second is the endonym, and the third is the name in the native script. In the absence of a third name, the second one is the name in native language/script.

'bfw' => 'Bonda - Remosam - ବଣ୍ଡା',
'gju' => 'Gujari - گوجری - गुज्जरी',
'hoc' => 'Ho - 𑢹𑣉𑣉 𑣎𑣋𑣜',
'kgg' => 'Kusunda - Gemehaq gipan'

Note that Gujari is most often written in the Arabic script (گوجری).

@mrephabricator, Gujari is actually written in multiple scripts. Most Gujari speakers in India live in states where Hindi is a dominant and official language, and those in Nepal live where Nepali (written in Devanagari) is the national language. Gujari is predominantly an oral language; it is not widely taught in schools or used in media or publications. So those residing in India and Nepal learn Devanagari through school education and use it for Gujari when they need to. Van Gujjari, a variant of Gujjari, uses Devanagari for this reason, even though all Van Gujjars are Muslims who would have preferred Perso-Arabic over Devanagari. Over 174 books have been translated and several books have been published in Van Gujjari, and these are in Devanagari. Gujjars living in Pakistan and Afghanistan would use Perso-Arabic. For that reason, keeping the name in both Perso-Arabic and Devanagari might be helpful (I've edited my earlier comment). Thanks for suggesting this.

Can we have hyphenated pseudonyms for names?

No, not really, that would be breaking with the standard we've decided on for these lists. The names already in the list follow the same format as MediaWiki core's Names.php. So we can't really have names that are in a different language than the language itself.

In the list below, the first one is the common name, the second is the endonym, and the third is the name in the native script. In the absence of a third name, the second one is the name in native language/script.

In case the language is written in more than one script, we can add codes for each script. So since Gujari is written (when it is written at all) both with Arabic and Devanagari scripts, we can add 'gju-arab' => 'گوجری', and 'gju-deva' => 'गुज्जरी',.

Ok, we can have just the following for the time being:

'bfw' => 'Bonda',
'gju' => 'Gujari',
'hoc' => 'Ho',
'kgg' => 'Kusunda',
'gju-arab' => 'گوجری',
'gju-deva' => 'गुज्जरी',

Hi @jhsoby and other friends: I wanted to follow up and check about this again. Will my last suggested options work? If so, what would be the next step?

Hi, sorry about the late reply! Happy new year!

These should be fine, I think. But are you sure we'd need the general code for gju if we have the two script-specific ones? (Another way to ask: Where would we need to use gju instead of either gju-arab or gju-deva?)

And I'm a little bit surprised to see the names of bfw, hoc and kgg in the Latin script, knowing that the Latin script is quite uncommon (but not unheard of) in this area. That's not necessarily a blocker, but I just want to double-check.

Happy New Year, @jhsoby. Thanks for the flag. gju should indeed not be needed, making space for gju-arab and gju-deva. We should have 'hoc' => '𑢹𑣉𑣉 𑣎𑣋𑣜' (in Warang Citi, the script widely used by native speakers) instead of 'hoc' => 'Ho'. bfw and kgg are predominantly spoken languages. The majority of literate Bonda ('bfw') speakers would know Odia, the official script for the Odia language of Odisha, where the speakers live, and some books have used Odia in publications, but there is no standard script. Similarly, kgg has no native script, and most literate speakers are in Nepal and are fluent in the Devanagari script. bfw is ରେମସାମ୍ ("Remosam", an endonym) in Odia script and kgg is गेम्येहाक़ गिपन ("Gejmehac Gipan", the endonym).
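Put together, the names proposed so far in this thread would correspond to config entries roughly like the sketch below. The surrounding array layout and the `commonswiki` / `wikidatawiki` keys are assumptions made for illustration, not copied from InitialiseSettings.php, and the patch that actually gets merged may use different names.

```php
<?php
// Hypothetical excerpt, for illustration only. The real layout of
// wmgExtraLanguageNames in InitialiseSettings.php may differ.
$extraNames = [
	'bfw' => 'ରେମସାମ୍', // Bonda, Odia script
	'gju-arab' => 'گوجری', // Gujari, Arabic script
	'gju-deva' => 'गुज्जरी', // Gujari, Devanagari script
	'hoc' => '𑢹𑣉𑣉 𑣎𑣋𑣜', // Ho, Warang Citi script
	'kgg' => 'गेम्येहाक़ गिपन', // Kusunda, Devanagari script
];

$settings = [
	'wmgExtraLanguageNames' => [
		// Assumed wiki keys; the same list is used for both wikis
		// so that the two stay in sync, as suggested above.
		'commonswiki' => $extraNames,
		'wikidatawiki' => $extraNames,
	],
];
```

Each value is an autonym in the language's own script, matching the Names.php convention mentioned earlier in the thread.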

Change #1108403 had a related patch set uploaded (by Jon Harald Søby; author: Jon Harald Søby):

[operations/mediawiki-config@master] Add bfw, gju-arab, gju-deva, hoc and kgg to wmgExtraLanguageNames

https://gerrit.wikimedia.org/r/1108403

Thanks @psubhashish1! Added a patch for it now, will see when I can add it to a deployment window.

Change #1108403 merged by jenkins-bot:

[operations/mediawiki-config@master] Add bfw, gju-arab, gju-deva, hoc and kgg to wmgExtraLanguageNames

https://gerrit.wikimedia.org/r/1108403

Mentioned in SAL (#wikimedia-operations) [2025-01-06T14:39:19Z] <lucaswerkmeister-wmde@deploy2002> Started scap sync-world: Backport for [[gerrit:1108403|Add bfw, gju-arab, gju-deva, hoc and kgg to wmgExtraLanguageNames (T381934)]]

Mentioned in SAL (#wikimedia-operations) [2025-01-06T14:45:59Z] <lucaswerkmeister-wmde@deploy2002> lucaswerkmeister-wmde, jhsoby: Backport for [[gerrit:1108403|Add bfw, gju-arab, gju-deva, hoc and kgg to wmgExtraLanguageNames (T381934)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-01-06T14:56:31Z] <lucaswerkmeister-wmde@deploy2002> Finished scap sync-world: Backport for [[gerrit:1108403|Add bfw, gju-arab, gju-deva, hoc and kgg to wmgExtraLanguageNames (T381934)]] (duration: 17m 11s)

jforrester opened https://gitlab.wikimedia.org/repos/abstract-wiki/wikifunctions/function-schemata/-/merge_requests/183

definitions: Add Z1958/bfw, Z1959/gju-arab, Z1960/gju-deva, Z1961/hoc, Z1962/kgg, and Z1963/ljp ZNaturalLanguages

dmartin merged https://gitlab.wikimedia.org/repos/abstract-wiki/wikifunctions/function-schemata/-/merge_requests/183

definitions: Add Z1958/bfw, Z1959/gju-arab, Z1960/gju-deva, Z1961/hoc, Z1962/kgg, and Z1963/ljp ZNaturalLanguages

Change #1108482 had a related patch set uploaded (by Jforrester; author: Jforrester):

[mediawiki/extensions/WikiLambda@master] Update function-schemata sub-module to HEAD (7a5b59a)

https://gerrit.wikimedia.org/r/1108482

Hi @jhsoby et al., while these languages appear in videos after SRT files are uploaded, I still don't see the language name in the dropdown while uploading.

Change #1108482 merged by jenkins-bot:

[mediawiki/extensions/WikiLambda@master] Update function-schemata sub-module to HEAD (7a5b59a)

https://gerrit.wikimedia.org/r/1108482

Change #1109064 had a related patch set uploaded (by Jon Harald Søby; author: Jon Harald Søby):

[mediawiki/extensions/TimedMediaHandler@master] Allow more languages in the TimedText language dropdown

https://gerrit.wikimedia.org/r/1109064

Change #1109087 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/deployment-charts@master] wikifunctions: Upgrade orchestrator from 2024-12-17-184905 to 2025-01-08-142250

https://gerrit.wikimedia.org/r/1109087

Change #1109087 merged by jenkins-bot:

[operations/deployment-charts@master] wikifunctions: Upgrade orchestrator from 2024-12-17-184905 to 2025-01-08-142250

https://gerrit.wikimedia.org/r/1109087

Change #1109064 merged by jenkins-bot:

[mediawiki/extensions/TimedMediaHandler@master] Allow more languages in the TimedText language dropdown

https://gerrit.wikimedia.org/r/1109064

Another language I want to request adding is Desia ('dso' => 'ଦେଶିଆ'). It's a part of the Odia macrolanguage. Predominantly a spoken lect, it can be written in the Odia script.

Another language I want to request adding is Desia ('dso' => 'ଦେଶିଆ'). It's a part of the Odia macrolanguage. Predominantly a spoken lect, it can be written in the Odia script.

Please create a new task

Change #1108403 merged by jenkins-bot:

[operations/mediawiki-config@master] Add bfw, gju-arab, gju-deva, hoc and kgg to wmgExtraLanguageNames

https://gerrit.wikimedia.org/r/1108403

gju-arab needs a MessagesGju_arab.php file in core (see MessagesOta.php for example) so that MediaWiki treats it as rtl.

bfw and kgg need adding to LocalNamesEn.php in the CLDR extension.

bfw, gju-arab, gju-deva and kgg should be added to the language-data repository.

For consistency, bfw, gju-arab, gju-deva and kgg should also be added to WikibaseContentLanguages.php in the Wikibase extension.
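For the first point, a minimal sketch of what such a messages file could contain is shown below. This is an assumption based on the usual convention of setting `$rtl` in MediaWiki message files; the file contents are hypothetical, and whether a `$fallback` should also be set is left open.

```php
<?php
/**
 * Hypothetical languages/messages/MessagesGju_arab.php
 * Gujari (Arabic script)
 */

// Marks the language as right-to-left.
$rtl = true;
```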

Change #1112563 had a related patch set uploaded (by Anzx; author: Anzx):

[mediawiki/extensions/Wikibase@master] Add monolingual language codes dso, thq, bfw, gju-arab, gju-deva, hoc and kgg

https://gerrit.wikimedia.org/r/1112563

Change #1112565 had a related patch set uploaded (by Anzx; author: Anzx):

[mediawiki/extensions/cldr@master] Add dso, thq, bfw, kgg to LocalNamesEn.php

https://gerrit.wikimedia.org/r/1112565

Adding to our board for review of the attached Wikibase change (but if more work than just review is required from us then it’ll have to be prioritized).

Change #1108403 merged by jenkins-bot:

[operations/mediawiki-config@master] Add bfw, gju-arab, gju-deva, hoc and kgg to wmgExtraLanguageNames

https://gerrit.wikimedia.org/r/1108403

gju-arab needs a MessagesGju_arab.php file in core (see MessagesOta.php for example) so that MediaWiki treats it as rtl.

bfw and kgg need adding to LocalNamesEn.php in the CLDR extension.

bfw, gju-arab, gju-deva and kgg should be added to the language-data repository.

For consistency, bfw, gju-arab, gju-deva and kgg should also be added to WikibaseContentLanguages.php in the Wikibase extension.

Agree. Filed a PR on language data repo.

Change #1112565 merged by jenkins-bot:

[mediawiki/extensions/cldr@master] Add dso, thq, bfw, kgg to LocalNamesEn.php

https://gerrit.wikimedia.org/r/1112565

Change #1113572 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/deployment-charts@master] wikifunctions: Upgrade evaluators from 2025-01-08-143723 to 2025-01-22-212306

https://gerrit.wikimedia.org/r/1113572

Change #1113572 merged by jenkins-bot:

[operations/deployment-charts@master] wikifunctions: Upgrade evaluators from 2025-01-08-143723 to 2025-01-22-212306

https://gerrit.wikimedia.org/r/1113572

What are the sources for the names गेम्येहाक़ गिपन, ରେମସାମ୍, गुज्जरी? Are they written anywhere outside the Wikimedia world?

@Amire80 People's Archive of Rural India uses the spelling "गुज्जरी", an endonym. Aaley et al. (2019. Kusunda 250 Word List Audio Files. http://doi.org/10.5281/zenodo.3377537) as well as Watters (2006: 139-152) both mention the endonym of the Kusunda language as "gejmehac/gejmehaq gipan". I reached out to Aaley to ask for the correct Devanagari spelling and got "गेम्येहाक़ गिपन". He is also the lexicographer of the Kusunda-Nepali-English dictionary "Kusunda Jati ra Shabdakosh," which mentions this spelling. The first word, "Gejmehac," and the second word, "gipan," can be heard in this and this respectively, taken from a 2018 interview with Gyani Maiya Sen-Kusunda, one of the two fluent speakers alive in 2020 (she died that year, and her younger sister Kamala is now the only fluent speaker of the language). "ରେମସାମ୍" is mentioned in the 1998 eponymous book by Gobardhan Panda. I hope these suffice.

Change #1112563 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Add term support for languages codes dso, thq, bfw, gju-arab, gju-deva, hoc and kgg

https://gerrit.wikimedia.org/r/1112563

Change #1115032 had a related patch set uploaded (by Cory Massaro; author: Cory Massaro):

[operations/deployment-charts@master] wikifunctions: Upgrade orchestrator from version: 2025-01-22-203140 to 2025-01-28-144249

https://gerrit.wikimedia.org/r/1115032

Change #1115032 abandoned by Cory Massaro:

[operations/deployment-charts@master] wikifunctions: Upgrade orchestrator from version: 2025-01-22-203140 to 2025-01-28-144249

Reason:

already done

https://gerrit.wikimedia.org/r/1115032

jhsoby changed the subtype of this task from "Bug Report" to "Feature Request". Feb 21 2025, 8:44 AM