
WMHack19: Add Saami + Romani languages to Wikidata
Open, Needs Triage, Public

Description

= this works. In the case of ULS, it means that it is in the ULS.
blank space = this doesn't work. In the case of ULS, it means that it is not in the ULS.
? = in principle, it should work, but doesn't for some reason.
<not known right now> = I don't know it. Doesn't mean it doesn't exist.

term = Wikidata labels, descriptions, or aliases
mono = Monolingual fields in Wikidata
uls = Universal Language Selector
auto = Autocompletion suggestions in Wikidata
sdc = Language choice in Structured Data on Commons

Saami languages:

lcode | langname | autonym | term | mono | uls | auto | sdc | note
sma | Southern Saami | åarjelsaemien gïele
sju | Ume | ubmejesámiengiälla
sje | Pite | bidumsámegiella
smj | Lule | julevsámegiella
se | Northern Saami | davvisámegiella
sjk | Kemi | <not known right now>
smn | Inari | anarâškielâ
sms | Skolt | nuõʹrttsääʹmǩiõll, sääʹmǩiõll
sia | Akkala | sia-cyrl: а̄кь са̄мь кӣлл, а̄ххькэль са̄мь кӣлл, а̄кьяввьр са̄мь кӣлл. sia-ipa: ahʲkel kiːlː, ahʲkel sa:mʲ kiːlː | sia-cyrl: Cyrillic, sia-ipa: IPA, sia-UPA: UPA
sjd | Kildin | кӣллт са̄мь кӣлл, кӣлтса̄мь кӣлл | extended Cyrillic
sjt | Ter | sjt-cyrl: таррь са̄мь кӣлл. sjt-ipa: tarje kiːlː, tarje sa:mʲ kiːlː | sjt-cyrl: Cyrillic, sjt-ipa: IPA, sjt-UPA: UPA

Romani languages:

lcode | langname | autonym | term | mono | uls | auto | sdc | note
rmn | Balkan Romani
rml | Baltic Romani
rmc | Carpathian Romani
rmf | Finnish Kalo | kaalengo tšimb | ? | (supported by ULS but doesn't work as monolingual text or label)
rmo | Sinte Romani
rmw | Welsh Romani
rmg | Traveller Norwegian
rmy | Vlax Romani | romani čhib | (named simply "Romani" in ULS; its own name according to English Wikipedia is řomani čhib)

Event Timeline

Restricted Application added subscribers: jeblad, jhsoby. · May 17 2019, 8:49 AM
Yupik updated the task description. · May 17 2019, 9:12 AM
Zache updated the task description. · May 17 2019, 9:27 AM
Zache updated the task description.
Zache updated the task description. · May 17 2019, 9:30 AM
Zache updated the task description.
Yupik added a subscriber: Trondtr. · May 17 2019, 9:41 AM
Zache updated the task description. · May 17 2019, 9:44 AM
Yupik updated the task description. · May 17 2019, 9:56 AM
Yupik updated the task description. · May 17 2019, 9:59 AM
Zache updated the task description. · May 17 2019, 10:04 AM
Zache updated the task description. · May 17 2019, 10:09 AM

Question number 1: why does Southern Saami work in all places, but Inari Saami does not?

Zache renamed this task from "Add Saami + Romani languages to Wikidata by the end of Wikimedia Hackathon 2019" to "WMHACK2019: Add Saami + Romani languages to Wikidata". · May 17 2019, 10:48 AM
Zache renamed this task from "WMHACK2019: Add Saami + Romani languages to Wikidata" to "WMHack19: Add Saami + Romani languages to Wikidata".
Zache updated the task description. · May 17 2019, 11:13 AM

Having the languages show up in Structured Data on Commons is an additional issue. Should that be added to the deficiency table?

I defined that as out of scope for this hackathon, because it is unlikely that we can do anything about it here in this timeframe.

Zache added a comment. Edited · May 17 2019, 12:13 PM

Adding language to labels

  • wmgExtraLanguageNames = labels, descriptions
  • Example: T220118

Adding language to Monolingual allowed values

  • add language code to WikibaseRepo.php
  • Example: T174229
  • broken autocomplete ticket T124758

Susannaanas added a comment. Edited · May 17 2019, 12:13 PM

I will try to advance getting them enabled in Structured Data on Commons. :-) Which languages will we focus on?

Yupik added a subscriber: siebrand. · May 17 2019, 1:13 PM

grep sje languages/data/Names.php

grep sma languages/data/Names.php 
                'sma' => 'Åarjelsaemien', # Southern Sami

sma has been added to Names.php because it has reached a sufficient level of user interface translation.
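The grep above can be extended into a quick check for every code in this ticket. A minimal Python sketch, assuming each entry in languages/data/Names.php sits on its own line in the `'code' => 'Autonym', # English name` form that grep shows; the function names `codes_in_names_php` and `report` are hypothetical.

```python
import re

# Matches lines such as:  'sma' => 'Åarjelsaemien', # Southern Sami
# Assumption: one entry per line, with single-quoted code and autonym.
ENTRY_RE = re.compile(r"^\s*'([a-z0-9-]+)'\s*=>\s*'([^']*)'")


def codes_in_names_php(text: str) -> dict[str, str]:
    """Return {code: autonym} for every entry line found in the text."""
    entries = {}
    for line in text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            entries[match.group(1)] = match.group(2)
    return entries


def report(wanted: list[str], text: str) -> dict[str, bool]:
    """Map each wanted code to whether the Names.php text defines it."""
    found = codes_in_names_php(text)
    return {code: code in found for code in wanted}
```

From a MediaWiki core checkout, usage would be along the lines of `report(["sma", "sje", "smn"], Path("languages/data/Names.php").read_text(encoding="utf-8"))`, with `Path` imported from `pathlib`.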

Yupik added a comment. · May 17 2019, 7:55 PM

Please add them all. I would hope we could get this issue over and done with for all these languages in one go.

Zache added a comment. Edited · May 17 2019, 9:08 PM

Notes on Structured Data on Commons labels/descriptions

"sma", which works in Structured Data on Commons, is defined in three places on commons.wikimedia.org:

  1. jquery.uls.data.js
  2. wbTermsLanguages variable (in page source)
  3. window.wpAvailableLanguages variable (in page source)

smj, smn, sms ... are already included in

wbTermsLanguages seems to come from Language::fetchLanguageNames()

Shows wgExtraLanguageNames languages with labels defined in wgExtraLanguageNames

Shows wgExtraLanguageNames languages with labels defined somewhere else
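Since "sma" works on Commons because it is defined in all three places, a language that fails can be diagnosed by checking each source separately. A minimal sketch, assuming the three code lists have already been extracted (for example by hand from the page source) into Python sets keyed by source name; the helper names are hypothetical.

```python
def missing_by_source(codes, sources):
    """For each source, list the requested codes it does not define.

    codes:   iterable of language codes to check
    sources: mapping of source name -> set of codes that source defines
    """
    codes = list(codes)  # allow generators; we iterate once per source
    return {
        name: sorted(code for code in codes if code not in defined)
        for name, defined in sources.items()
    }


def usable_everywhere(codes, sources):
    """Codes defined in every source, the condition "sma" meets on Commons."""
    return sorted(
        code for code in codes
        if all(code in defined for defined in sources.values())
    )
```

Keeping the sources as a mapping makes it easy to add a fourth place if another list turns out to matter.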

Yupik updated the task description. · May 17 2019, 9:33 PM
Nikerabbit added a comment. Edited · May 18 2019, 7:18 AM

Edit: I misunderstood whether X means "works" or not. Maybe use something less ambiguous, like

Yupik added a comment. · May 18 2019, 8:11 AM

I'll change that now.

Yupik updated the task description. · May 18 2019, 8:17 AM
Zache updated the task description. · May 18 2019, 9:59 AM

Testing request for somebody: what does Structured Data on Commons do if wgExtraLanguageNames is set?

$wgExtraLanguageNames = [
	'smn' => 'smn language', // T220118
	'sms' => 'sms language', // T220118
];

*Question*: Are the languages available in the two linked API queries, and is the language available in a photo's structured data file information box when a user tries to fill it in?

Yupik updated the task description. · May 18 2019, 3:21 PM

Submitted https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/cldr/+/511048/ to complete adding the language codes mentioned in this ticket to the CLDR extension.

Yupik updated the task description. · May 18 2019, 3:37 PM
Yupik updated the task description. · May 18 2019, 3:45 PM

Change 511054 had a related patch set uploaded (by Siebrand; owner: Siebrand):
[mediawiki/core@master] Correct autonym for rmy (Vlax Romani)

https://gerrit.wikimedia.org/r/511054

Yupik updated the task description. · May 18 2019, 3:54 PM

Submitted https://github.com/wikimedia/language-data/pull/53 containing source data that will be used in ULS.

Change 511065 had a related patch set uploaded (by Siebrand; owner: Siebrand):
[mediawiki/extensions/Wikibase@master] Add Saami and Romani language codes

https://gerrit.wikimedia.org/r/511065

Just a reminder: I discussed the missing autonyms with @Nikerabbit, and we agreed that we shouldn't add incomplete language data. Please ensure there are verifiable/sourced autonyms for all language codes you would like added. At the moment, they are missing for rmc, rml, rmn, rmo.
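The sourcing rule can be checked mechanically before a patch is created. A minimal sketch, assuming a hypothetical dict that maps each requested Romani code to its sourced autonym, or None when no source has been found yet; the autonyms shown are the ones listed in the task description.

```python
# Hypothetical input: code -> sourced autonym (None = not yet sourced).
PROPOSED = {
    "rmc": None,
    "rmf": "kaalengo tšimb",
    "rml": None,
    "rmn": None,
    "rmo": None,
    "rmy": "romani čhib",
}


def missing_autonyms(proposed: dict) -> list:
    """Return the codes that still lack a non-empty autonym, sorted."""
    return sorted(
        code for code, autonym in proposed.items()
        if not (autonym and autonym.strip())
    )
```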

Yupik added a comment. · May 19 2019, 9:12 AM

Ok, thanks. I've got that on my list of things to do.

The autonym for rmy could also use a better reference, as I noted in https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/511054/ .

Other than that, thanks a lot for these efforts—I totally support the general idea.

https://translate.google.com/translate?sl=auto&tl=en&u=https%3A%2F%2Fwww.databazeknih.cz%2Fknihy%2Fromani-chib-ucebnice-slovenske-romstiny-275698

The whole autonym requirement doesn't work for dead languages, though, or for languages we only have an oral source for. So we unfortunately end up with unusable data for languages where we will never be able to find an autonym (dead languages) or a reliable source (both cases).

Change 511140 had a related patch set uploaded (by Zache-tool; owner: Zache-tool):
[mediawiki/extensions/Wikibase@master] T223524 fetching supported monolingual texts with wbContentLanguage API call. Just proof-of-concept for seeing if this should be fixed using http API-calls or internal function calls. Tested only with wikibase docker.

https://gerrit.wikimedia.org/r/511140
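For comparison, current Wikibase appears to expose supported languages through the query module `meta=wbcontentlanguages`, with `wbclcontext=term` for labels, descriptions and aliases and `wbclcontext=monolingualtext` for monolingual text values. The sketch below assumes that module, its `wbclprop` parameter and a JSON response keyed by code under `query.wbcontentlanguages`; verify this against the API help page of the target wiki. It only builds the request URL and parses a response, so no network access is needed.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint; another Wikibase repo's api.php should work the same way.
API = "https://www.wikidata.org/w/api.php"


def content_languages_url(context: str) -> str:
    """Build a query for the languages Wikibase accepts in one context.

    context is assumed to be "term" (labels, descriptions, aliases)
    or "monolingualtext" (monolingual text values).
    """
    params = {
        "action": "query",
        "meta": "wbcontentlanguages",
        "wbclcontext": context,
        "wbclprop": "code|autonym",
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


def supported_codes(response_text: str) -> set[str]:
    """Extract the language codes from a JSON API response."""
    data = json.loads(response_text)
    return set(data["query"]["wbcontentlanguages"])


def unsupported(wanted: list[str], response_text: str) -> list[str]:
    """Codes from `wanted` that the response does not list, in input order."""
    supported = supported_codes(response_text)
    return [code for code in wanted if code not in supported]
```

Running the same check with both contexts would show directly whether a code works as a label but not as monolingual text, as reported for rmf.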

It's possible to find a solution for dead languages. This is not a blocker. If any of these languages are dead, have no autonym that can be found, or are problematic in any other way, just tell me and we'll find a way.

For example, we can decide on some compromise name that will be the most useful for the people who work with this language in any way, as was done with Jewish Babylonian Aramaic: https://github.com/wikimedia/jquery.uls/pull/244

This is great, thanks! A similar workaround was used for the three Saami languages we don't have autonyms for right now (and, in the case of Kemi Saami, probably never will).

siebrand updated the task description. · May 21 2019, 7:05 AM

Change 511054 merged by jenkins-bot:
[mediawiki/core@master] Correct autonym for rmy (Vlax Romani)

https://gerrit.wikimedia.org/r/511054

I've retracted the patch to the language data on GitHub. There is too much debate about autonyms, and this is not the right place for it. All proposed autonyms here should first be fully sourced and approved by @Nikerabbit or @Amire80 before I create a patch.

Which languages have now been removed?

I didn't remove any, but I had questions about the autonyms. See my comments at https://github.com/wikimedia/language-data/pull/53

Again, I don't really want to block this just because of the autonyms. I understand that these are very small, dead, or poorly documented languages, and sources about them are difficult to find. However, if entering information about them into Wikidata or translating into them on translatewiki is realistic and useful, then it should also be possible to find translators or academics who can suggest an autonym. If it's definitely impossible to find a true autonym, that is, a name for the language in the language itself, it should be some kind of compromise title that is most useful for the linguists who will work with it.

Ok, thanks.

If I had realized the issues with the Romani languages last week already, it could have been as simple as going to the Department of Linguistics in Prague and asking for help, but they weren't around on the weekend.

I'll contact people later on this week about these languages.

Thank you. An email from linguists who are familiar with these languages will be perfect.

Yupik moved this task from Incoming to In progress on the WMFI board. · May 21 2019, 11:24 AM
Yupik added a comment. Edited · May 21 2019, 2:44 PM

E-mail confirmation from the Helsinki University lecturer on Romani languages that kaalengo tšimb is indeed correct for rmf. Still working on the other languages:

kaalengo tšimb is correct as the name of Finnish Romani in Finnish Romani.

Susannaanas updated the task description. · May 24 2019, 1:16 PM
Yupik updated the task description. · May 24 2019, 1:40 PM
Yupik updated the task description.
Yupik added a comment. · May 24 2019, 1:43 PM

I've added the autonyms/endonyms for sia and sjt in IPA, as provided by an expert linguist. The suggestion is to use IPA, since these languages have no official orthography of their own; UPA (Uralic Phonetic Alphabet) could also be a good option.

Yupik updated the task description. · May 24 2019, 2:41 PM
Yupik added a comment. · May 24 2019, 5:24 PM

And two other linguists recommended using Cyrillic:

For Akkala Saami you can write:
а̄кь са̄мь кӣлл, а̄ххькэль са̄мь кӣлл, а̄кьяввьр са̄мь кӣлл
I got these designations from the Akkala Saami speaker I worked with in 2011.
Akkala Saami has no orthography; I have written the words using the Kildin Saami orthography.
For Ter Saami, I only have the Kildin Saami designations found in Aleksandra Antonava's dictionary:
таррь са̄мь кӣлл, нуҏҏьт са̄мь кӣлл

So I think that for these two languages it would be necessary to have all three possible ways of writing the language names (sia-cyrl, sia-ipa, sia-upa), and the same for sjt (sjt-cyrl, sjt-ipa, sjt-upa), since those are the forms we have data in.
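For reference, BCP 47 already has subtags for all three writing systems: the ISO 15924 script subtag Cyrl for Cyrillic, and the IANA-registered variant subtags fonipa (IPA) and fonupa (Uralic Phonetic Alphabet). The sketch below maps the codes proposed here onto that form. Whether MediaWiki would use the ticket's own suffixes internally and map them this way is an open assumption, and `to_bcp47` is a hypothetical helper.

```python
# Suffixes used in this task -> assumed BCP 47 subtags.
SUFFIX_TO_BCP47 = {
    "cyrl": "Cyrl",   # ISO 15924 script subtag for Cyrillic
    "ipa": "fonipa",  # IANA variant subtag: International Phonetic Alphabet
    "upa": "fonupa",  # IANA variant subtag: Uralic Phonetic Alphabet
}


def to_bcp47(code: str) -> str:
    """Convert e.g. 'sia-ipa' to 'sia-fonipa'; other codes pass through."""
    language, _, suffix = code.partition("-")
    subtag = SUFFIX_TO_BCP47.get(suffix.lower())
    if subtag is None:
        return code
    return f"{language.lower()}-{subtag}"
```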

Here's the same in UPA:

Yupik updated the task description. · May 24 2019, 5:40 PM
Yupik moved this task from In progress to Patch for Review on the WMFI board. · May 25 2019, 2:42 AM
Yupik updated the task description. · Jun 27 2019, 10:25 AM