Add support for all Saami languages to Wikidata
Open, Stalled, Low, Public

Description

Wikimedia Finland will be working together with the Skolt and Inari Saami communities. Having all of the Saami languages available would be useful for many purposes.

= this works. In the case of ULS, it means that it is in the ULS.
blank space = this doesn't work. In the case of ULS, it means that it is not in the ULS.
? = in principle, it should work, but doesn't for some reason.
<not known right now> = I don't know it. Doesn't mean it doesn't exist.

term = Wikidata labels, descriptions, or aliases;
mono = Monolingual fields in Wikidata
uls = Universal Language Selector
auto = Autocompletion suggestions in Wikidata
sdc = Language choice in Structured Data on Commons
map = supported in https://maps.wikimedia.org maptiles (mapframe etc)

Saami languages:

Columns: lcode, langname, autonym, term, mono, uls, auto, sdc, map, note. The status marks in the term, mono, uls, auto, sdc and map columns are not preserved; the remaining cells are listed below.

lcode | langname | autonym | note
sma | Southern Saami | åarjelsaemien gïele | unknown
sju | Ume | ubmejesámiengiälla |
sje | Pite | bidumsámegiella |
smj | Lule | julevsámegiella |
se | Northern Saami | davvisámegiella |
sjk | Kemi | <not known right now> | extinct
smn | Inari | anarâškielâ |
sms | Skolt | nuõʹrttsääʹmǩiõll, sääʹmǩiõll |
sia | Akkala | sia-cyrl: а̄кь са̄мь кӣлл, а̄ххькэль са̄мь кӣлл, а̄кьяввьр са̄мь кӣлл. sia-ipa: ahʲkel kiːlː, ahʲkel sa:mʲ kiːlː | sia-cyrl: Cyrillic, sia-ipa: IPA, sia-UPA: UPA; extinct
sjd | Kildin | кӣллт са̄мь кӣлл, кӣлтса̄мь кӣлл | extended Cyrillic
sjt | Ter | sjt-cyrl: таррь са̄мь кӣлл. sjt-ipa: tarje kiːlː, tarje sa:mʲ kiːlː | sjt-cyrl: Cyrillic, sjt-ipa: IPA, sjt-UPA: UPA; nearly extinct

Event Timeline

In what way do you want support for Sámi languages? For labels, descriptions and aliases, or just for monolingual properties?

sju, smn, sms and sjd are already available for monolingual property types.
If you want them for labels etc., they need to be made available to ULS, which isn't something the Wikidata team handles.

We will then proceed to ask to include at least smn and sms in ULS. For the rest, for our purposes, the ability to add monolingual properties would be enough. Thank you!

Missing from https://github.com/wikimedia/language-data/blob/master/data/langdb.yaml are: sjk, sia, sjt. Do note that an autonym is required to add language names there. ULS uses this database. All other listed languages are already in ULS.
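The check described in this comment amounts to a set difference between the requested codes and the codes present in langdb.yaml. The following is a minimal sketch, not code from the language-data repository: `KNOWN_IN_LANGDB` is a hypothetical, hand-written stand-in for the codes in langdb.yaml (filled in only from what this comment states), and `missing_codes` is an illustrative helper name.

```python
# Codes requested in the task description, in table order.
REQUESTED = ["sma", "sju", "sje", "smj", "se", "sjk", "smn", "sms", "sia", "sjd", "sjt"]

# Hypothetical stand-in for the codes found in langdb.yaml.
# Assumption: this mirrors the comment above, not the actual file contents.
KNOWN_IN_LANGDB = {"sma", "sju", "sje", "smj", "se", "smn", "sms", "sjd"}


def missing_codes(requested, known):
    """Return the requested codes that are absent from `known`, keeping order."""
    return [code for code in requested if code not in known]


if __name__ == "__main__":
    print(missing_codes(REQUESTED, KNOWN_IN_LANGDB))
```

In practice the known set would be read from the YAML file itself; that step is left out here because it needs a YAML parser outside the standard library.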

sju, smn, sms and sjd are already available for monolingual property types.
If you want them for labels etc., they need to be made available to ULS, which isn't something the Wikidata team handles.

I cannot find their codes here: https://www.wikidata.org/wiki/Help:Wikimedia_language_codes/lists/all. Are these the ULS languages?

This does not work (screenshot: Screen Shot 2019-03-14 at 19.36.53.png). This, I guess, should be the existing monolingual tag.

Neither do they display here (screenshot: Screen Shot 2019-03-14 at 19.43.54.png), although they are in my Babel.

The status of requested Sámi languages is marked below. Not all languages can be used although they are said to be ready.

The most up-to-date listing can be found in T223524

I cannot find their codes here: https://www.wikidata.org/wiki/Help:Wikimedia_language_codes/lists/all. Are these the ULS languages?

That page is generated from the P424 statements that users have added to Wikidata items. It doesn't say where the codes are used, so I think it's more confusing than helpful here.

Monolingual Where can this be checked?
Actual (can it be used in monolingual properties)
Actual (is it available in Labels)

Unless something has changed recently:

Being in langdb.yaml is not enough to make a language usable in Wikidata.

The languages available for labels are:

The easiest way to check whether a language can be used for labels (in my opinion) is to look at the language field on https://www.wikidata.org/wiki/Special:NewItem

The languages available for monolingual text are:

Only the languages which are in Names.php show up in the suggestions for monolingual text. The ticket for fixing that is T124758.
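Besides looking at Special:NewItem, the two language lists can probably also be checked programmatically. The sketch below assumes that the Wikibase API on wikidata.org offers a `meta=wbcontentlanguages` query module with a `wbclcontext` parameter accepting `term` and `monolingualtext`, and that its response is keyed by language code under `query.wbcontentlanguages`; both the module's availability and the response shape are assumptions to verify against the live API help. The helper names are illustrative only.

```python
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"


def content_languages_url(context: str) -> str:
    """Build a query URL for the assumed wbcontentlanguages module.

    `context` is expected to be "term" (labels, descriptions, aliases)
    or "monolingualtext".
    """
    params = {
        "action": "query",
        "meta": "wbcontentlanguages",
        "wbclcontext": context,
        "wbclprop": "code",
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


def codes_from_response(response: dict) -> set:
    """Extract language codes from a decoded response (assumed shape)."""
    return set(response.get("query", {}).get("wbcontentlanguages", {}))


if __name__ == "__main__":
    print(content_languages_url("term"))
    print(content_languages_url("monolingualtext"))
```

Fetching each URL and comparing the two code sets would show which languages are usable for monolingual text but not yet for labels.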

@jhsoby-WMNO: Should there be another subtask asking for smn and sms to be made available for labels?

Yes, that would be best. I can make one later unless you beat me to it. :-)

jhsoby-WMNO moved this task from In progress to Done on the WMNO-Sámi board.

I think this can be closed as resolved now.

The only thing that was mentioned that is missing is the autocomplete when typing language names in the monolingual text field. You can add monolingual texts by using the language codes, and it works, but if you try to type the name of the language it doesn't find it. That is an issue with all languages added for monolingual and not just these, so I don't feel like that issue "belongs" to this task specifically.

Hate to come back to this, but it is still not possible to use sju, sjd, sjt or sia with labels or descriptions in Wikidata. sju I don't need right now, but I do need to be able to input labels and descriptions in the other three. Towards the end of the summer, at the latest, I will also need sju.

Other languages that are missing are all but one of the Romani languages, five of which are considered national minority languages in Sweden. As is Kven (fkv).

I just tried with: https://www.wikidata.org/w/api.php?action=wbsetlabel&format=json&id=Q42&token=<valid token>&language=sju&value=Wikimedia
And I get this as a result:

{"error":{"code":"unknown_language","info":"Unrecognized value for parameter \"language\": sju.","*":"See https://www.wikidata.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at &lt;https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce&gt; for notice of API deprecations and breaking changes."},"servedby":"mw1344"}
Yupik moved this task from Incoming to In progress on the WMFI board.
Yupik renamed this task from Add support for all Sámi languages to Add support for all Sámi languages to Wikidata. May 22 2019, 9:54 AM

Additionally, using sms as the language failed while creating a lexeme (screenshot: Screen Shot 2019-05-25 at 19.18.04.png).

Did you actually fill out the "Spelling variant of the lemma" field? I was able to create https://www.wikidata.org/wiki/Lexeme:L47045 fine.

Right, I did not. However, inside the lexeme, it cannot be used.

Do you want to create codes that allow entering UPA directly? e.g. "sms-fonupa" ?

This is beyond my expertise, I will ask @Yupik to follow up, and propose to use a separate task.

You'd need that if most content you want to enter is directly in UPA

We have something similar to that in T223524 for Akkala Saami and Ter Saami, as they don't have official orthographies of their own, so we have some stuff in Cyrillic, some in IPA, and some in UPA.

For Skolt Saami, I don't think that it's necessary to have separate codes, since it has an official orthography, although I would like to be able to add the UPA and IPA for items in that language when we have sources for it, similar to the way this has been done in Wikidata:Q102090 with IPA. I would also like some way of marking which dialect it is.

The IPA statement on Q102090 isn't really a sample to follow.

Yupik renamed this task from Add support for all Sámi languages to Wikidata to Add support for all Saami languages to Wikidata. Jun 1 2019, 10:37 PM
Yupik updated the task description.

Wikimedia Commons also requires local mediawiki:lang/langcode pages for https://commons.wikimedia.org/wiki/Module:Languages

Esc3300 updated the task description.

I updated the table above by linking the relevant Wikidata items and noting extinct languages (at least per Wikidata). Also, in the meantime, sjd and sju have become available for terms.

To bring this to a conclusion as far as Wikidata is concerned, couldn't we also add sjk as term language? Maybe sia-cyrl and sjt-cyrl as well? Not sure what to suggest for the other sia/sjt codes.

I do not have enough expertise to evaluate the need for specific codes, but on a general level:

  • All language codes could be included, so I definitely support adding sjk as well
  • Source materials may exist in Cyrillic form only, so I support adding them as well
  • Source materials may only exist in oral form without an orthography, so being able to record whatever is available would be important. I do not know the existing practices for recording phonetic transcriptions, but in my opinion they should be supported.
  • UPA is more commonly used for Uralic languages than IPA, and deciding which one to support would be beyond my knowledge

Considering that pretty much all the extant text we have for sjk is on the short enwiki article and even that is not using an official orthography since one doesn't exist, I don't see any need to add it to anything. I'm not really sure why I ever added it in the first place, so my apologies for that.

Sia-cyrl and sjt-cyrl would, in principle, be a good idea, but they are likely to just create the potential for the flyby vandalism that the incubator experiences from the stereotypical person-with-dictionary-but-no-actual-knowledge-of-said-language. Combined with sia's 0 speakers and sjt's 2, plus no official orthography, I have to say no to sia-cyrl and sjt-cyrl.

On a practical level, I think it boils down to the question of whether there are actual samples of labels that could be added in these languages to Wikidata items.

For one, there is a convenient list already at https://en.wikipedia.org/wiki/Ter_Sami#Example_of_words_in_Ter_saami[9]

I don't think incubator projects or the number of speakers are relevant. As we do have wikis on languages that are essentially spoken, the absence of an official orthography isn't an absolute barrier either.

If this is considered complete, please change its status to "resolved".

Esc3300 changed the task status from Open to Stalled. Jun 21 2021, 8:20 AM
Esc3300 lowered the priority of this task from High to Low.

Changing status to "stalled" given that it's unclear whether anything still needs to be done. Also adjusting priority accordingly.

@Esc3300: This is not neverending by definition; removing tag.

Added map tiles to the checklist, as map tiles are missing for the sms and smn languages. At least smn should have map tiles: there is https://smn.wikipedia.org, so there should be full language support.

Note: I checked that there should be a label for Utsjoki in the smn and sms languages in OpenStreetMap, so a missing label is not the problem.

Examples