Add language codes rm-rumgr, rm-sursilv, rm-surmiran, rm-sutsilv, rm-vallader, rm-puter for Lexemes
Closed, ResolvedPublic
Actions

Assigned To

Authored By

	Sascha
	Nov 23 2018, 2:44 PM

Description

Please add the six language codes {rm-rumgr, rm-sursilv, rm-surmiran, rm-sutsilv, rm-vallader, rm-puter} to the list of language codes supported for Lexemes.

As per the IETF BCP47 language subtag registry [https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry], rm-rumgr is the BCP47 language code for Rumantsch Grischun; rm-surmiran for Rumantsch Surmiran; rm-sutsilv for Rumantsch Sutsilvan; rm-sursilv for Rumantsch Sursilvan; rm-vallader for Rumantsch Vallader; rm-puter for Rumantsch Puter.

From a lexicographical perspective, these language variants are quite distinct: they have different vocabularies, different phonology, and inflection. (That’s also why they have different subtags in IETF BCP47). For reference, see http://www.pledarigrond.ch which has separate dictionaries for each variant.

In Wikidata, rm-rumgr is Q688873; rm-surmiran is Q690216; rm-sutsilv is Q688272; rm-sursilv is Q688348; rm-vallader is Q690226; rm-puter is Q688309.
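
For anyone scripting an import, the code-to-item mapping above can be kept as a small lookup table. This is only a minimal Python sketch: the names `ROMANSH_VARIANTS` and `item_for` are hypothetical and not part of Wikibase, and the data is copied from this description.

```python
# Language tag -> (variant name, Wikidata item), as listed in this task description.
ROMANSH_VARIANTS = {
    "rm-rumgr": ("Rumantsch Grischun", "Q688873"),
    "rm-surmiran": ("Rumantsch Surmiran", "Q690216"),
    "rm-sutsilv": ("Rumantsch Sutsilvan", "Q688272"),
    "rm-sursilv": ("Rumantsch Sursilvan", "Q688348"),
    "rm-vallader": ("Rumantsch Vallader", "Q690226"),
    "rm-puter": ("Rumantsch Puter", "Q688309"),
}


def item_for(tag: str) -> str:
    """Return the Wikidata item ID for a Romansh variant tag (case-insensitive)."""
    return ROMANSH_VARIANTS[tag.lower()][1]
```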

Details

	Subject	Repo	Branch	Lines +/-
	Add Rumantsch dialects to available Lexeme languages	mediawiki/extensions/WikibaseLexeme	master	+23 -1

Related Objects

Mentioned In: T243250: Cannot add lexemes in Chukchi (ckt) and Dagbani (dag)
T222309: Language code "sms" not recognized in Commons
T222426: Add monolingual language codes rm-rumgr, rm-surmiran, rm-sursilv, rm-sutsilv, rm-vallader, rm-puter
T222423: Lexemes should display language name (not code) of Romansh variants in gloss language
rEWLE5326b08cef96: Add Rumantsch dialects to available Lexeme languages
rEWLE38fc6f44fc51: Add Rumantsch dialects to available Lexeme languages
rEWLEfa1ab9786e9b: Add Rumantsch dialects to available Lexeme languages
rEWLE7987e93a17d5: Add Rumantsch dialects to available Lexeme languages
Mentioned Here: T222426: Add monolingual language codes rm-rumgr, rm-surmiran, rm-sursilv, rm-sutsilv, rm-vallader, rm-puter
T222423: Lexemes should display language name (not code) of Romansh variants in gloss language

Event Timeline

Sascha created this task. Nov 23 2018, 2:44 PM

Restricted Application added a subscriber: Aklapper. Nov 23 2018, 2:44 PM

For illustration, here’s the English word ‘dog’ in various variants of Romansh:

rm-rumgr: chaun
rm-sursilv: tgaun
rm-sutsilv: tgàn
rm-surmiran: tgang
rm-puter: chaun
rm-vallader: chan

For this one particular word, Rumantsch Grischun and Puter happen to use the same form; but generally they’re quite different.

@GerardM what's your consideration on this request?

@Sascha: Lexemes don't use the monolingual text list of languages, so adding these for monolingual text won't make them available for lexemes.

@Nikki, do you know where/how I should request adding these BCP47 tags so they become available for Wikidata lexemes? I’ve a large dictionary in multiple Romansh variants, which I’d like to import to Wikidata lexemes, so this isn’t just an academic request.

Given that the language variants are quite distinct (see above for example), would it still make sense to support them for Wikidata properties? No strong opinions from my side; I wouldn’t completely understand the involved tradeoffs. (But curious — is there actually any downside to supporting more languages as long as they have BCP47 codes? If there’s no downside, why not support them all?)

I have no idea how to get languages added for lexemes, I've also been trying to find that out. :/

Whether it makes sense to add them for monolingual text too depends on whether there are statements where they would be useful (a good example, if you know of one, would be an item with multiple Romansh statements for the same property because of the different variants). Whether they would be added is another matter. I don't make the decisions and there are a number of requests that I think make sense which haven't been added.

Place names can vary by variant, for example St. Moritz [de] = San Murezzan [rm-rumgr] = Sogn Murezi [rm-sursilv] = San Murezi [rm-sutsilv] = Son Murezzi [rm-surmiran]. But that’s a multilingual label, not a monolingual text statement.

Given that the codes should adhere to standards, what is the basis for these codes?

Here’s the registry for IETF BCP47 language subtags:
https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
https://www.iana.org/assignments/lang-subtags-templates/lang-subtags-templates.xhtml

Here are the entries in the registry:
https://www.iana.org/assignments/lang-subtags-templates/puter.txt
https://www.iana.org/assignments/lang-subtags-templates/rumgr.txt
https://www.iana.org/assignments/lang-subtags-templates/sursilv.txt
https://www.iana.org/assignments/lang-subtags-templates/sutsilv.txt
https://www.iana.org/assignments/lang-subtags-templates/surmiran.txt
https://www.iana.org/assignments/lang-subtags-templates/vallader.txt
(Note that they’ve all got a “Prefix: rm” in the registry, so the complete language tag is rm-puter etc.)
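
To make the prefix note concrete, here is a minimal Python sketch of how a complete tag follows from a registry record. The helper names `parse_record` and `full_tag` are hypothetical, and `SAMPLE` is an abridged, illustrative record showing only three fields; the linked registry entries are the authoritative source.

```python
def parse_record(text: str) -> dict:
    """Parse one 'Field: value' record into a dict (first occurrence of a field wins)."""
    fields = {}
    for line in text.strip().splitlines():
        name, sep, value = line.partition(":")
        if sep:
            fields.setdefault(name.strip(), value.strip())
    return fields


def full_tag(record: dict) -> str:
    """Join the record's prefix and subtag, e.g. 'rm' + 'puter' -> 'rm-puter'."""
    return f"{record['Prefix']}-{record['Subtag']}"


# Abridged, illustrative record; see the registry for the complete entry.
SAMPLE = """
Type: variant
Subtag: puter
Prefix: rm
"""
```

Calling `full_tag(parse_record(SAMPLE))` joins the two fields into `rm-puter`.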

For a general introduction to IETF BCP47 language tags, see here:
https://www.w3.org/International/articles/language-tags/
https://en.wikipedia.org/wiki/IETF_language_tag

Friendly ping?

@GerardM, is there anything I can do to help with this ticket? There’s a sizable Romansh dictionary whose data can be donated to Wikidata, but this is currently blocked on this ticket. (Try an exact search for a few German words, eg. “Hund” or “Gelbsucht”, to see how the words are different in various variants of the Romansh language).

Friendly ping, is there anything I can do to help with this ticket?

In T210293#4781319, @Sascha wrote:

@Nikki, do you know where/how I should request adding these BCP47 tags so they become available for Wikidata lexemes? I’ve a large dictionary in multiple Romansh variants, which I’d like to import to Wikidata lexemes, so this isn’t just an academic request.

Given that the language variants are quite distinct (see above for example), would it still make sense to support them for Wikidata properties? No strong opinions from my side; I wouldn’t completely understand the involved tradeoffs. (But curious — is there actually any downside to supporting more languages as long as they have BCP47 codes? If there’s no downside, why not support them all?)

@Lydia_Pintscher @Lea_Lacroix_WMDE Can you answer this question?

In T210293#4781319, @Sascha wrote:

Given that the language variants are quite distinct (see above for example), would it still make sense to support them for Wikidata properties? No strong opinions from my side; I wouldn’t completely understand the involved tradeoffs. (But curious — is there actually any downside to supporting more languages as long as they have BCP47 codes? If there’s no downside, why not support them all?)

I'd like some oversight to make sure we have a sensible collection of codes.

In this case let's move ahead. I'll adapt the task description to make it clear that it is about language codes for Lexemes.

Lydia_Pintscher renamed this task from Add monolingual language codes rm-rumgr, rm-sursilv, rm-surmiran, rm-sutsilv, rm-vallader, rm-puter to Add language codes rm-rumgr, rm-sursilv, rm-surmiran, rm-sutsilv, rm-vallader, rm-puter for Lexemes. Mar 17 2019, 12:01 PM

Lydia_Pintscher triaged this task as Medium priority.

Lydia_Pintscher updated the task description. (Show Details)

Lydia_Pintscher moved this task from Incoming to Ready to estimate on the Wikidata-Campsite board.

Curious, is it possible to estimate by what date this might get implemented? Is there anything I can do to help?

It's currently the second ticket to pick up for the next sprint. The camp (the team taking care of this) is understaffed this and next week so I am not sure when they'll be able to do it exactly but I hope soon.

• Greta_Doci_WMDE moved this task from Ready to estimate to Wikidata-Campsite-Iteration-∞ (On Hold) on the Wikidata-Campsite board. Apr 5 2019, 10:07 AM

• Greta_Doci_WMDE edited projects, added Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)); removed Wikidata-Campsite.

Michael claimed this task. Apr 8 2019, 9:28 AM

Michael moved this task from To Do (prioritised from top to bottom) to Doing on the Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)) board.

Michael added a project: Wikidata Lexicographical data.

Restricted Application added a project: User-Michael. Apr 8 2019, 9:28 AM

Michael moved this task from 🗃️ Incoming to ⏳ In progress on the User-Michael board. Apr 8 2019, 9:28 AM

Change 502224 had a related patch set uploaded (by Michael Große; owner: Michael Große):
[mediawiki/extensions/WikibaseLexeme@master] Add Rumantsch dialects to available Lexeme languages

https://gerrit.wikimedia.org/r/502224

gerritbot added a project: Patch-For-Review. Apr 8 2019, 2:01 PM

Michael mentioned this in rEWLE7987e93a17d5: Add Rumantsch dialects to available Lexeme languages. Apr 8 2019, 2:02 PM

I don't know loads about this but I wanted to ask if this need could be met by our existing support with -x-. E.g. could you use rm-x-Q688873 for rm-rumgr?

WMDE-leszek subscribed. Apr 8 2019, 4:03 PM

In T210293#5094201, @Tarrow wrote:

I don't know loads about this but I wanted to ask if this need could be met by our existing support with -x-. E.g. could you use rm-x-Q688873 for rm-rumgr?

Could you elaborate on that functionality? It doesn't seem to work out of the box on my development setup:

Or is this something that has to be configured for allowed languages?

Michael mentioned this in rEWLEfa1ab9786e9b: Add Rumantsch dialects to available Lexeme languages. Apr 8 2019, 4:13 PM

Hoi,
When we go the way of allowing for languages that are not accepted in any
standard, we could use -x- However, it removes any possibility to question
the validity and the inclusion of anything used by such a code. So I
strongly urge us to refrain from non standard entries.
Thanks,

GerardM

In T210293#5094226, @Michael wrote:

Could you elaborate on that functionality? It doesn't seem to work out of the box on my development setup:

I suspect that it only works on lexeme pages, i.e. you would have to create the lexeme with "rm" first and then edit it.

In T210293#5094201, @Tarrow wrote:

I don't know loads about this but I wanted to ask if this need could be met by our existing support with -x-. E.g. could you use rm-x-Q688873 for rm-rumgr?

Probably, but I don't think it's a good idea to use private subtags (which have no meaning outside Wikimedia) when there are registered subtags available.
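
To make the distinction concrete: in BCP47, the single-letter subtag `x` introduces a private-use sequence, so a tag such as rm-x-Q688873 can be told apart mechanically from a registered one such as rm-rumgr. A minimal Python sketch follows; the function name `is_private_use` is hypothetical, and the check only looks for the `x` singleton without validating the rest of the tag.

```python
def is_private_use(tag: str) -> bool:
    """True if the tag contains a private-use part, introduced by the singleton 'x'."""
    # Subtags are separated by hyphens and compared case-insensitively.
    return "x" in tag.lower().split("-")
```

With this check, `is_private_use("rm-x-Q688873")` is true and `is_private_use("rm-rumgr")` is false.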

Just to clarify, the codes in this ticket (rm-rumgr etc.) are not made up; they have been standardized by IETF and appear in the IANA language subtag registry.

Yes,
I know and consequently they are fine.
Thanks,

GerardM

Yes,
I know these are fine.
Thanks,

Michael mentioned this in rEWLE38fc6f44fc51: Add Rumantsch dialects to available Lexeme languages. Apr 9 2019, 8:38 AM

Michael moved this task from Doing to Peer Review on the Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)) board. Apr 9 2019, 8:45 AM

In T210293#5094226, @Michael wrote:

In T210293#5094201, @Tarrow wrote:

I don't know loads about this but I wanted to ask if this need could be met by our existing support with -x-. E.g. could you use rm-x-Q688873 for rm-rumgr?

Could you elaborate on that functionality? It doesn't seem to work out of the box on my development setup:

Or is this something that has to be configured for allowed languages?

rm-x-Q688873 is allowed in the spelling variant of a form, which is where I was blindly poking. It is also allowed in the spelling variant of the lemma (when editing on a lexeme page but apparently not on Special:NewLexeme). Interesting...

With the patch on Gerrit, rm-rumgr becomes usable in the same places (form + lemma language code); however, it's still not allowed by the gloss language selector.

Maybe that is actually the desired functionality but I'm not clear on it :).

I'm now thinking that maybe rm-rumgr ought to behave like en-gb. E.g. be nicely usable in the gloss language selector as well as accepted in the lemma and form "spelling variants" boxes.

Michael moved this task from ⏳ In progress to 💬 waiting on other's work or feedback on the User-Michael board. Apr 9 2019, 4:55 PM

Michael mentioned this in rEWLE5326b08cef96: Add Rumantsch dialects to available Lexeme languages. Apr 16 2019, 3:44 PM

Change 502224 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Add Rumantsch dialects to available Lexeme languages

https://gerrit.wikimedia.org/r/502224

ReleaseTaggerBot added a project: MW-1.34-notes (1.34.0-wmf.3; 2019-04-30). Apr 17 2019, 6:00 PM

I just wanted to check we have a common understanding of what adding these languages "to the list of language codes supported for Lexemes." means:
You can see on beta that the merged patch means e.g. rm-surmiran can be used as a code for: lemmas, forms and glosses.

However, it's not added to the list of languages suggested by e.g. the gloss language selector widget.
It's also not listed in order with the other languages in the Special:NewLexeme language selector (i.e. just after rm).
It's therefore also given a language name in the server rendering of the entity page but not in the client, resulting in a change like this as the JS kicks in:

Peek 2019-04-18 10-10.gif (176×429 px, 16 KB)

This might be fine and desired behaviour but we should be aware of the areas where these languages now differ from behaving like all our other languages.

An alternative solution to put rm-surmiran on a par with en-gb or de-ch etc. might be to look at adding it on https://translatewiki.net/wiki/Translatewiki.net_languages (and optionally then disabling it for labels, descriptions and aliases).

In T210293#5121962, @Tarrow wrote:

I just wanted to check we have a common understanding of what adding these languages "to the list of language codes supported for Lexemes." means:
You can see on beta that the merged patch means e.g. rm-surmiran can be used as a code for: lemmas, forms and glosses.

However, it's not added to the list of languages suggested by e.g. the gloss language selector widget.
It's also not listed in order with the other languages in the Special:NewLexeme language selector (i.e. just after rm).
It's therefore also given a language name in the server rendering of the entity page but not in the client, resulting in a change like this as the JS kicks in:

Very sensible remark, thanks.

This is the desired status based on the statement I've got from Product Management.
I am going to write up the desired language code situation (which codes are applicable to which parts of the system, etc.) as some kind of not-really-architectural ADR. Sorry for being slow on this. I hope it will finally see the light next week after the holiday break.

This might be fine and desired behaviour but we should be aware of the areas where these languages now differ from behaving like all our other languages.

An alternative solution to put rm-surmiran on a par with en-gb or de-ch etc. might be to look at adding it on https://translatewiki.net/wiki/Translatewiki.net_languages (and optionally then disabling it for labels, descriptions and aliases).

Michael moved this task from Peer Review to Test (Verification) on the Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)) board. Apr 18 2019, 9:19 AM

This is the desired status based on the statement I've got from Product Management.

👍 Awesome!

Should be live near the end of the month if I read the tag right.

Is something else needed to activate lexemes in variants of Romansh? See screenshot:

Screenshot_20190502-120437.png (1×1 px, 131 KB)

@Sascha Did you try to enter the language code in the field "Spelling variant of the Lemma"?

Ah, got it. Thank you!

Hm, adding usage examples (and probably similar properties) doesn’t seem to work yet. Try adding the sentence “Ils tgauns vivan dalla naschientscha naven ensemen cullas nuorsas.” as usage example (P5831) in language “rm-sursilv” for tgaun (L45642); see screenshot.

Sascha reopened this task as Open. May 3 2019, 6:19 AM

It's acceptable for Lexemes, but not for monolingual statements. I can add them to the list of monolingual languages, but I would say that languages that are added as acceptable for Lexemes should also be available for monolingual languages. @Lydia_Pintscher: What's your preference? Should I make a patch to add them to monolingual languages?

Sascha mentioned this in T222423: Lexemes should display language name (not code) of Romansh variants in gloss language. May 3 2019, 7:10 AM

Filed T222423 for another (very minor) issue that seems related to language variants.

In T210293#5155020, @Mbch331 wrote:

It's acceptable for Lexemes, but not for monolingual statements. I can add them to the list of monolingual languages, but I would say that languages that are added as acceptable for Lexemes should also be available for monolingual languages. @Lydia_Pintscher: What's your preference? Should I make a patch to add them to monolingual languages?

Hi, when discussing the language code topic with @Lydia_Pintscher it was said that language codes added to the list of codes recognized for lemma variants and form representations are NOT supposed to be automatically added to the list of codes used for monolingual text, and labels, descriptions, aliases of items/properties.

Therefore

Should I make a patch to add them to monolingual languages?

Please don't. At least not without going through the regular process for monolingual text language codes first.
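
As a toy illustration of the separation described above: the codes accepted for lemmas and forms and the codes accepted for monolingual text behave like two independently maintained lists, so membership in one implies nothing about the other. In this Python sketch the set names and their contents are hypothetical stand-ins reflecting the state discussed in this thread, not the actual Wikibase configuration.

```python
# Hypothetical stand-ins for two independently maintained allow-lists.
LEXEME_TERM_CODES = {"rm", "rm-rumgr", "rm-sursilv"}
MONOLINGUAL_TEXT_CODES = {"rm"}


def accepted(code: str) -> dict:
    """Report where a code is accepted; each list is checked on its own."""
    return {
        "lemma/form": code in LEXEME_TERM_CODES,
        "monolingual text": code in MONOLINGUAL_TEXT_CODES,
    }
```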

Is there anything specific I should do so that people can enter usage examples for Sursilvan lexemes, and likewise for lexemes in the various other Romansh variants? I’ll gladly file more tickets if it helps; just tell me what to do.

@Sascha To get languages added for monolingual text, you can follow this process https://www.wikidata.org/wiki/Help:Monolingual_text_languages#Getting_a_language_code_added

@Lea_Lacroix_WMDE Thank you! Filed T222426.

Hoi,
There is a big difference between mono-lingual texts and the use in labels
and descriptions. The latter needs support in translatewiki.net as well.
The criteria for that are much more stringent.
Thanks,

GerardM

@GerardM Yes, this request is merely about supporting Wikidata *content* in those language variants, eg. allowing people to enter Sursilvan usage examples for a Sursilvan lexeme (see T222426). No need to translate the *user interface* to Sursilvan, Vallader etc.

Closing again as the discussion continues in T222426

Nikki mentioned this in T222309: Language code "sms" not recognized in Commons. May 28 2019, 7:17 PM

Lucas_Werkmeister_WMDE mentioned this in T243250: Cannot add lexemes in Chukchi (ckt) and Dagbani (dag). Jan 22 2020, 3:31 PM

Maintenance_bot removed a project: Patch-For-Review. Jan 22 2020, 4:10 PM

Maintenance_bot moved this task from incoming to in progress on the Wikidata board. Jan 22 2020, 4:15 PM

jhsoby added a project: Language codes. Sep 14 2020, 10:13 AM

Bugreporter removed a parent task: T144272: [DO NOT USE] new monolingual language code requests for Wikidata (tracking) [superseded by #language_codes]. Sep 15 2020, 2:36 PM

	F28898932: Screenshot_20190502-120437.png
	May 2 2019, 10:20 AM

	F28595529: image.png
	Apr 8 2019, 4:05 PM
