Add language codes rm-rumgr, rm-sursilv, rm-surmiran, rm-sutsilv, rm-vallader, rm-puter for Lexemes
Closed, Resolved (Public)

Description

Please add the six language codes {rm-rumgr, rm-sursilv, rm-surmiran, rm-sutsilv, rm-vallader, rm-puter} to the list of language codes supported for Lexemes.

As per the IETF BCP47 language subtag registry [https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry], rm-rumgr is the BCP47 language code for Rumantsch Grischun; rm-surmiran for Rumantsch Surmiran; rm-sutsilv for Rumantsch Sutsilvan; rm-sursilv for Rumantsch Sursilvan; rm-vallader for Rumantsch Vallader; rm-puter for Rumantsch Puter.

From a lexicographical perspective, these language variants are quite distinct: they differ in vocabulary, phonology, and inflection. (That’s also why they have different subtags in IETF BCP47.) For reference, see http://www.pledarigrond.ch, which has separate dictionaries for each variant.

In Wikidata, rm-rumgr is Q688873; rm-surmiran is Q690216; rm-sutsilv is Q688272; rm-sursilv is Q688348; rm-vallader is Q690226; rm-puter is Q688309.
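
For anyone preparing an import, the mapping above can be written down as a small lookup table. This is only a sketch in Python: the names `ROMANSH_VARIANT_ITEMS` and `item_for_code` are hypothetical, and only the codes and item IDs are taken from the description.

```python
# Hypothetical lookup table: BCP47 code -> Wikidata item ID.
# The pairs are copied from the task description; the names are made up.
ROMANSH_VARIANT_ITEMS = {
    "rm-rumgr": "Q688873",
    "rm-surmiran": "Q690216",
    "rm-sutsilv": "Q688272",
    "rm-sursilv": "Q688348",
    "rm-vallader": "Q690226",
    "rm-puter": "Q688309",
}


def item_for_code(code: str) -> str:
    """Return the Wikidata item ID for a Romansh variant code (case-insensitive)."""
    return ROMANSH_VARIANT_ITEMS[code.lower()]
```

For example, `item_for_code("rm-sursilv")` returns `"Q688348"`; an unknown code raises `KeyError`.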

Event Timeline

For illustration, here’s the English word ‘dog’ in various variants of Romansh:

  • rm-rumgr: chaun
  • rm-sursilv: tgaun
  • rm-sutsilv: tgàn
  • rm-surmiran: tgang
  • rm-puter: chaun
  • rm-vallader: chan

For this one particular word, Rumantsch Grischun and Puter happen to use the same form; but generally they’re quite different.
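
To make the overlap explicit, here is a small Python sketch that groups the variants by written form. The data is copied from the list above; the names `DOG` and `group_by_form` are made up for illustration.

```python
from collections import defaultdict

# Forms of 'dog' per variant, copied from the list above.
DOG = {
    "rm-rumgr": "chaun",
    "rm-sursilv": "tgaun",
    "rm-sutsilv": "tgàn",
    "rm-surmiran": "tgang",
    "rm-puter": "chaun",
    "rm-vallader": "chan",
}


def group_by_form(forms):
    """Map each written form to the sorted list of variant codes that use it."""
    groups = defaultdict(list)
    for code, form in forms.items():
        groups[form].append(code)
    return {form: sorted(codes) for form, codes in groups.items()}


# Keep only the forms that are shared by more than one variant.
shared = {form: codes for form, codes in group_by_form(DOG).items() if len(codes) > 1}
```

Here `shared` contains a single entry, `"chaun"`, used by rm-puter and rm-rumgr; the other four variants each have their own form.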

@Sascha: Lexemes don't use the monolingual text list of languages, so adding these for monolingual text won't make them available for lexemes.

@Nikki, do you know where/how I should request adding these BCP47 tags so they become available for Wikidata lexemes? I’ve a large dictionary in multiple Romansh variants, which I’d like to import to Wikidata lexemes, so this isn’t just an academic request.

Given that the language variants are quite distinct (see above for example), would it still make sense to support them for Wikidata properties? No strong opinions from my side; I wouldn’t completely understand the involved tradeoffs. (But curious — is there actually any downside to supporting more languages as long as they have BCP47 codes? If there’s no downside, why not support them all?)

I have no idea how to get languages added for lexemes, I've also been trying to find that out. :/

Whether it makes sense to add them for monolingual text too depends on whether there are statements where they would be useful (a good example, if you know of one, would be an item with multiple Romansh statements for the same property because of the different variants). Whether they would be added is another matter. I don't make the decisions and there are a number of requests that I think make sense which haven't been added.

Place names can vary by variant, for example St. Moritz [de] = San Murezzan [rm-rumgr] = Sogn Murezi [rm-sursilv] = San Murezi [rm-sutsilv] = Son Murezzi [rm-surmiran]. But that’s a multilingual label, not a monolingual text statement.

Given that the codes should adhere to standards, what is the basis for these codes?

@GerardM, is there anything I can do to help with this ticket? There’s a sizable Romansh dictionary whose data can be donated to Wikidata, but this is currently blocked on this ticket. (Try an exact search for a few German words, eg. “Hund” or “Gelbsucht”, to see how the words are different in various variants of the Romansh language).

Friendly ping, is there anything I can do to help with this ticket?

@Nikki, do you know where/how I should request adding these BCP47 tags so they become available for Wikidata lexemes? I’ve a large dictionary in multiple Romansh variants, which I’d like to import to Wikidata lexemes, so this isn’t just an academic request.

Given that the language variants are quite distinct (see above for example), would it still make sense to support them for Wikidata properties? No strong opinions from my side; I wouldn’t completely understand the involved tradeoffs. (But curious — is there actually any downside to supporting more languages as long as they have BCP47 codes? If there’s no downside, why not support them all?)

@Lydia_Pintscher @Lea_Lacroix_WMDE Can you answer this question?

I'd like some oversight to make sure we have a sensible collection of codes.

In this case let's move ahead. I'll adapt the task description to make it clear that it is about language codes for Lexemes.

Lydia_Pintscher renamed this task from Add monolingual language codes rm-rumgr, rm-sursilv, rm-surmiran, rm-sutsilv, rm-vallader, rm-puter to Add language codes rm-rumgr, rm-sursilv, rm-surmiran, rm-sutsilv, rm-vallader, rm-puter for Lexemes. Mar 17 2019, 12:01 PM
Lydia_Pintscher triaged this task as Medium priority.
Lydia_Pintscher updated the task description.
Lydia_Pintscher moved this task from Incoming to Ready to estimate on the Wikidata-Campsite board.

Curious, is it possible to estimate by what date this might get implemented? Is there anything I can do to help?

It's currently the second ticket to pick up for the next sprint. The camp (the team taking care of this) is understaffed this and next week so I am not sure when they'll be able to do it exactly but I hope soon.

Change 502224 had a related patch set uploaded (by Michael Große; owner: Michael Große):
[mediawiki/extensions/WikibaseLexeme@master] Add Rumantsch dialects to available Lexeme languages

https://gerrit.wikimedia.org/r/502224

I don't know loads about this but I wanted to ask if this need could be met by our existing support with -x-. E.g. could you use rm-x-Q688873 for rm-rumgr?

Could you elaborate on that functionality? It doesn't seem to work out of the box on my development setup:

image.png (581×928 px, 42 KB)

Or is this something that has to be configured for allowed languages?

Hoi,
When we go the way of allowing languages that are not accepted in any standard, we could use -x-. However, it removes any possibility to question the validity and the inclusion of anything used by such a code. So I strongly urge us to refrain from non-standard entries.
Thanks,

GerardM

Could you elaborate on that functionality? It doesn't seem to work out of the box on my development setup:

I suspect that it only works on lexeme pages, i.e. you would have to create the lexeme with "rm" first and then edit it.

I don't know loads about this but I wanted to ask if this need could be met by our existing support with -x-. E.g. could you use rm-x-Q688873 for rm-rumgr?

Probably, but I don't think it's a good idea to use private subtags (which have no meaning outside Wikimedia) when there are registered subtags available.
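
To illustrate the difference: in BCP47 the singleton subtag `x` introduces private-use subtags, whereas `rumgr` in rm-rumgr is a registered variant subtag. Here is a minimal Python sketch that only splits a tag at the `x` singleton. It does not validate the tag against the registry, and the function name `split_private_use` is made up for this example.

```python
def split_private_use(tag: str):
    """Split a language tag into (public subtags, private-use subtags).

    The singleton 'x' starts the private-use part of a BCP47 tag; the
    subtags after it have no meaning outside a private agreement.
    This is a plain string split, not a validator.
    """
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "x" in lowered:
        i = lowered.index("x")
        return subtags[:i], subtags[i + 1:]
    return subtags, []
```

With this, `split_private_use("rm-x-Q688873")` gives `(["rm"], ["Q688873"])`, so outside Wikimedia only `rm` is interpretable, while `split_private_use("rm-rumgr")` gives `(["rm", "rumgr"], [])`.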

Just to clarify, the codes in this ticket (rm-rumgr etc.) are not made up; they have been standardized by IETF and appear in the IANA language subtag registry.

Yes,
I know and consequently they are fine.
Thanks,

GerardM

Yes,
I know these are fine.
Thanks,

rm-x-Q688873 is allowed in the spelling variant of a form, which is where I was blindly poking. It is also allowed in the spelling variant of the lemma (when editing on a lexeme page, but apparently not on Special:NewLexeme). Interesting...

With the patch on gerrit, rm-rumgr becomes usable in the same places (form + lemma language code); however, it's still not allowed by the gloss language selector.

Maybe that is actually the desired functionality but I'm not clear on it :).

I'm now thinking that maybe rm-rumgr ought to behave like en-gb. E.g. be nicely usable in the gloss language selector as well as accepted in the lemma and form "spelling variants" boxes.

Change 502224 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Add Rumantsch dialects to available Lexeme languages

https://gerrit.wikimedia.org/r/502224

I just wanted to check we have a common understanding of what adding these languages "to the list of language codes supported for Lexemes." means:
You can see on beta that the merged patch means e.g. rm-surmiran can be used as a code for: lemmas, forms and glosses.

However, it's not added to the list of languages suggested by e.g. the gloss language selector widget.
It's also not sorted in with the other languages in the Special:NewLexeme language selector (i.e. just after rm).
As a result, it's given a language name in the server rendering of the entity page but not in the client, resulting in a change like this as the JS kicks in:

Peek 2019-04-18 10-10.gif (176×429 px, 16 KB)

This might be fine and desired behaviour but we should be aware of the areas where these languages now differ from behaving like all our other languages.

An alternative solution to put rm-surmiran on a par with en-gb or de-ch etc. might be to look at adding it on https://translatewiki.net/wiki/Translatewiki.net_languages (and optionally then disabling it for labels, descriptions and aliases).

Very sensible remark, thanks.

This is the desired status based on the statement I've got from Product Management.
I am going to write up the desired language code situation (which codes are applicable to which parts of the system, etc.) as some kind of not-really-architectural ADR. Sorry for being slow on this. I hope it will finally see the light next week after the holiday break.

👍 Awesome!

Should be live near the end of the month if I read the tag right.

Is something else needed to activate lexemes in variants of Romansh? See screenshot:

Screenshot_20190502-120437.png (1×1 px, 131 KB)

@Sascha Did you try to enter the language code in the field "Spelling variant of the Lemma"?

Ah, got it. Thank you!

Hm, adding usage examples (and probably similar properties) doesn’t seem to work yet. Try adding the sentence “Ils tgauns vivan dalla naschientscha naven ensemen cullas nuorsas.” as usage example (P5831) in language “rm-sursilv” for tgaun (L45642); see screenshot.

image.png (1×1 px, 143 KB)

It's acceptable for Lexemes, but not for monolingual statements. I can add them to the list of monolingual languages, but I would say that languages that are added as acceptable for Lexemes should also be available for monolingual languages. @Lydia_Pintscher: What's your preference? Should I make a patch to add them to monolingual languages?

Filed T222423 for another (very minor) issue that seems related to language variants.

Hi, when discussing the language code topic with @Lydia_Pintscher it was said that language codes added to the list of codes recognized for lemma variants and form representations are NOT supposed to be automatically added to the list of codes used for monolingual text, and labels, descriptions, aliases of items/properties.

Therefore

Should I make a patch to add them to monolingual languages?

Please don't. At least not without going through the regular process for monolingual text language codes first.
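
The separation described here can be pictured as two independent allow-lists. The Python sketch below is purely illustrative: the set names, their contents beyond the codes discussed in this task, and the function `is_accepted` are all assumptions, not the actual Wikibase configuration.

```python
# Illustrative only: two independent allow-lists (names and contents are assumed).
LEXEME_TERM_LANGUAGES = {
    "rm", "rm-rumgr", "rm-sursilv", "rm-surmiran",
    "rm-sutsilv", "rm-vallader", "rm-puter",
}
# Assumption for the sketch: only plain "rm" is listed for monolingual text.
MONOLINGUAL_TEXT_LANGUAGES = {"rm"}


def is_accepted(code: str, context: str) -> bool:
    """Check a language code against the allow-list for the given context.

    context is either "lexeme-term" (lemmas, forms, glosses) or
    "monolingual-text" (statement values such as usage examples).
    """
    allow_lists = {
        "lexeme-term": LEXEME_TERM_LANGUAGES,
        "monolingual-text": MONOLINGUAL_TEXT_LANGUAGES,
    }
    return code in allow_lists[context]
```

Under these assumptions, `is_accepted("rm-sursilv", "lexeme-term")` is true while `is_accepted("rm-sursilv", "monolingual-text")` is false, which matches the behaviour seen when adding a usage example: adding a code to one list does not add it to the other.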

Is there anything specific I should do so that people can enter usage examples for Sursilvan lexemes, and likewise for lexemes in the various other Romansh variants? I’ll gladly file more tickets if it helps; just tell me what to do.

Hoi,
There is a big difference between mono-lingual texts and the use in labels
and descriptions. The latter needs support in translatewiki.net as well.
The criteria for that are much more stringent.
Thanks,

GerardM

@GerardM Yes, this request is merely about supporting Wikidata *content* in those language variants, eg. allowing people to enter Sursilvan usage examples for a Sursilvan lexeme (see T222426). No need to translate the *user interface* to Sursilvan, Vallader etc.

Closing again as the discussion continues in T222426