May 6 2019
May 3 2019
The codes are valid (and registered) IETF BCP47 language codes.
@GerardM Yes, this request is merely about supporting Wikidata *content* in those language variants, eg. allowing people to enter Sursilvan usage examples for a Sursilvan lexeme (see T222426). No need to translate the *user interface* to Sursilvan, Vallader etc.
Hm... in the sidebar on Wikimedia Commons (see screenshot), would it perhaps make sense to replace the link to Special:WhatLinksHere by a link to Special:GlobalUsage? Currently, there seems to be a usability/UX issue: the feature is already implemented (thanks for the kind explanation on this bug, I had no idea!). However, people might never come across Special:GlobalUsage unless they already know that it exists. Hence the suggestion to remove “What links here” from the sidebar and replace it by “Global usage”, which seems to be a superset. (Adding a link instead of replacing one risks cluttering the user experience, since the sidebar would then have too many links.)
Is there anything specific I should do so that people can enter usage examples for Sursilvan lexemes, and likewise for lexemes in the various other Romansh variants? I’ll gladly file more tickets if it helps; just tell me what to do.
Filed T222423 for another (very minor) issue that seems related to language variants.
Hm, adding usage examples (and probably similar properties) doesn’t seem to work yet. Try adding the sentence “Ils tgauns vivan dalla naschientscha naven ensemen cullas nuorsas.” as usage example (P5831) in language “rm-sursilv” for tgaun (L45642); see screenshot.
May 2 2019
Ah, got it. Thank you!
Is something else needed to activate lexemes in variants of Romansh? See screenshot:
Apr 8 2019
Just to clarify, the codes in this ticket (rm-rumgr etc.) are not made up; they have been standardized by IETF and appear in the IANA language subtag registry.
Apr 2 2019
Apr 1 2019
If nobody else has time to do this, may I volunteer to write the code? Please tell me where to start (which programming language, what framework, etc.)
Mar 25 2019
Curious, is it possible to estimate by what date this might get implemented? Is there anything I can do to help?
Mar 20 2019
Mar 14 2019
Oh, all you need from CLDR is an English label? Nothing else? In that case, this Wikidata query might be helpful:
Sure, but it will take a while until the next official release of CLDR so you'd have to read the CLDR data from the development branch ("trunk"). I do wonder, though, if you could read the IANA registry in addition to CLDR and use IANA as fallback for the English names when CLDR has no data yet. Then, you would immediately get an English name for every language with an ISO 639 or IETF BCP 47 code, so you'd add support for a couple thousand languages at once.
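To make the fallback idea concrete, here is a minimal sketch, not a description of how Wikidata actually does it. The names `parse_registry` and `english_name` are hypothetical, the sample records only imitate the registry's record layout (the field values are illustrative, not copied from the registry), and only plain `language` and `language-variant` tags are handled.

```python
# Illustrative records in the registry's "%%"-separated layout.
# The values here are assumptions for the example, not registry data.
IANA_SAMPLE = """\
%%
Type: language
Subtag: rm
Description: Romansh
%%
Type: variant
Subtag: sursilv
Description: Sursilvan idiom of Romansh
Prefix: rm
%%
"""


def parse_registry(text):
    """Return {(type, subtag): first description} from registry-style text."""
    names = {}
    for record in text.split("%%"):
        fields = {}
        for line in record.strip().splitlines():
            key, sep, value = line.partition(": ")
            # Keep only the first value per field; folded continuation
            # lines are not merged in this simplified parser.
            if sep and key not in fields:
                fields[key] = value.strip()
        if {"Type", "Subtag", "Description"} <= fields.keys():
            names[(fields["Type"], fields["Subtag"])] = fields["Description"]
    return names


def english_name(tag, cldr_names, iana_names):
    """Prefer the CLDR name; fall back to the registry description."""
    if tag in cldr_names:
        return cldr_names[tag]
    language, _, variant = tag.partition("-")
    # Simplification: anything after the first hyphen is treated as a variant.
    if variant:
        return iana_names.get(("variant", variant))
    return iana_names.get(("language", language))


cldr = {"rm": "Romansh"}  # stand-in for names already available from CLDR
iana = parse_registry(IANA_SAMPLE)
print(english_name("rm", cldr, iana))
print(english_name("rm-sursilv", cldr, iana))
```

A real implementation would also need to handle script and region subtags and records with several Description fields.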
The easiest way to add a new language to CLDR is preparing ‘seed’ files in XML format;
Feb 28 2019
Feb 8 2019
Friendly ping, is there anything I can do to help with this ticket?
Feb 6 2019
Feb 2 2019
Jan 11 2019
@GerardM, is there anything I can do to help with this ticket? There’s a sizable Romansh dictionary whose data can be donated to Wikidata, but this is currently blocked on this ticket. (Try an exact search for a few German words, eg. “Hund” or “Gelbsucht”, to see how the words are different in various variants of the Romansh language).
For languages that have no language code yet, perhaps Lingua Libre could use “mis-x-Q12345” (where Q12345 would be the Wikidata item for the language of the pronunciation audio). That would be a syntactically valid IETF BCP 47 tag, and you wouldn’t lump unrelated languages into the same category. Once the language does get a code, some bot could change the categories of uploaded files on Wikimedia Commons. @GerardM, what do you think?
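A small sketch of how such tags could be built and checked, in case it helps. The helper name `tag_for_item` is hypothetical, and the pattern only covers the `mis-x-…` shape rather than the full BCP 47 grammar. One caveat I am sure of: RFC 5646 limits each private-use subtag to 8 alphanumeric characters, so an item ID with more than seven digits would not fit into a single subtag.

```python
import re

# Only the "mis-x-<subtag>..." shape: private-use subtags are 1 to 8
# alphanumeric characters each (RFC 5646). Not a full BCP 47 validator.
MIS_PRIVATE_USE = re.compile(r"^mis-x(-[A-Za-z0-9]{1,8})+$")


def tag_for_item(qid):
    """Build a 'mis-x-<QID>' tag for a Wikidata item ID (hypothetical helper)."""
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    tag = f"mis-x-{qid}"
    if not MIS_PRIVATE_USE.match(tag):
        # "Q" plus more than seven digits exceeds the 8-character limit.
        raise ValueError(f"{qid} does not fit into one 8-character subtag")
    return tag


print(tag_for_item("Q12345"))
```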
Sorry, here’s the correct link to the Unicode FAQ about Zawgyi: https://www.unicode.org/faq/myanmar.html
Have you considered using IETF BCP 47 language tags instead of ISO 639-3? Every language with an ISO code also has an IETF code (usually the same, since IETF draws on ISO 639 among others). But unlike ISO 639, IETF tags let you make finer-grained distinctions. That’s why all the internet standards (such as HTTP, HTML, XML) use IETF BCP 47 instead of ISO 639. For example, Brazilian Portuguese, Sursilvan and Zürich German have IETF language tags but no ISO code. If LinguaLibre is asked to support languages without an IETF code, you can request the addition of a language tag.
Jan 8 2019
Agree. The Chakma language is sometimes written in other scripts than the Chakma writing system, such as Bengali or Latin, but this seems to be rare. (In the future, other writing systems will probably get used more rarely than today, because support for the Chakma writing system is getting rolled out to modern computer operating systems only now). In the Unicode CLDR project, we’ve therefore made Cakm the default script for language ccp; see the line <likelySubtag from="ccp" to="ccp_Cakm_BD"/> in likelySubtags.xml. Also, in Unicode CLDR, all Chakma translations are currently kept in the Chakma writing system; we haven’t received any requests to support (in CLDR) the Chakma language ccp in other writing systems than Cakm. Just a data point; not sure if/how this matters for Wikimedia.
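For anyone who wants to read that default programmatically, here is a minimal sketch using Python's standard library. The `<likelySubtags>` wrapper in the snippet and the helper name `default_script` are assumptions for illustration; only the `<likelySubtag …/>` line itself is taken from CLDR as quoted above.

```python
import xml.etree.ElementTree as ET

# The wrapper element is a simplification for this example;
# the inner line is the one quoted from likelySubtags.xml.
SNIPPET = (
    "<likelySubtags>"
    '<likelySubtag from="ccp" to="ccp_Cakm_BD"/>'
    "</likelySubtags>"
)


def default_script(xml_text, language):
    """Return the script from the first matching likelySubtag, or None."""
    root = ET.fromstring(xml_text)
    for element in root.iter("likelySubtag"):
        if element.get("from") == language:
            # The "to" value has the form language_Script_Region.
            parts = element.get("to", "").split("_")
            return parts[1] if len(parts) >= 2 else None
    return None


print(default_script(SNIPPET, "ccp"))
```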
Nov 30 2018
Given that the codes should adhere to standards, what is the basis for these codes?
Place names can vary by variant, for example St. Moritz [de] = San Murezzan [rm-rumgr] = Sogn Murezi [rm-sursilv] = San Murezi [rm-sutsilv] = Son Murezzi [rm-surmiran]. But that’s a multilingual label, not a monolingual text statement.
Nov 28 2018
@Nikki, do you know where/how I should request adding these BCP47 tags so they become available for Wikidata lexemes? I’ve a large dictionary in multiple Romansh variants, which I’d like to import to Wikidata lexemes, so this isn’t just an academic request.
Nov 24 2018
Nov 23 2018
For illustration, here’s the English word ‘dog’ in various variants of Romansh:
- rm-rumgr: chaun
- rm-sursilv: tgaun
- rm-sutsilv: tgàn
- rm-surmiran: tgang
- rm-puter: chaun
- rm-vallader: chan