Error when creating lexemes in languages which have an unsupported ISO 639-1 code
Open, Low, Public

Description

Not all languages with ISO 639-1 codes are enabled for labels. Southern Ndebele (nr) and Luba-Katanga (lu) are a couple of examples of such languages. It is not possible to add lexemes in these languages.

To reproduce:

Expected result: Either the lexeme is created with the right language code, or the language code field is shown so that users can pick a different language code.
Actual result: The error "There are problems with some of your input." is shown, with no indication of what's wrong.

Event Timeline

If you fudge the CSS a bit, you can see the error:

[Screenshot: Create a new lexeme - Wikidata, 2018-11-12]

I figure we have two options here:

  • Insist on only allowing known language codes, and change the form so that the condition for showing the “spelling variant” input is not just whether or not the spelling variant can be auto-inferred from the item, but also whether that language code is recognized by WikibaseLexeme.
  • Allow all language codes which occur as Property: ISO 639-1 code statements, even if they’re not known to WikibaseLexeme.
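
As a rough illustration of the first option, the display condition might be sketched as below. This is only a sketch under stated assumptions: the function name, the `KNOWN_LEXEME_LANGUAGE_CODES` set and its contents are hypothetical placeholders, not WikibaseLexeme's actual code or language list.

```python
from typing import Optional

# Hypothetical stand-in for the language codes WikibaseLexeme recognizes;
# the real list lives in the extension and is much longer.
KNOWN_LEXEME_LANGUAGE_CODES = {"en", "de", "fr"}


def should_show_spelling_variant_input(inferred_code: Optional[str]) -> bool:
    """Return True if the user has to pick the spelling variant by hand.

    inferred_code is the ISO 639-1 code read from the language item,
    or None if nothing usable could be inferred from the item.
    """
    if inferred_code is None:
        # Existing condition: nothing could be auto-inferred from the item.
        return True
    # Additional condition from option 1: the inferred code is not recognized.
    return inferred_code not in KNOWN_LEXEME_LANGUAGE_CODES
```

With this placeholder set, a language item carrying `nr` would cause the input to be shown, while one carrying `en` would not.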

We could also add nr and lu as known languages now, but that just means someone else will eventually encounter this bug with other language codes. (That’s not necessarily an argument against adding nr and lu, it just means that we should additionally implement one of the above options, in my opinion.)

Actually, even if you manually modify the form to set the language code to mis-x-nr, apparently WikibaseLexeme just overrides that with the language code from the item. I guess we’d have to adjust that too, if we go for the first option.
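
One way the override could be adjusted, sketched here with hypothetical names (`resolve_lemma_language_code`, `KNOWN_CODES`) that do not come from WikibaseLexeme itself: keep preferring the item's code when it is recognized, and otherwise fall back to whatever the user submitted in the form. Validating the submitted code is deliberately left out of this sketch.

```python
from typing import Optional

# Hypothetical placeholder for the codes WikibaseLexeme recognizes.
KNOWN_CODES = {"en", "de", "fr"}


def resolve_lemma_language_code(
    item_code: Optional[str], submitted_code: Optional[str]
) -> Optional[str]:
    """Choose the language code to store on the new lexeme.

    item_code is the ISO 639-1 code from the language item (or None);
    submitted_code is what the user entered in the form (or None).
    """
    if item_code in KNOWN_CODES:
        # Same as the behaviour described above: the item's code wins.
        return item_code
    # The item's code is missing or unrecognized, so honour the user's choice
    # instead of overriding it.
    return submitted_code
```

Under these assumptions, an item code of `nr` with a submitted `mis-x-nr` would resolve to `mis-x-nr` rather than being overridden.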

I skimmed over the list of ISO 639-1 codes and I think the only others which aren't supported are Ojibwe (oj) and Avestan (ae).

I think the first option would be best because people do occasionally add non-existent ISO 639-1 codes to items.

Yeah, I’m also leaning towards the first option (otherwise: what happens if a vandal adds an ISO 639-1 code to a random item, creates a lexeme with that language item and code, and then the item is rolled back? is the lexeme now in an invalid state?), but I figure @Lydia_Pintscher should weigh in on this.

Yeah let's go for option 1 (enforcing that the language code is also recognized by WikibaseLexeme).

It's also not possible to add lexemes in any language with an "ISO 639-1 code: novalue" statement (there are currently 10), because they too have ISO 639-1 code statements which don't correspond to anything MediaWiki knows about.