
Add language codes for isiNdebele
Open, Needs Triage · Public

Description

This is a spin-off ticket from T289776, requesting proper language support in Wikidata for the following languages specifically:

  • nso - Sepedi / Sesotho sa Leboa / Pedi / Northern Sotho (already available)
  • nr - isiNdebele / Southern Ndebele / Ndebele of South Africa, which is one of the official languages of South Africa
  • nd - Northern Ndebele / isiNdebele / Ndebele, spoken in Zimbabwe

Wikidata items:

They currently have different levels of support in Wikidata, so a patch likely will be more than adding the ISO codes at one place only. Specifically:

  • Sepedi (nso) lexemes can be added through a counterintuitive workaround by abusing the ‘spelling variant of the Lemma’ box and typing nso.
  • There’s no such option for any of the isiNdebele ISO codes.

They show up in the “Lexeme’s language” dropdown list under various names, all of which I tried in case it might influence the system’s behaviour.

Screencasts (from the old version of Special:NewLexeme):

Event Timeline

Change 828887 had a related patch set uploaded (by Ariel Gutman; author: L10n-bot):

[mediawiki/extensions/WikibaseLexeme@master] Add language codes of Ndebele (Northern/Southern)

https://gerrit.wikimedia.org/r/828887

Keet10 renamed this task from Add language codes dor Sepedi and isiNdebele to Add language codes for Sepedi and isiNdebele. Sep 7 2022, 3:13 PM
AGutman subscribed.

I'm taking care of the Ndebele language codes (nd & nr) in https://gerrit.wikimedia.org/r/828887.

As for the nso code it seems it is already supported (see https://www.wikidata.org/wiki/Lexeme:L690524)

nso isn't fully supported; see the second part of the description above. Specifically, as shown in the screencast, you get error messages and have to fill in the language code in the spelling variant of the lemma box (noting that a language code is not a spelling variant of a lemma). For the properly/fully supported languages, you just select the language from the drop-down list in the language box and can create the lexeme immediately. Hence my suspicion that language support is handled in more than one place, and consequently that there are at least two places to fix.
Of course, a workaround for isiNdebele like the nso one will be better than the current state of none/miscellaneous.

T209282 and T284882 are the reasons that nso must be specified manually; it is not about any lack of support within WikibaseLexeme or elsewhere.

Can we go forward with nd & nr codes?

Not without LangCom approval, which hasn’t happened yet as far as I can tell.

IMHO this task should at least specify whether the language codes are intended to be added for lexemes, monolingual languages, or item labels/descriptions/aliases.

Ok, who is responsible for this approval? Could we ping them?

Currently we would like to add lexemes in these languages, but I suppose all use cases should ultimately be supported.

I’m not sure if LangCom watches the Phabricator board or should be pinged (I usually only see these tasks when they have approval ^^). Maybe @Amire80 or @jhsoby can take a look at this task?

On Sepedi / Northern Sotho

There's probably nothing to do about Sepedi. It can be added if you select "Northern Sotho". There's a debate about what the name of the language should be, and it's documented with pretty good references on the English Wikipedia article: https://en.wikipedia.org/wiki/Northern_Sotho_language . (Disclaimer: I added some of these references myself a few years ago. I hope it's clear from my edits that I don't have a strong opinion either way. From what I heard till now, there are good arguments for both names, and no clear "winner". But I'm not actually claiming to be an expert.)

If anyone has good arguments for changing the English-language label of item https://www.wikidata.org/wiki/Q33890 or the title of the English Wikipedia article https://en.wikipedia.org/wiki/Northern_Sotho_language from "Northern Sotho" to "Sepedi", this should be discussed on the respective talk pages in Wikidata and in the English Wikipedia.

If anyone thinks that it should be possible to add lexemes separately in both Northern Sotho and in Sepedi, then "nso" should probably be used for Northern Sotho, and a code with a subtag for Sepedi. However, I suspect that this is not what's being asked here.

Northern & Southern Ndebele are fine to add, of course. Nothing to be done at this level about Northern Sotho – like @Lucas_Werkmeister_WMDE pointed out in Telegram, the problem with input there is T284882, so the problem is not that the language code is missing.

I am actually surprised that nd and nr aren't already in Names.php – I thought all ISO 639-1 codes (two-letter codes) were there, but that was apparently an incorrect assumption. @Amire80 Would it be problematic to add them there even if they haven't completed the necessary localization?

Change 828887 merged by jenkins-bot:

[mediawiki/extensions/WikibaseLexeme@master] Add language codes of Ndebele (Northern/Southern)

https://gerrit.wikimedia.org/r/828887

I am actually surprised that nd and nr aren't already in Names.php – I thought all ISO 639-1 codes (two-letter codes) were there, but that was apparently an incorrect assumption.

No, not all two-letter codes are there. In practice, these two languages are among the very few that are still not there, and that's one of the reasons why I'd really love to have translators into them.

@Amire80 Would it be problematic to add them there even if they haven't completed the necessary localization?

Yes, it's problematic, for two reasons:

  1. Names.php is for languages in which there is some localization.
  2. I don't have reliable information about the autonyms.

In language-data (not Names.php) we already have the following:

nd: [Latn, [AF], siNdebele saseNyakatho]
nr: [Latn, [AF], isiNdebele seSewula]

I added these two lines years ago based on the English Wikipedia. I wouldn't do it today because I'm not sure that they are correct: the English Wikipedia didn't have reliable sources for these autonyms then, and one of them (nd) changed to "isiNdebele saseNyakatho", and it still doesn't cite a source. So that's the problem with reliability, but they also have to be distinct. "Ndebele" by itself is probably written as "isiNdebele" in each of them, so another word is needed for both to distinguish them in a list of languages. A reliable source would be dictionaries or grammar books, for each language. (Also, the country name is sometimes added in parentheses, but I really dislike this solution. Country names change surprisingly often.)
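To make the distinctness requirement concrete, here is a minimal sketch of a check that no two codes share an autonym. The entry shape mirrors the two language-data lines quoted above; the dictionary name and the function `find_duplicate_autonyms` are hypothetical and not part of language-data or MediaWiki.

```python
# Entries copied from the two language-data lines quoted above.
# Shape (an assumption for this sketch): code -> (script, regions, autonym)
LANGUAGE_DATA = {
    "nd": ("Latn", ["AF"], "siNdebele saseNyakatho"),
    "nr": ("Latn", ["AF"], "isiNdebele seSewula"),
}


def find_duplicate_autonyms(entries):
    """Map each autonym used by more than one code to the codes that share it."""
    by_autonym = {}
    for code, (_script, _regions, autonym) in entries.items():
        # Compare case-insensitively so "IsiNdebele" and "isiNdebele" collide.
        by_autonym.setdefault(autonym.casefold(), []).append(code)
    return {
        name: sorted(codes)
        for name, codes in by_autonym.items()
        if len(codes) > 1
    }
```

With the two entries as they stand, the check finds no collision. If both were shortened to plain "isiNdebele", it would report them as indistinguishable in a language list. It says nothing about whether either autonym is correct, which still needs a reliable source.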

Not having autonyms is not a blocker for adding them to Wikidata, but if anyone seriously wants to add lexemes in these languages, then I'd also like to get reliable information about autonyms from that person :)

What does "proper support" mean here? Do you want to be able to use them for labels? For monolingual text? Only lexemes?

Nikki renamed this task from Add language codes for Sepedi and isiNdebele to Add language codes for isiNdebele. Feb 3 2023, 2:12 AM
Nikki updated the task description.

I've updated the description to make this ticket more clearly about nd and nr, because (as others already pointed out) nso is already supported and the unintuitive behaviour of the interface when selecting it is already covered by another ticket.

Current status:

There hasn't been a response to T317193#8360607 so it's still not clear what exactly is wanted.

nr and nd can now be used for lexemes.

  • If you select "Northern Ndebele" or "Southern Ndebele" it should work fine.
  • If you select "Ndebele", you will need to type "nd" or "nr" in the spelling variant field because the language names have not been added.
  • The ticket for adding the language names is T322138.
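The selection rules in the list above can be written down as a small lookup. This is only an illustrative sketch of the behaviour described here, not code from WikibaseLexeme; the names `DIRECT_NAMES` and `manual_code_needed` are made up for the example.

```python
# Dropdown names that work directly, per the list above, with their codes.
DIRECT_NAMES = {
    "Northern Ndebele": "nd",
    "Southern Ndebele": "nr",
}

# Dropdown name that does not identify a single code on its own.
AMBIGUOUS_NAME = "Ndebele"


def manual_code_needed(selected_name):
    """Return True if "nd" or "nr" must be typed in the spelling variant field."""
    if selected_name in DIRECT_NAMES:
        return False
    if selected_name == AMBIGUOUS_NAME:
        return True
    raise ValueError(f"not an Ndebele entry: {selected_name!r}")
```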

nr can already be used for monolingual text (see T155430). nd cannot be used for monolingual text.

Neither nr nor nd can be used for labels. Neither of them has an interface translation.
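For quick reference, the status above can be summarised as a small table in code. This is a hand-written snapshot of what this comment states, not something queried from Wikidata; the structure and the helper `supported_uses` are hypothetical.

```python
# Snapshot of the status described in this comment (not live data).
SUPPORT = {
    "nr": {
        "lexemes": True,
        "monolingual_text": True,   # see T155430
        "labels": False,
        "interface_translation": False,
    },
    "nd": {
        "lexemes": True,
        "monolingual_text": False,
        "labels": False,
        "interface_translation": False,
    },
}


def supported_uses(code):
    """Return the alphabetically sorted uses currently available for a code."""
    return sorted(use for use, ok in SUPPORT[code].items() if ok)
```

Calling `supported_uses("nd")` gives only the lexeme use, while `supported_uses("nr")` also includes monolingual text, which matches the difference between the two codes noted above.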