Make Serbian (sr-el) language available for terms (labels/descriptions/aliases)
Closed, Declined | Public | 8 Estimated Story Points

Description

As a Wikidata editor, I want Serbian (sr-el) to appear under my languages in the termbox when I add sr-el to my babel template, so that I can add labels and descriptions.

See Wikidata:Contact the development team#sr-el

Problem:
sr-el is not properly picked up by the termbox as a preferred language if it is on the user's user page.

BDD
GIVEN a user page with a babel template listing sr-el as a spoken language by the user
WHEN viewing an Item page
THEN sr-el is available for viewing and editing in the termbox as a preferred language

Acceptance criteria:

  • users with sr-el in their babel box can edit terms in sr-el as a preferred language

Notes:

Event Timeline

Note that we already use sr-ec and sr-el for terms (see language stats). The problem here is not that the codes are missing, it's that adding those codes to your Babel box does not cause them to appear in the term box.

I notice that after adding sr-ec and sr-el to the Babel box, the Babel box shows sr-Cyrl and sr-Latn and puts the user into categories for sr-Cyrl and sr-Latn. I suspect what's happening is that the codes get turned into sr-cyrl and sr-latn at some point before being passed to the term box and, since we don't support sr-cyrl or sr-latn, the term box fails to find a matching language and can't display them.

The opposite happens when loading a page using ?uselang=: both ?uselang=sr-el and ?uselang=sr-latn display the sr-el terms in the first row.
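
To make the suspected mismatch concrete, here is a minimal sketch in Python (not the actual PHP/JS code). The mapping tables, the set of term languages, and all function names are hypothetical and only illustrate the hypothesis above; where the real normalisation happens has not been confirmed.

```python
# Hypothetical mapping between the legacy Wikimedia codes and the
# script-based codes that the Babel box appears to display.
LEGACY_TO_STANDARD = {"sr-ec": "sr-cyrl", "sr-el": "sr-latn"}
STANDARD_TO_LEGACY = {std: legacy for legacy, std in LEGACY_TO_STANDARD.items()}

# Assumed set of codes accepted for terms (sr-cyrl/sr-latn are not in it).
TERM_LANGUAGES = {"en", "sr", "sr-ec", "sr-el"}


def babel_languages(codes):
    """Suspected Babel path: legacy codes are normalised before the term box sees them."""
    return [LEGACY_TO_STANDARD.get(code.lower(), code.lower()) for code in codes]


def termbox_languages(preferred):
    """The term box only keeps codes it recognises as term languages."""
    return [code for code in preferred if code in TERM_LANGUAGES]


def uselang_language(code):
    """Observed ?uselang= behaviour: both spellings end up showing the sr-el terms."""
    code = code.lower()
    return STANDARD_TO_LEGACY.get(code, code)


# Babel path: sr-el becomes sr-latn and is then filtered out.
from_babel = termbox_languages(babel_languages(["en", "sr-el"]))

# ?uselang= path: sr-latn is mapped back to sr-el and is shown.
from_uselang = termbox_languages([uselang_language("sr-latn")])
```

In this sketch `from_babel` contains only `en`, while `from_uselang` contains `sr-el`, which matches the two behaviours described above.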

Thanks for the investigation, @Nikki! Yeah, I was confused when I saw this task because they *should* be available already. So if you have any idea how to fix it @Lydia_Pintscher, a fix would be appreciated!

Task Inspection notes:

Install Babel and add sr-el.
What does UserPreferredContentLanguagesLookup getLanguages() return? Does it have sr-el?
If not, look at places that pass data into UserPreferredContentLanguagesLookup. Look at the filtering or getAllUserLanguages.
If yes, look at places that consume that response.

Check if the language appears with JS disabled (differences between the lists of languages in the ui and on the server).

Change 673532 had a related patch set uploaded (by Tonina Zhelyazkova; owner: Tonina Zhelyazkova):
[mediawiki/core@master] Add sr-latn to Names of languages

https://gerrit.wikimedia.org/r/673532

Some findings:

As @Nikki commented, Termbox receives the code sr-latn, but Wikibase does not recognize it as a content language; therefore it is not visible in the termbox.
Wikibase uses MediaWiki's LanguageNameUtils to get a list of content languages. And from what I can tell the utils read from Names.php which is a list of language codes and their names. That list currently does not contain sr-latn but rather sr-el (see https://github.com/wikimedia/mediawiki/blob/master/languages/data/Names.php#L422). There are some mapping mechanisms which turn sr-el into sr-latn but that is of no use to us since we send sr-latn to the language util.

The only fix I could find (see the above gerrit change) is to add sr-latn to this Names.php list. I'm not sure that's the right place to solve the problem so I'll wait for a review from some WMF language people.

That sounds like it will add sr-latn alongside sr-el, i.e. without migrating the sr-el data to sr-latn, without preventing people from using sr-el instead of sr-latn and without sr-el in a user's Babel box causing the sr-el data to be shown in the termbox. If so, the problem would still exist - we would still have/allow data stored using a language code which users can't add to the termbox.
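
A small Python sketch of this concern, under stated assumptions: the function name, the mapping, and the sample labels are all made up for illustration, and the real lookup in Wikibase may work differently.

```python
# Hypothetical normalisation applied to Babel codes before the term box sees them.
LEGACY_TO_STANDARD = {"sr-ec": "sr-cyrl", "sr-el": "sr-latn"}


def termbox_rows(babel_codes, known_codes, stored_labels):
    """Return (code, label) rows for the Babel languages the term box recognises."""
    rows = []
    for code in babel_codes:
        normalised = LEGACY_TO_STANDARD.get(code, code)
        if normalised in known_codes:
            # Labels are looked up under the normalised code only.
            rows.append((normalised, stored_labels.get(normalised)))
    return rows


# Sample data: the label is stored under the legacy code sr-el.
stored_labels = {"en": "Belgrade", "sr-el": "Beograd"}

# Today: sr-latn is unknown, so no row is shown at all.
before = termbox_rows(["sr-el"], {"en", "sr-el"}, stored_labels)

# With sr-latn added alongside sr-el: a row appears, but it is empty,
# because the existing label still lives under sr-el.
after = termbox_rows(["sr-el"], {"en", "sr-el", "sr-latn"}, stored_labels)
```

Under these assumptions `before` is empty and `after` holds a single sr-latn row without a label, so the sr-el data would remain unreachable from the term box unless it is migrated or mapped.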

Change 673532 abandoned by Tonina Zhelyazkova:

[mediawiki/core@master] Add sr-latn to Names of languages

Reason:

https://gerrit.wikimedia.org/r/673532

We attempted to solve this issue but will unfortunately have to give up on our side. There are a number of rather costly workarounds that we could do in Wikidata where the effort does not seem justified. Therefore I'm declining this for now.

Some background:
There seems to be a non-technical issue at the root of the problems surfacing on the "technical" level. Namely, there are two pairs of codes used to identify the two Serbian variants written in Latin and Cyrillic script: sr-el/sr-ec and sr-latn/sr-cyrl. The former seems to be a Wikimedia/Wikipedia custom code pair that Wikidata inherited. The latter pair is the more standard one, BCP 47 compliant.
It seems that MediaWiki and surrounding components, on which Wikibase relies heavily as the source defining what are "allowed"/"correct" language codes, intend to shift from the non-standard sr-el/sr-ec codes to sr-latn/sr-cyrl (T117845). This migration is not finished, which leads to some components (e.g. Babel) favoring sr-latn/sr-cyrl while others, e.g. MediaWiki's central language code list, still only recognize sr-el/sr-ec. So there needs to be a change there, and then we need to adapt to that change in Wikibase, which again would come with a non-trivial cost of migrating existing data in labels etc.