Serbian language does not appear for people from Serbia in Wikidata item pages
Open, High · Public

Description

When I go, not logged in, to a random item in Wikidata, for example https://www.wikidata.org/wiki/Q18679242, and click on the arrow next to "In more languages", I get the following list of languages: English, Albanian, Hungarian, Romanian.

Since Albanian, Hungarian and Romanian are the largest minority languages in Serbia, I assume that this list of languages is derived from geolocating my IP to Serbia. However, Serbian, the official and majority language of Serbia, is not in the list, and I expect it to be there.

When I log in, the languages from my userbox are correctly shown.

This bug seems similar to T50804 or T100002.

Event Timeline

Nikola_Smolenski raised the priority of this task from to Needs Triage.
Nikola_Smolenski updated the task description.
Nikola_Smolenski added a project: Wikidata.
Nikola_Smolenski added a subscriber: Nikola_Smolenski.
Restricted Application added subscribers: StudiesWorld, Aklapper. · Dec 17 2015, 7:29 AM

Similarly, when I go to https://en.wikipedia.org/wiki/Main_Page and click on the gear icon next to "Languages", I get the "Language settings" window; now when I click on "Input" and "Enable input tools", the list of languages I get under "Language used for writing" is "English, shqip, magyar". This is probably the same underlying issue.

Lydia_Pintscher added a subscriber: Lydia_Pintscher.

Wikidata just takes the top 3 suggestions from ULS. Fixes need to be made there.

So perhaps the bug is that sr_Latn and sr_Cyrl codes are not reduced to just sr?
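
To make the suggestion concrete, here is a minimal sketch of what "reducing" the codes could look like before the top three suggestions are taken. This is not ULS or Wikibase code: the function names `base_language` and `collapse_variants` are hypothetical, and the suffix stripping is deliberately naive (it would also collapse codes such as zh-hans and zh-hant, which may not be wanted).

```python
from typing import Iterable, List


def base_language(code: str) -> str:
    """Strip a script or variant suffix, e.g. 'sr_Latn' -> 'sr'.

    Assumption: everything after the first '-' or '_' is a variant marker.
    """
    return code.replace("_", "-").split("-", 1)[0].lower()


def collapse_variants(codes: Iterable[str], limit: int = 3) -> List[str]:
    """Reduce each code to its base language, drop repeats, keep the first `limit`."""
    seen: List[str] = []
    for code in codes:
        base = base_language(code)
        if base not in seen:
            seen.append(base)
    return seen[:limit]


# Hypothetical suggestion list for a reader geolocated to Serbia.
suggestions = ["en", "sr_Cyrl", "sr_Latn", "sq", "hu", "ro"]
print(collapse_variants(suggestions))
```

With this kind of reduction the two script variants occupy a single slot, so plain sr would be offered instead of being missing or split in two.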

This could be a case where Wikidata might need to have different behavior from ULS. ULS should offer both Cyrillic and Latin input tools, but Wikidata might not want that.

@Lydia_Pintscher Are you referring to the ULS getFrequentLanguages or getPreviousLanguages method?

We are using getFrequentLanguageList.

OK, now Serbian appears, but with Serbian Cyrillic and Serbian Latin as separate languages. This means that all the Serbian content entered so far, which was entered under the code 'sr', will not be available. Has there ever been a discussion about using two separate codes for the Serbian language? Was there a final decision? If yes, what could be done about old content?

cscott added a subscriber: cscott. · Aug 10 2017, 7:01 PM
Restricted Application added a subscriber: PokestarFan. · Aug 10 2017, 7:01 PM

It seems like Wikidata doesn't fully support wikis with LanguageConverter enabled? Does this same issue occur for zh-hans and zh-hant (and zh-hk, etc)?

thiemowmde triaged this task as Low priority. · Jan 31 2018, 7:26 PM
thiemowmde added a project: patch-welcome.
Nikola_Smolenski raised the priority of this task from Low to Medium. · Jan 31 2018, 7:50 PM
Nikola_Smolenski removed a project: patch-welcome.

I do not understand how this task could possibly be low priority. This potentially affects every language that has variants.

Liuxinyu970226 lowered the priority of this task from Medium to Low. · Jan 31 2018, 11:29 PM
Liuxinyu970226 added a subscriber: Liuxinyu970226.

@Nikola_Smolenski: As priority reflects reality and does not cause it, do you plan to fix this problem, or have Wikidata and/or ULS developers confirmed that this task is indeed more urgent? Please do not change the priority if the change does not conform to Setting Task Priorities. Resources of teams are limited when it comes to working on requests. We want to be realistic about communicating what is being worked on, to maximize the impact of changes. Practically, this often unfortunately means assigning a low priority to many tasks.

If the priority was increased because you plan to work on this task, please 1. claim the task by setting yourself as assignee, and 2. submit a Gerrit patch. Both are required if you want to raise the priority again. Thank you for your help!

If you do not plan to work on this task yourself but feel that this task is urgent but being ignored by those with the actual power to put the task on their agenda, please discuss with the responsible developers, product managers and budget holders. Further contact information can be found on the corresponding team wiki page. Thanks for your understanding!

I think that this is fixed now, possibly thanks to https://github.com/wikimedia/jquery.uls/pull/275 .

I tested by setting the countryCode to RS in the debugger.

@Nikola_Smolenski, @Petar.petkovic, can you please test?

Thanks!

Amire80 raised the priority of this task from Low to High. · Feb 27 2018, 3:41 PM
Amire80 moved this task from Backlog to Missing languages on the ULS-CompactLinks board.

I have been following this discussion for some time now. I regularly use Wikidata in Chrome and have always gotten the following languages (for at least the last year):

  • English
  • Serbian
  • Croatian
  • Bosnian

Here is how it looks:


All of the above languages are in my Chrome's list of accepted languages, although I've never explicitly added any. They probably come from those "Don't translate Bosnian" (for example) rules that Chrome shows. I still haven't checked the codebase to see where the list of languages comes from.

All this time I wasn't aware that there is actually a big issue with Serbian. I only realized it now while trying Firefox (which I use regularly, but not for wiki editing). When I open a Wikidata item there, I get:

  • English
  • српски (autonym name for Serbian in Cyrillic script)
  • srpski (autonym name for Serbian in Latin script)
  • Albanian

Expanding the list of languages and searching for "Serbian" shows the following items as well:

  • Serbian
  • Serbian (Cyrillic script)
  • Serbian (Latin script)
[Screenshots: short list of languages, initially shown; expanded list of languages]

That is actually five entries for Serbian in total: one general entry, plus two each for Cyrillic and Latin. I haven't even tried editing the autonym entries. They seem to come from some service external to Wikidata, while opening the list of all languages shows entries from Wikidata itself, which are populated and make sense.

It would be good to know which language codes those five entries are treated as internally.

Mega-facepalm: I tested this without reading the original task description. I assumed that this is about ULS, but it's about Wikidata item pages. Apologies! 🤦

So I suspect that the issue is with Wikidata and not with ULS. But generally, language codes are messy and perhaps using a more common repo of languages and their codes would be good.

Amire80 renamed this task from Serbian language does not appear for people from Serbia to Serbian language does not appear for people from Serbia in Wikidata item pages. · Mar 24 2018, 7:48 PM

There was a workaround in the past to be able to edit in Serbian. If you had sr in the list of your previous ULS languages, you would see it without variants, and be able to edit.
After 76551ed4a7fccbaf87cd850674406ae316f4f956, the workaround is no longer possible. It might be the case that no new additions are now possible in Serbian.

Every Wikidata item looks like this for me, both as an anonymous user and as a logged-in user.


T217770#5436563 is a more detailed comment about the problem.

Change 532276 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/UniversalLanguageSelector@master] Revert "Return target of redirect languages in mw.uls.getFrequentLanguageList"

https://gerrit.wikimedia.org/r/532276

Change 532276 merged by jenkins-bot:
[mediawiki/extensions/UniversalLanguageSelector@master] Revert "Return target of redirect languages in mw.uls.getFrequentLanguageList"

https://gerrit.wikimedia.org/r/532276

Change 532341 had a related patch set uploaded (by Alaa Sarhan; owner: Ladsgroup):
[mediawiki/extensions/UniversalLanguageSelector@wmf/1.34.0-wmf.19] Revert "Return target of redirect languages in mw.uls.getFrequentLanguageList"

https://gerrit.wikimedia.org/r/532341

To reiterate: does Wikibase document somewhere what kind of language codes it expects to receive and use for languages and language variants? This bug is a failure of normalisation somewhere.
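
As an illustration of the kind of normalisation step this question is about, here is a rough sketch using an explicit alias table rather than blind suffix stripping. Everything in it is an assumption for discussion: the `ALIASES` table, the choice of sr as the canonical code, and the function names are hypothetical, and nothing in this task confirms which codes Wikibase actually receives or stores.

```python
from typing import Dict, Iterable, List

# Hypothetical alias table: variant codes that should resolve to one
# canonical content-language code. The entries are assumptions, not
# documented Wikibase behaviour.
ALIASES: Dict[str, str] = {
    "sr-cyrl": "sr",
    "sr-latn": "sr",
    "sr-ec": "sr",
    "sr-el": "sr",
}


def normalise(code: str) -> str:
    """Lower-case the code, unify separators, then apply the alias table."""
    key = code.strip().replace("_", "-").lower()
    return ALIASES.get(key, key)


def group_by_canonical(codes: Iterable[str]) -> Dict[str, List[str]]:
    """Group incoming codes by canonical code to expose duplicate entries."""
    groups: Dict[str, List[str]] = {}
    for code in codes:
        groups.setdefault(normalise(code), []).append(code)
    return groups


print(group_by_canonical(["sr", "sr-ec", "sr-el", "sr_Latn", "sq"]))
```

An explicit table leaves codes it does not list (for example zh-hans) untouched, which avoids merging variants that are meant to stay separate. The cost is that the table has to be maintained and documented somewhere.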

Change 532341 merged by jenkins-bot:
[mediawiki/extensions/UniversalLanguageSelector@wmf/1.34.0-wmf.19] Revert "Return target of redirect languages in mw.uls.getFrequentLanguageList"

https://gerrit.wikimedia.org/r/532341

Mentioned in SAL (#wikimedia-operations) [2019-08-26T11:34:25Z] <ladsgroup@deploy1001> Synchronized php-1.34.0-wmf.19/extensions/UniversalLanguageSelector: SWAT: [[gerrit:532341|Revert "Return target of redirect languages in mw.uls.getFrequentLanguageList" (T217770 T121747)]] (duration: 00m 46s)