
getFrequentLanguageList: expected behavior for 'redirected' languages
Closed, Resolved · Public

Description

The ULS method getFrequentLanguageList() relies on the getAutonym() method while processing the result it returns (to "make flat, make unique, and ignore unknown/unsupported languages"), presumably trusting that getAutonym() returns the language code itself if the language "is unknown/unsupported".

For the code of a 'redirected' language, e.g. 'fil', which redirects to 'tl', getAutonym() returns the autonym of the language it redirects to.

Consequently, codes of redirected languages are not filtered out by getFrequentLanguageList(), so its return value may contain, e.g., both 'fil' and 'tl'.
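The behavior described above can be illustrated with a minimal, self-contained sketch. It is not the ULS implementation: the data table, its shape, and the names `sketchLanguages`, `sketchGetAutonym` and `sketchFrequentLanguageList` are hypothetical, chosen only to mirror the description.

```javascript
// Hypothetical, simplified language data (NOT the real ULS data set).
// An entry with a `redirect` field is assumed to point at another code.
const sketchLanguages = new Map([
  ["tl", { autonym: "Tagalog" }],
  ["fil", { redirect: "tl" }],
  ["de", { autonym: "Deutsch" }],
]);

// Returns the autonym, following a redirect if there is one.
// Falls back to the code itself when the language is unknown.
function sketchGetAutonym(code) {
  const entry = sketchLanguages.get(code);
  if (!entry) {
    return code;
  }
  return entry.redirect ? sketchGetAutonym(entry.redirect) : entry.autonym;
}

// Flattens the sources, removes duplicates, and drops codes whose
// "autonym" is just the code itself (treated as unknown/unsupported).
function sketchFrequentLanguageList(...sources) {
  const seen = new Set();
  return sources.flat().filter((code) => {
    if (seen.has(code) || sketchGetAutonym(code) === code) {
      return false;
    }
    seen.add(code);
    return true;
  });
}

const sketchResult = sketchFrequentLanguageList(
  ["fil", "xx-unknown"],
  ["tl", "de", "fil"]
);
console.log(sketchResult); // [ 'fil', 'tl', 'de' ]
```

Because the redirected code 'fil' has an autonym that differs from the code, it passes the "unknown language" check and ends up in the list next to 'tl'.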

Is this expected behavior?

This was found as part of the research for T217770.

Event Timeline

Restricted Application added a subscriber: Aklapper. · May 8 2019, 12:05 PM

@Nikerabbit Being unsure of the correct process for getting clarification on ULS topics, I'm taking the liberty of bringing this ticket to your attention. Could you shed some light on this, please, or kindly point me to the right person to ask?

It's tricky. On one hand we want to preserve the original language codes to not mess up expectations. On the other hand the current behavior is not wanted either.

I don't know the answer right now. I'll return hopefully next week with more thoughts.

@Nikerabbit Sorry to poke you as an individual again (please point me to the process, if possible): is there any update on this? Please note that this ticket is not a change request but an inquiry about the expected behavior. In theory, a yes/no answer would suffice for us to decide whether we have to compensate for this output or whether it can/will eventually be changed upstream.

Some archeology in the ULS code bases has led us to https://phabricator.wikimedia.org/T51847 and the related code change https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/UniversalLanguageSelector/+/69613/.
Based on our interpretation of that bug report, and in particular of the behaviour of the new code in https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/UniversalLanguageSelector/+/69613/7/resources/js/ext.uls.init.js@123 (implementation details changed later, but the behaviour seems not to have changed), we believe we have established that getFrequentLanguageList should include the fil language code if it is provided by any of the numerous data sources the method considers.
In cases where both fil and tl are provided by the sources of language codes, both fil and tl should be included in the result of getFrequentLanguageList.
It seems, then, that filtering out language codes which are redirects should be the responsibility of the code using getFrequentLanguageList.
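As a rough sketch of what such caller-side handling could look like, the snippet below maps redirected codes to their targets and removes the resulting duplicates. The redirect table and the names `callerRedirects`, `resolveRedirect` and `collapseRedirects` are hypothetical and not part of ULS or Wikibase. Whether collapsing onto the redirect target is the right choice depends on which codes the consumer can actually work with.

```javascript
// Hypothetical redirect table kept by the calling code (an assumption,
// not data taken from ULS).
const callerRedirects = new Map([["fil", "tl"]]);

// Returns the redirect target, or the code itself if it is not a redirect.
function resolveRedirect(code) {
  return callerRedirects.has(code) ? callerRedirects.get(code) : code;
}

// Maps every code to its redirect target and keeps the first occurrence
// of each target, preserving the original order.
function collapseRedirects(codes) {
  const seen = new Set();
  const result = [];
  for (const code of codes) {
    const target = resolveRedirect(code);
    if (!seen.has(target)) {
      seen.add(target);
      result.push(target);
    }
  }
  return result;
}

const collapsed = collapseRedirects(["fil", "tl", "de"]);
console.log(collapsed); // [ 'tl', 'de' ]
```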

@Nikerabbit and other experts from the UniversalLanguageSelector team, could you please confirm that the above description is not off?

BTW @Arrbee, in this case we've been asking @Nikerabbit personally, which, admittedly, is a rather hostile approach. What would be your team's preferred way to submit questions like this one? I've failed to find this piece of information on your team's wiki page: https://www.mediawiki.org/wiki/Wikimedia_Language_engineering.
Again, I'd like to underline that we have not claimed there is a bug. It has not been clear what the expected behaviour is, and the related documentation, including the tests documenting the code, has not provided a clear answer.

BTW @Arrbee, in this case we've been asking @Nikerabbit personally, which, admittedly, is a rather hostile approach. What would be your team's preferred way to submit questions like this one? I've failed to find this piece of information on your team's wiki page: https://www.mediawiki.org/wiki/Wikimedia_Language_engineering.

Tagging the Language team along with the project code is a good start to get our attention. However, when we can help will depend on our work schedule. From what I have read so far, this query came up as part of your research. Can you let us know how urgent this request is?

Again, I'd like to underline that we have not claimed there is a bug. It has not been clear what the expected behaviour is, and the related documentation, including the tests documenting the code, has not provided a clear answer.

@Pginer-WMF @Amire80 - could you please check if we can help here in any way?

Many thanks for getting back to us, @Arrbee! Apologies for the late response on my end.

Tagging the Language team along with the project code is a good start to get our attention. However, when we can help will depend on our work schedule.

Thanks, we will do this the next time we come across questions in your team's domain.

From what I have read so far, this query came up as part of your research. Can you let us know how urgent this request is?

Given the discovery we made by investigating the code and its history, mentioned in one of the comments above, the request here is of medium urgency.
In our current work at WMDE we are no longer blocked on this question. It still seems useful for current and future ULS users to have more clarity on the designed behaviour of this method, so your having a look into it would undoubtedly be appreciated.

WMDE-leszek added a subscriber: santhosh. · Edited · Jul 23 2019, 8:03 AM

Dear UniversalLanguageSelector folks, it's me again.
It looks like, with https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/UniversalLanguageSelector/+/523176 being merged, we might have got an answer to the question?

Is it then correct to say that the getFrequentLanguageList output does NOT include language codes of "redirected" languages?
I dare to ping @santhosh on this as well; we haven't asked him before, but his expertise seems relevant here. Thanks!

I think that, given my patch, it should actually be resolved now.

I think that, given my patch, it should actually be resolved now.

After your patch, contributing in Serbian (and possibly other languages with variants) is impossible, or at least I don't know how. See T217770#5436563 for more details.

WMDE-leszek added a comment. · Edited · Mon, Aug 26, 11:57 AM

Apologies for breaking the usability of Wikidata for the Serbian audience, and thanks for reporting the issue.
The change to ULS has now been reverted, which should fix the problem for the Serbian language "variants".

@Petar.petkovic in the light of what we've learned with https://gerrit.wikimedia.org/r/523176, is it correct to conclude that getFrequentLanguageList should also return language codes that are redirected, i.e. both "fil" and "tl", or both "sr" and "sr-cyrl"?

If we got the answer to the question, I guess we could close this task as resolved? We'll await confirmation from your side, and close it.

Adding new labels for Serbian is working again as it used to.

@Petar.petkovic in the light of what we've learned with https://gerrit.wikimedia.org/r/523176, is it correct to conclude that getFrequentLanguageList should also return language codes that are redirected, i.e. both "fil" and "tl", or both "sr" and "sr-cyrl"?

When I said "as it used to" above, I meant that one needs to have the sr language code among the ULS previous languages in order to edit items in Serbian on Wikidata. sr-cyrl is not useful for editing, but it is acceptable if both sr and sr-cyrl are returned. Users would then need to know that they can only edit in sr and that sr-cyrl will give them an error, which makes for a bad user experience.
I guess the case is similar for fil and tl, where users can only edit in tl, but I'm not sure how Wikidata will handle these language codes.

Wikibase code should have a language code validation system in place so that we don't end up with uneditable languages in the UI. Even worse, we saw in the Serbian example that editing in some language could be completely broken for all users. That is what T217770 should deal with.
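A minimal sketch of such a validation step is shown below. The set of editable codes and the names `editableCodes` and `keepEditable` are hypothetical and not taken from Wikibase; treating 'tl' as editable and 'fil' as not is only the guess voiced above.

```javascript
// Hypothetical allow-list of codes the consumer can actually edit in.
// 'sr' being editable and 'sr-cyrl' not follows the comment above;
// the 'tl'/'fil' split is an assumption.
const editableCodes = new Set(["sr", "tl", "de"]);

// Keeps only the codes that are editable, preserving their order.
function keepEditable(codes) {
  return codes.filter((code) => editableCodes.has(code));
}

const offered = keepEditable(["sr", "sr-cyrl", "fil", "tl"]);
console.log(offered); // [ 'sr', 'tl' ]
```

Filtering against an explicit allow-list leaves the list of frequent languages itself untouched, so the upstream behaviour does not need to change.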

Adding new labels for Serbian is working again as it used to.

Great, thanks for confirming!

Wikibase code should have a language code validation system in place so that we don't end up with uneditable languages in the UI. Even worse, we saw in the Serbian example that editing in some language could be completely broken for all users. That is what T217770 should deal with.

Indeed, that's the approach we intended to take. We simply thought that, aside from improving the situation on the Wikibase side, we might be able to contribute to ULS by fixing some unwanted behaviour. Now it is clear that the behaviour was not meant to be fixed, as it was not wrong. Thanks, and apologies once again for the inconvenience.

WMDE-leszek closed this task as Resolved. · Tue, Aug 27, 6:55 AM
WMDE-leszek claimed this task.
WMDE-leszek removed WMDE-leszek as the assignee of this task.