
Some MachineVision suggested labels are shown in an odd foreign language
Closed, Resolved · Public · BUG REPORT

Description

I've noticed that when using Suggested tags, for most images at least one of the suggested labels is in Dutch instead of English.

I have Dutch configured as one of the languages I speak and can provide translations for in the Commons structured data captions and on Wikidata.org. However, my interface language is set to "British English" and the items in question all have English labels (Dutch is not used as a fallback here).

Looks like there might be a bug here where it's mixing languages or choosing translations in the wrong order. Because Dutch and German words very often have a special meaning in English, this is rather confusing.

Example:

Event Timeline

Krinkle created this task. · Jan 9 2020, 11:56 PM
Restricted Application added a subscriber: Aklapper. · Jan 9 2020, 11:56 PM
Ramsey-WMF assigned this task to AnneT. · Jan 10 2020, 6:57 PM
Ramsey-WMF changed the subtype of this task from "Task" to "Bug Report".
Ramsey-WMF added a subscriber: Ramsey-WMF.

After looking into this, it appears that the problem is that English isn't being used as the fallback for British English, or it is being used but not before the user's other configured languages. I looked through the Wikidata entities for many of the items in the screenshots that were in Dutch (platteland [aka "rural area", Q175185] and hoogvlakte [aka "highland", Q878223]), and they have English labels but no British English labels.

Assigning to Anne for investigation 🔍

AnneT added a comment. · Jan 13 2020, 9:45 PM

Hey @Ramsey-WMF, we're using Wikibase's LanguageFallbackChainFactory class to get a language fallback chain for the user, and this generates a chain based on the user's interface language followed by the languages in their Babel template in order of skill level. So, in Timo's case, the chain would be British English, Dutch, English, German, and Western Frisian, which is why we're seeing Dutch words before English words for Wikidata items without a British English translation.

I think there are 3 potential paths forward here:

  1. Maintain the status quo because this is how language fallback chains in Wikibase are intended to work
  2. Use MediaWiki's Language::getFallbacksFor() method instead, which uses the fallbacks listed for each language and doesn't account for a user's Babel template
  3. A combination of the two: getting a list including the user's interface language and Babel languages from Wikibase, then adding in per-language fallbacks (so we'd get the chain British English, English, Dutch, etc...)
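
To make the difference between the three options concrete, here is a minimal sketch in Python. It is a standalone model, not Wikibase or MediaWiki code: the function names and the small `FALLBACKS` table are assumptions made for illustration, and the real per-language fallback data may differ.

```python
# Illustrative fallback table (an assumption for this sketch); the real
# per-language fallbacks are defined in MediaWiki and may differ.
FALLBACKS = {
    "en-gb": ["en"],
    "fy": ["nl"],
}


def dedupe(codes):
    """Keep the first occurrence of each language code, preserving order."""
    seen = set()
    return [c for c in codes if not (c in seen or seen.add(c))]


def babel_chain(interface, babel):
    """Option 1: interface language, then Babel languages in skill order."""
    return dedupe([interface, *babel])


def natural_chain(interface):
    """Option 2: interface language, then only that language's fallbacks."""
    return dedupe([interface, *FALLBACKS.get(interface, [])])


def combined_chain(interface, babel):
    """Option 3: every language is followed directly by its own fallbacks."""
    chain = []
    for code in [interface, *babel]:
        chain.append(code)
        chain.extend(FALLBACKS.get(code, []))
    return dedupe(chain)


interface = "en-gb"
babel = ["nl", "en", "de", "fy"]  # Babel order described for Timo's case

print(babel_chain(interface, babel))     # option 1
print(natural_chain(interface))          # option 2
print(combined_chain(interface, babel))  # option 3
```

With these inputs, option 1 places nl ahead of en, option 2 contains only en-gb and en, and option 3 moves en directly behind en-gb while keeping the Babel languages after it.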

There may be some version of option 3 that exists in Wikibase code already but I haven't found it. I'd be interested to know more about WMDE's rationale for generating the fallback chain this way.

Any thoughts as to what our users would prefer?

Pinging @Addshore for potential insight on whether there are existing elements in the Wikibase code that will enable option #3 above. If not, things could get a little complicated.

If there's an easy way for us to use getFallbacksFor() only on CAT, I'd be interested in exploring option #2 since CAT is a bit special, and primarily targets a different user base with different expectations. We can keep the usual Wikibase rules for caption input/display and displaying entity labels.

Addshore added a comment. · Edited · Jan 17 2020, 4:28 PM

It looks like LanguageFallbackChainFactory::newFromLanguageCode has a mode parameter.
Looking further down into buildFromLanguage which uses this mode I see a call to:

$fallbacks = call_user_func( $this->getLanguageFallbacksFor, $languageCode );

where:

private $getLanguageFallbacksFor = 'Language::getFallbacksFor';

So, it looks like if you pass the FALLBACK_OTHERS mode in you should get a fallback chain that includes what you desire.

It looks like the buildFromBabel method also uses this FALLBACK_OTHERS thing on the babel languages.

I think if you have a play around with those moving parts you should be able to get it to do what you want!
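
As a rough illustration of how a mode parameter like this can control the chain, here is a small Python model. The `Mode` flags, `FALLBACKS` table, and `chain_for` function are hypothetical names for this sketch; only the idea of an "others" mode that appends a language's configured fallbacks comes from the comment above, and the real constants and their values are not shown in this thread.

```python
from enum import Flag, auto


class Mode(Flag):
    """Hypothetical stand-in for the factory's mode flags."""
    SELF = auto()    # include the requested language itself
    OTHERS = auto()  # append the language's configured fallbacks


# Illustrative fallback data (an assumption for this sketch).
FALLBACKS = {"en-gb": ["en"]}


def chain_for(code, mode):
    """Build a chain for one language code according to the mode flags."""
    chain = []
    if Mode.SELF in mode:
        chain.append(code)
    if Mode.OTHERS in mode:
        # Append configured fallbacks, skipping anything already present.
        chain.extend(c for c in FALLBACKS.get(code, []) if c not in chain)
    return chain


print(chain_for("en-gb", Mode.SELF))
print(chain_for("en-gb", Mode.SELF | Mode.OTHERS))
```

In this model, only the second call includes en after en-gb, which is the behaviour the "others" mode is expected to add.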

Restricted Application added a project: Structured-Data-Backlog. · Jan 17 2020, 4:28 PM

Change 566359 had a related patch set uploaded (by Anne Tomasevich; owner: Anne Tomasevich):
[mediawiki/extensions/Wikibase@master] Adjust language fallback chain order

https://gerrit.wikimedia.org/r/566359

@AnneT @Addshore You might know this already but wanted to give one piece of additional context - The fallback order used by MachineVision's label rendering seems to be the same as the one that Wikidata uses to decide which translations to let me fill in, e.g. when editing an item on Wikidata.org.

A key factor here is that the order Wikidata.org uses there is not a fallback order from which 1 is chosen to render something in. Rather, it is used to decide which (multiple) input fields to show me, some of which may have values already, some of which may not yet have values.

It makes perfect sense for me to see inputs for en-gb, nl, de (my "preferred" input languages) and maybe "en" after that. It wouldn't be too bad if "en" appeared earlier in that list (given that en-gb is very rarely different and very rarely worth adding an override for). It matters much more for fallback pairs where the differences are larger (e.g. zh-standard to zh-simplified, or nrm Narum to fr French), where the fallback is there not due to similarity but due to the likelihood that the populations speaking one language are familiar with the other (in the same way that "en" is a fallback for most).

However, it makes much less sense for a person with interface language set to "en-gb" to ever see a message in a language that isn't directly in the chain of "en-gb". We never mix language chains in MediaWiki and also do not support more than 1 logical interface language, so it would likely cause many bugs if we start to output messages in languages outside the primary interface language chain (e.g. HTML attributes would be more wrong). Just because I happen to know Japanese and want to review some translations once a week on Wikidata.org, doesn't mean that when browsing Commons I should see parts of the software interface itself casually pop up in Japanese.

I bring this up because it looks like the code being modified in the above patch might be okay (or better) as-is. I suspect it is rather that MachineVision might be using a method that isn't intended for selecting 1 language for display purposes. Rather, the method is intended for showing multiple options, as part of the input and review process.

AnneT added a comment. · Jan 22 2020, 8:07 PM

Thanks for the context @Krinkle, that makes sense. @Addshore: if you agree, I'll close that patch and update the method used in MV.

So as far as I am aware, the intention in Wikibase is to always use this primarily Babel-provided language fallback chain for rendering labels of entities.
For example, I have:

If I remove my Babel then I fall back to another fallback for data entry

Switching my fallback to de while still in an en interface will leave me with:

@Lydia_Pintscher might be able to clarify some parts of expected and desired behaviour, but this isn't something we have looked at in a long old while or touched since 2013 really.

AnneT moved this task from Backlog to Under discussion on the MachineVision board. · Mar 4 2020, 2:47 PM
AnneT added a comment. · Apr 6 2020, 5:28 PM

Based on Timo's comment above, we're going to use the natural language fallback chain rather than the Babel chain for labels. We're already removing suggestions whose labels don't have a value in the user's language(s), and the "add custom tag" tool gives users an action when few or no suggestions are presented.
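
A minimal Python sketch of the resulting behaviour, under stated assumptions: the function names are made up for illustration, the first item uses the "rural area" / "platteland" labels mentioned earlier in this task, and the German-only item is a hypothetical placeholder. This is a model of the idea, not the MachineVision code.

```python
def pick_label(labels, chain):
    """Return the label for the first language in the chain that has one."""
    for code in chain:
        if code in labels:
            return labels[code]
    return None


def visible_suggestions(suggestions, chain):
    """Drop suggestions with no label in any language of the chain."""
    picked = (pick_label(labels, chain) for labels in suggestions)
    return [label for label in picked if label is not None]


suggestions = [
    {"en": "rural area", "nl": "platteland"},  # labels quoted in this task
    {"de": "Beispiel"},                        # hypothetical German-only item
]

babel_chain = ["en-gb", "nl", "en", "de", "fy"]  # chain that caused the bug
natural_chain = ["en-gb", "en"]                  # interface language chain only

print(visible_suggestions(suggestions, babel_chain))
print(visible_suggestions(suggestions, natural_chain))
```

With the Babel chain the first item is shown as "platteland" and the German-only item is kept; with the natural chain the first item is shown as "rural area" and the German-only item is dropped, leaving the "add custom tag" tool to cover sparse results.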

Change 566359 abandoned by Anne Tomasevich:
Adjust language fallback chain order

https://gerrit.wikimedia.org/r/566359

Change 587579 had a related patch set uploaded (by Anne Tomasevich; owner: Anne Tomasevich):
[mediawiki/extensions/MachineVision@master] Only use natural language fallback chain

https://gerrit.wikimedia.org/r/587579

Change 587579 merged by jenkins-bot:
[mediawiki/extensions/MachineVision@master] Only use natural language fallback chain

https://gerrit.wikimedia.org/r/587579

@Krinkle changes for this should be available on production. Is it working better for you now?

Yes, I've not seen mixed languages for a few days now. Thanks.

Ramsey-WMF closed this task as Resolved. · May 4 2020, 8:39 PM

Seems to be all done 👍🏼