Babel language fallbacks do not distinguish between explicit and implicit language fallbacks
Closed, Resolved, Public

Description

As a Wikidata editor, I want to see labels in languages that are relevant to me, according to my Babel information.

Problem:
I noticed this while working on T297393: LanguageFallbackChainFactory::buildFromBabel() first adds the babel languages, then their variants, and then for each language all of the fallbacks. This means that the implicit fallback to en is added after all the explicit fallbacks of the first Babel language, but before the explicit fallbacks of any other Babel languages.

Example:
Consider a user with the following Babel: arz-5, frc-4. (arz Egyptian Arabic falls back to ar Arabic; frc Cajun French falls back to fr French.)
I posit that they should get the following language fallback chain: arz, frc, ar, fr, en.
However, they currently get the following fallback chain: arz, frc, ar, en, fr.

Aside: on Variants
In the description and example above, I’ve left out language variants, which also factor into the chain. For languages that have variants (e.g. zh: zh, zh-hans, zh-hant, …), those variants are inserted into the fallback chain immediately after the main language, and before the other Babel languages in different language levels. For example, the babel zh-5, de-4 would produce a chain like zh, zh-hans, zh-hant, … de, en. In a way, language variants are prioritized above language fallbacks: Babel order trumps language fallback order, but variants usually trump the Babel order. (However, within the same level, the Babel order trumps even the variant order: a custom Babel like zh-hant-N, zh-hans-N can override the usual variant order of zh-hans, zh-hant.) This task doesn’t propose any changes to the handling of language variants.

Screenshots/mockups:

maintenance/shell.php
>>> array_map( function ( $lwc ) { return $lwc->getLanguageCode(); }, \Wikibase\Repo\WikibaseRepo::getLanguageFallbackChainFactory()->buildFromBabel( [ '5' => [ 'arz' ], '4' => [ 'frc' ] ] ) )
=> [
     "arz",
     "frc",
     "ar",
     "en",
     "fr",
   ]

Pseudocode:
Pseudocode of the current version:

  • for each Babel language level:
    • for each language in this level:
      • add language
    • for each language in this level:
      • add language’s variants (if any)
  • for each Babel language (sorted by level):
    • add language’s fallbacks and their variants
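
The current behaviour can be reproduced with a small Python sketch of the pseudocode above. This is not the Wikibase implementation: the function name and the lookup tables are hypothetical, the tables contain only the languages mentioned in this task, and the Babel levels are assumed to arrive ordered from best to worst.

```python
# Toy data for illustration only, NOT the real MediaWiki language tables.
EXPLICIT_FALLBACKS = {"arz": ["ar"], "frc": ["fr"]}
IMPLICIT_FALLBACKS = ["en"]
VARIANTS = {"zh": ["zh-hans", "zh-hant"]}  # the real variant list is longer


def fallbacks(code):
    """Explicit fallbacks followed by the implicit ones, as one undifferentiated list."""
    return EXPLICIT_FALLBACKS.get(code, []) + IMPLICIT_FALLBACKS


def build_from_babel_current(babel):
    """Sketch of the current ordering; `babel` maps level -> list of language codes."""
    chain = []

    def add(code):
        # Codes are only added if they are not in the chain yet.
        if code not in chain:
            chain.append(code)

    levels = list(babel.values())  # assumption: already sorted, best level first
    for languages in levels:
        for code in languages:
            add(code)
        for code in languages:
            for variant in VARIANTS.get(code, []):
                add(variant)
    for languages in levels:
        for code in languages:
            # Explicit and implicit fallbacks are added together, per language.
            for fallback in fallbacks(code):
                add(fallback)
                for variant in VARIANTS.get(fallback, []):
                    add(variant)
    return chain


print(build_from_babel_current({"5": ["arz"], "4": ["frc"]}))
# ['arz', 'frc', 'ar', 'en', 'fr']
```

Because en is part of the fallback list of arz, it enters the chain before the loop reaches frc, so fr ends up behind it.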

Pseudocode of the proposed change:

  • for each Babel language level:
    • for each language in this level:
      • add language
    • for each language in this level:
      • add language’s variants (if any)
  • for each Babel language (sorted by level):
    • add language’s explicit fallbacks and their variants
  • add implicit fallbacks (en, mul)
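
A minimal Python sketch of the proposed ordering, under the same assumptions as the pseudocode: the function name, the `implicit` parameter and the lookup tables are hypothetical and only cover languages mentioned in this task. The implicit list defaults to en alone; mul would join it with T297393.

```python
# Toy data for illustration only, NOT the real MediaWiki language tables.
EXPLICIT_FALLBACKS = {"arz": ["ar"], "frc": ["fr"], "sco": ["en"]}
VARIANTS = {"zh": ["zh-hans", "zh-hant"]}  # the real variant list is longer


def build_from_babel_proposed(babel, implicit=("en",)):
    """Sketch of the proposed ordering; `babel` maps level -> list of language codes."""
    chain = []

    def add(code):
        # Codes are only added if they are not in the chain yet.
        if code not in chain:
            chain.append(code)

    levels = list(babel.values())  # assumption: already sorted, best level first
    for languages in levels:
        for code in languages:
            add(code)
        for code in languages:
            for variant in VARIANTS.get(code, []):
                add(variant)
    for languages in levels:
        for code in languages:
            # Only the explicit fallbacks are added here.
            for fallback in EXPLICIT_FALLBACKS.get(code, []):
                add(fallback)
                for variant in VARIANTS.get(fallback, []):
                    add(variant)
    # Implicit fallbacks come last, after every explicit fallback.
    for code in implicit:
        add(code)
    return chain


print(build_from_babel_proposed({"5": ["arz"], "4": ["frc"]}))
# ['arz', 'frc', 'ar', 'fr', 'en']
print(build_from_babel_proposed({"5": ["sco"], "4": ["frc"]}))
# ['sco', 'frc', 'en', 'fr']
```

The second call shows the case where en is an explicit fallback (sco falls back to en): it keeps its explicit position and the final implicit step adds nothing.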

Acceptance criteria:

  • The Babel fallback chain lists the explicit language fallbacks of all Babel languages before any implicit fallbacks (en and, with T297393, mul) – unless the implicit fallbacks are also explicit fallbacks, of course (e.g. if the Babel includes sco Scots, which explicitly falls back to en).

Event Timeline

Change 756620 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] Add explicit and implicit fallbacks to chain separately

https://gerrit.wikimedia.org/r/756620

@Lucas_Werkmeister_WMDE: Thank you for adding the pseudo-code! It really helps to understand what is actually going on. I have given this some more thought, in the hope of making it more robust and intuitive.

I would suggest the following priorities:

  1. Babel language level
  2. Babel itself
  3. explicit fallbacks
  4. less specific locales
  5. more specific locales
  6. implicit fallback locales

Some pseudo-code to clarify (codes are only added if new):

- for each Babel language level:
 - for each locale on this language level:
   - add the locale itself
 - for each locale on this language level:
   - add the locale’s **explicit fallbacks** 
 - for all locales in the list so far:
   - add all less specific locales
 - for all locales in the list so far:
   - add all more specific locales
- add **implicit fallback locales** (mul > en)
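
One possible reading of the pseudo-code above, as a Python sketch. Several points are assumptions that the comment does not spell out: "less specific" is taken to mean a code with trailing subtags removed (zh-hant to zh), "more specific" is taken to mean any known code that extends the locale with a further subtag, and "the list so far" is read as a snapshot taken before each step. All names and tables are hypothetical toy data.

```python
# Toy data for illustration only, NOT the real MediaWiki language tables.
EXPLICIT_FALLBACKS = {"arz": ["ar"], "frc": ["fr"]}
KNOWN_CODES = ["zh", "zh-hans", "zh-hant", "arz", "ar", "frc", "fr"]
IMPLICIT_FALLBACKS = ["mul", "en"]  # "mul > en" in the comment


def less_specific(code):
    """Assumed meaning: drop trailing subtags, nearest parent first."""
    parts = code.split("-")
    return ["-".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


def more_specific(code):
    """Assumed meaning: known codes that extend `code` with another subtag."""
    return [known for known in KNOWN_CODES if known.startswith(code + "-")]


def build_chain(babel):
    """`babel` maps level -> list of locales, assumed sorted best level first."""
    chain = []

    def add(code):
        # Codes are only added if new.
        if code not in chain:
            chain.append(code)

    for locales in babel.values():
        for locale in locales:
            add(locale)
        for locale in locales:
            for fallback in EXPLICIT_FALLBACKS.get(locale, []):
                add(fallback)
        for locale in list(chain):  # snapshot of the list so far
            for code in less_specific(locale):
                add(code)
        for locale in list(chain):  # snapshot of the list so far
            for code in more_specific(locale):
                add(code)
    for code in IMPLICIT_FALLBACKS:
        add(code)
    return chain


print(build_chain({"5": ["zh-hant"], "4": ["arz"]}))
# ['zh-hant', 'zh', 'zh-hans', 'arz', 'ar', 'mul', 'en']
```

Under this reading, explicit fallbacks are added per level, so a lower-level Babel language sorts after the explicit fallbacks of a higher-level one.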

I hope this will give us the best results based on the information that we have. I shared a document with you that illustrates the different outcomes. Also: This might still work if we included the locales that are requested by the browser at some point in the future.

@Lucas_Werkmeister_WMDE: Let's go with the safe option that you suggested to ensure that we can make mul happen in time! \o/

I have moved the more thorough changes of the fallback chain to a dedicated ticket: T300059

Change 756620 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Add explicit and implicit fallbacks to chain separately

https://gerrit.wikimedia.org/r/756620