
Ensure content languages are sorted by language code
Open, Needs Triage, Public

Description

[This task is out of scope for the "Improved Language Fallback (MUL)" initiative (see "Out of Scope" section in the epic T312097).]

Steps to reproduce:

What happens?
mul can be found at the very end of the list.

What should have happened instead?
mul should be found at the position defined by sorting by language code.

Notes:

Notes about the status quo:
The list is currently a union of MediaWikiContentLanguages and additional StaticContentLanguages

The list of term language codes is:

  • all codes supported by MediaWiki core
    • MediaWiki ensures this list is sorted
    • only in production, this also includes agq, bag etc., via wmgExtraLanguageNames
  • a hard-coded extra set of language codes: agq, bag, etc.
    • this list is sorted in the source code
  • mul, if enabled

So on production Wikidata, the second and third list items make no difference. On a default Wikibase, the third list item has no effect, but agq, bag, etc. come after the regular language codes. On Test Wikidata, the second list item is redundant, but the third list item adds an unsorted language code (mul) at the end.
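
The scenarios above can be illustrated with a minimal Python sketch. The code lists are short, made-up stand-ins for the real ones, and `term_language_codes` is a hypothetical helper, not Wikibase code. It only mimics the "union in source order" behaviour described in these notes.

```python
def term_language_codes(core_codes, extra_codes, mul_enabled):
    """Concatenate the three sources in order, skipping duplicates (hypothetical helper)."""
    sources = [core_codes, extra_codes, ["mul"] if mul_enabled else []]
    result = []
    for source in sources:
        for code in source:
            if code not in result:
                result.append(code)
    return result


# Illustrative stand-ins, not the real lists.
CORE = ["de", "en", "zh"]   # MediaWiki core codes, already sorted
EXTRA = ["agq", "bag"]      # hard-coded extra codes, sorted in the source

# Default Wikibase: the extra codes land after the regular ones, mul is off.
default_wikibase = term_language_codes(CORE, EXTRA, mul_enabled=False)

# Test Wikidata: core already contains the extra codes, mul is appended unsorted.
test_wikidata = term_language_codes(sorted(CORE + EXTRA), EXTRA, mul_enabled=True)

# Sorting the final union by language code gives the same order everywhere.
sorted_by_code = sorted(test_wikidata)

print(default_wikibase)
print(test_wikidata)
print(sorted_by_code)
```

Sorting once at the end, rather than relying on each source being sorted, is what would move mul to its position between the other codes.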

Open questions:

  • What Sorting do we want?
    • Sarai wanted to sort the list alphabetically by the language name, originally.
    • The dev team asks whether sorting by language code would be good enough. It would have the advantage that there is only one order for all UI languages.
    • Manuel sees some upsides for keeping the status quo: MUL is also sorted last in the termbox, so maybe this position is intuitive in the list as well.

Acceptance criteria:

  • The languages are sorted alphabetically / by language code / as is (TBD).

Event Timeline

Manuel renamed this task from "Make sorting of languages in lists more consistent." to "Make sorting of languages in language code lists more consistent." Mar 21 2023, 1:50 PM
Manuel renamed this task from "Make sorting of languages in language code lists more consistent." to "Ensure content languages are sorted by language code". Mar 22 2023, 10:16 AM
Manuel updated the task description.

Change 901555 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] WIP: Ensure content languages are sorted by language code

https://gerrit.wikimedia.org/r/901555

WIP patch that I put together during a meeting where we discussed this issue (since the code was already starting to form inside my head and I wanted to get it out); needs more work though.

Sarai wanted to sort the list alphabetically by the language name, originally.
The dev team asks whether sorting by language code would be good enough. It would have the advantage that there is only one order for all UI languages.

Sorting by language code would be better than by language name, because it provides a stable and relatively predictable ordering. Sorting by name is not as straightforward as it sounds.

Many languages are known by multiple names (is gsw Alemannic or Swiss German? is be Belarussisch or Weißrussisch? is ny Chichewa, Chewa, Nyanja or Chinyanja? etc.). If they're sorted by name, changing a name, including adding a missing translation, will suddenly cause it to appear somewhere else in the list, instead of where people have got used to finding it.

The names are not all in the same language. There are a lot of missing translations, because CLDR does not try to translate all language names, and the extra names in the CLDR extension are not translatable in the usual way (translations can only be added by creating/editing LocalNamesXx.php, something which most people are unaware of or can't do). That means the list almost always has a lot of English names in it (e.g. the language dropdown on https://www.wikidata.org/wiki/Special:NewItem?uselang=ru) or sometimes it will be a mixture of English and a fallback language (e.g. the language dropdown on https://www.wikidata.org/wiki/Special:NewItem?uselang=szy). You would need to already know which language the name is in to know where it will be in the list.

We don't have a separate name for sorting, which means language variants would get split up, where they currently get grouped together. For example, "American English" (en-us) won't be anywhere near "English" (en), "Schweizer Hochdeutsch" (de-ch) won't be anywhere near "Deutsch" (de), "Chinese" (zh) won't be anywhere near "Traditional Chinese" (zh-hant), etc. CLDR has a "menu" variant for a handful of names (like "Sami, Northern" for se) but it doesn't even include the examples I just gave.
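
To make the grouping problem concrete, here is a small Python sketch using only the code/name pairs mentioned above. It assumes a plain string comparison for both orders; a real UI would likely use locale-aware collation, which changes details but not the overall effect.

```python
# (language code, display name) pairs taken from the examples above.
LANGUAGES = [
    ("en", "English"),
    ("en-us", "American English"),
    ("de", "Deutsch"),
    ("de-ch", "Schweizer Hochdeutsch"),
    ("zh", "Chinese"),
    ("zh-hant", "Traditional Chinese"),
]

# Sort by display name: variants drift away from their base language.
by_name = [code for code, name in sorted(LANGUAGES, key=lambda pair: pair[1])]

# Sort by language code: each variant stays right after its base language.
by_code = [code for code, name in sorted(LANGUAGES, key=lambda pair: pair[0])]

print(by_name)
print(by_code)
```

Sorted by name, en-us ends up first and en fourth, while sorting by code keeps de/de-ch, en/en-us and zh/zh-hant adjacent.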

Not having a separate name for sorting also means we can't sort some languages properly at all. Japanese is the main example I'm aware of, where things are sorted by pronunciation, but we don't have the pronunciation of the language names to be able to do that. For example, English is "英語" ("Eigo") which should sort fairly close to the beginning (e.g. it's the first entry under the 4th Japanese character in this category in the Japanese Wiktionary), but sorting by the name alone would put it near the bottom because kanji names would come after kana names.
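
A rough Python sketch of the Japanese case follows. The four language names are real, but the `READINGS` table is an assumption added purely for illustration (hiragana readings written by hand), since that is exactly the data we don't have. The sort uses plain code point order as a stand-in for whatever collation the UI actually applies.

```python
# Japanese names for Arabic, German, French and English.
NAMES = ["アラビア語", "ドイツ語", "フランス語", "英語"]

# Hypothetical pronunciation data; nothing like this is available to the list today.
READINGS = {
    "アラビア語": "あらびあご",
    "ドイツ語": "どいつご",
    "フランス語": "ふらんすご",
    "英語": "えいご",
}

# Sorting by the name alone: the kanji name sorts after all the kana names.
by_name = sorted(NAMES)

# Sorting by reading: 英語 ("eigo") moves up close to the beginning.
by_reading = sorted(NAMES, key=lambda name: READINGS[name])

print(by_name)
print(by_reading)
```

Without a reading for every language name, the second ordering cannot be produced, which is the limitation described above.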

Manuel sees some upsides for keeping the status quo: MUL is also sorted last in the termbox, so maybe this position is intuitive in the list as well.

Both make sense in their own ways. Sorting strictly by language code might be more intuitive if you look at it as a list of options sorted by language code. Putting mul at the end might be more intuitive if you look at it as a list of specific languages to choose from, because mul does not represent a specific language - putting it at the end would be similar to how people normally put "other" at the end of a list instead of between "n" and "p".
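
For comparison, both options are cheap to express. A minimal Python sketch, with a made-up list of codes and a hypothetical `mul_last_key` helper (not taken from the WIP patch):

```python
def mul_last_key(code):
    # False sorts before True, so "mul" ends up after every other code;
    # all remaining codes are ordered by the code itself.
    return (code == "mul", code)


codes = ["en", "mul", "de", "zh", "agq"]  # illustrative input only

# Option 1: strictly by language code, mul lands between "en" and "zh".
strict_by_code = sorted(codes)

# Option 2: by language code, but with mul pinned to the end like "other".
mul_at_end = sorted(codes, key=mul_last_key)

print(strict_by_code)
print(mul_at_end)
```

Either way the rest of the list is in the same order, so the choice only affects where mul appears.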