
[Task] UI should get list of supported languages from the backend, via dedicated resource loader module
Closed, Declined, Public


We should have a single source for available languages. The backend should determine them (see T78006), and provide them to the frontend via a dedicated ResourceLoader module, similar to the SitesModule.

It should be possible to both add and remove available language codes via configuration (see T86182). Sources to be considered for available language codes:

  • MediaWiki's list of UI languages
  • $wgExtraLanguageNames (for additional codes)
  • $wgDummyLanguageCodes (for alias codes)
  • UniversalLanguageSelector / CLDR (if available)
  • Babel (if it has support for extra codes)

Event Timeline

daniel raised the priority of this task from to Medium.
daniel updated the task description.
daniel changed Security from none to None.

Note that there need to be at least two sets of languages:
one for languages that are supported for the UI (Names.php, pretty much),
and one for input (labels, descriptions, monolingual text, etc.).

The user interface languages may already be available as a resource (at least if ULS is present), so we could reduce the size of the data to load by only including the additional languages in the custom module.

Terms currently are silently limited to UI languages, since you only get the input controls for UI languages. Internally, the views wrongly work with $, though.
With, monolingual text uses UI languages consistently (i. e., frontend and backend).

Actually, the views work with wgULSLanguages (UI languages), but fall back on $ if the requested language is not in wgULSLanguages. We could probably just remove this fallback. @thiemowmde

I'm wondering how best to implement this. One goal would be to not duplicate ext.uls.languagenames (which is currently 3.8kb gzipped / 7.3kb uncompressed for en). I'd also like to have something quite flexible: it should support at least ULS, ULS + static language list, ULS - static language list, and a static language list as values. I think this issue mirrors the question of how to configure content languages. I suggest taking a similar approach for both problems, if not exactly the same.

What I'm currently thinking about is basically giving a class hierarchy as config and serialization:

$monolingualTextValueLanguages = FilteringContentLanguages::spec(
  MergingContentLanguages::spec(
    UlsContentLanguages::spec(),
    ListContentLanguages::spec( array( 'zxx', 'und' ) )
  ),
  ListContentLanguages::spec( array( 'en', 'fr', 'de' ) )
); /* => array(
  'type' => 'Filtering',
  '_left' => array(
    'type' => 'Merging',
    '_left' => array( 'type' => 'Uls' ),
    '_right' => array( 'type' => 'List', '_list' => array( 'zxx', 'und' ) )
  ),
  '_right' => array( 'type' => 'List', '_list' => array( 'en', 'fr', 'de' ) )
) */

mw.config.set( 'wbMonolingualTextValueLanguages', {
  type: 'Filtering',
  _left: {
    type: 'Merging',
    _left: { type: 'Uls' },
    _right: { type: 'List', _list: [ 'zxx', 'und' ] }
  },
  _right: { type: 'List', _list: [ 'en', 'fr', 'de' ] }
} );
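
To make the tree format concrete, here is a minimal sketch of how a frontend could resolve such a spec into a set of language codes. Nothing here is existing Wikibase or ULS code: the names `ulsLanguages`, `resolvers` and `resolve` are hypothetical, and the hard-coded ULS list is only a stand-in for whatever ULS would actually provide.

```javascript
// Hypothetical stand-in for the language codes ULS would provide.
const ulsLanguages = [ 'de', 'en', 'es', 'fr' ];

// One resolver per `type`. Each resolver only reads the keys of its own type,
// so the dispatcher below needs to know nothing except `type`.
const resolvers = {
  Uls: function () {
    return new Set( ulsLanguages );
  },
  List: function ( spec ) {
    return new Set( spec._list );
  },
  Merging: function ( spec ) {
    // Union: everything from the left operand plus everything from the right.
    const result = resolve( spec._left );
    resolve( spec._right ).forEach( function ( code ) {
      result.add( code );
    } );
    return result;
  },
  Filtering: function ( spec ) {
    // Difference: the left operand minus everything in the right operand.
    const result = resolve( spec._left );
    resolve( spec._right ).forEach( function ( code ) {
      result.delete( code );
    } );
    return result;
  }
};

function resolve( spec ) {
  if ( !Object.prototype.hasOwnProperty.call( resolvers, spec.type ) ) {
    throw new Error( 'Unknown content languages type: ' + spec.type );
  }
  return resolvers[ spec.type ]( spec );
}

const monolingualTextValueLanguages = resolve( {
  type: 'Filtering',
  _left: {
    type: 'Merging',
    _left: { type: 'Uls' },
    _right: { type: 'List', _list: [ 'zxx', 'und' ] }
  },
  _right: { type: 'List', _list: [ 'en', 'fr', 'de' ] }
} );
// With the stand-in ULS list above, this contains 'es', 'zxx' and 'und'.
```

Because each operand is itself a spec, any node can be used on either side of a merge or filter.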

@JeroenDeDauw @daniel @thiemowmde What do you think?

Needs feedback and confirmation. If confirmed, the implementation needs to be done.

Jonas renamed this task from UI should get list of supported languages from the backend, via dedicated resource loader module to [Task] UI should get list of supported languages from the backend, via dedicated resource loader module. Sep 10 2015, 7:36 PM
  • @adrianheine, the example is confusing me a bit. Why are you filtering en, fr and de from a list that was just composed from ULS plus a few extra languages?
  • One problem I see with the serialization format is that it doesn't give the individual parts names. For example, the merged sub-list in the example can not be addressed and reused.
  • Why not do something like:
mw.config.set( 'wbMonolingualTextValueLanguages', [
    { type: 'Uls' },
    { type: 'Merging', list: [ 'zxx', 'und' ] },
    { type: 'Filtering', list: [ 'en', 'fr', 'de' ] }
] );

To me it looks like this could do exactly the same while being way easier to read, parse and understand.

  • The example allows all languages known to ULS plus zxx and und, but minus en, fr and de. It's obviously artificial :)
  • I don't see why you would need to reuse parts in the serialization. Can you give me a use-case for that?
  • Merging and filtering are operations with two operands. The tree structure makes this structure very explicit. Your proposal seems to be a postfix notation, which makes it difficult to follow which operand belongs to which operation. Also you're breaking polymorphism by only allowing plain lists as second operand. A true postfix notation would in my opinion look like this:
mw.config.set( 'wbMonolingualTextValueLanguages', [
  { type: 'Uls' },
  { type: 'List', data: [ 'zxx', 'und' ] },
  { type: 'Merging' },
  { type: 'List', data: [ 'en', 'fr', 'de' ] },
  { type: 'Filtering' }
] );
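
As an illustration of how this postfix list would be read, here is a small stack-based sketch. It is only an assumption about how such a format could be evaluated: `evaluatePostfix` and the `ulsCodes` parameter are hypothetical names, and the ULS codes are passed in rather than taken from ULS itself.

```javascript
// Evaluates a postfix list: operands push a set, operators pop two sets
// and push the combined result.
function evaluatePostfix( items, ulsCodes ) {
  const stack = [];
  items.forEach( function ( item ) {
    switch ( item.type ) {
      case 'Uls':
        stack.push( new Set( ulsCodes ) );
        break;
      case 'List':
        stack.push( new Set( item.data ) );
        break;
      case 'Merging':
      case 'Filtering': {
        if ( stack.length < 2 ) {
          throw new Error( item.type + ' needs two operands' );
        }
        const right = stack.pop();
        const left = stack.pop();
        right.forEach( function ( code ) {
          if ( item.type === 'Merging' ) {
            left.add( code );
          } else {
            left.delete( code );
          }
        } );
        stack.push( left );
        break;
      }
      default:
        throw new Error( 'Unknown type: ' + item.type );
    }
  } );
  if ( stack.length !== 1 ) {
    throw new Error( 'Malformed expression: ' + stack.length + ' values left' );
  }
  return stack[ 0 ];
}

const postfixResult = evaluatePostfix( [
  { type: 'Uls' },
  { type: 'List', data: [ 'zxx', 'und' ] },
  { type: 'Merging' },
  { type: 'List', data: [ 'en', 'fr', 'de' ] },
  { type: 'Filtering' }
], [ 'de', 'en', 'es', 'fr' ] );
// With the stand-in ULS codes passed above: 'es', 'zxx' and 'und'.
```

The need to track the stack while reading is exactly what makes it harder to see which operand belongs to which operation.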

I would probably switch to the mathematical operation names: ›Union‹ instead of ›Merging‹, ›Difference‹ instead of ›Filtering‹.

I agree that it would be nice to have a flexible system like this. Whether it is given in the form of nested constructors, or nested object literals, I don't care much. But it should be possible to combine the bits freely. In Thiemo's version, I'm not sure how I would add the ULS languages to a fixed list (instead of vice versa).

In addition to a fixed list, ULS, union, and difference, I think we may also want a "core" languages list. Or would "ULS" just use the core languages, if ULS is not there? We need a sane fallback for the case that ULS is not installed.

For the structural representation, I would probably go for something very simple:

[ { op: "add", source: "uls" }, { op: "add", data: ['a', 'b'] }, { op: "remove", data: ['x', 'y'] } ]

However, when adding explicit languages, just specifying the language code is not enough. For each of those languages, we would need to provide at least a localized name, and maybe some additional info, like rtl/ltr. So we'd need something like:

{ op: "add", data: { a: { name: "Aaarg" }, b: { name: "Böörk", dir: "rtl" } } }
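
A rough sketch of how such an operation list could be applied in order, keeping the per-language info around. This is an illustration under assumptions, not a proposal for the final code: `applyOperations` and the `sources` argument are hypothetical, and the content of the "uls" source below is made up.

```javascript
// Applies add/remove operations in order and returns a Map of
// language code => info object (name, dir, ...).
function applyOperations( operations, sources ) {
  const languages = new Map();
  operations.forEach( function ( operation ) {
    if ( operation.op !== 'add' && operation.op !== 'remove' ) {
      throw new Error( 'Unknown op: ' + operation.op );
    }

    // Normalize the operand to an object keyed by language code.
    let entries;
    if ( operation.source !== undefined ) {
      if ( !Object.prototype.hasOwnProperty.call( sources, operation.source ) ) {
        throw new Error( 'Unknown source: ' + operation.source );
      }
      entries = sources[ operation.source ];
    } else if ( Array.isArray( operation.data ) ) {
      entries = {};
      operation.data.forEach( function ( code ) {
        entries[ code ] = {};
      } );
    } else {
      entries = operation.data;
    }

    Object.keys( entries ).forEach( function ( code ) {
      if ( operation.op === 'add' ) {
        languages.set( code, entries[ code ] );
      } else {
        languages.delete( code );
      }
    } );
  } );
  return languages;
}

// Made-up source data standing in for what ULS would provide.
const exampleSources = {
  uls: { en: { name: 'English' }, x: { name: 'Example' } }
};

const appliedLanguages = applyOperations( [
  { op: 'add', source: 'uls' },
  { op: 'add', data: { a: { name: 'Aaarg' }, b: { name: 'Böörk', dir: 'rtl' } } },
  { op: 'remove', data: [ 'x', 'y' ] }
], exampleSources );
// Remaining codes: 'en', 'a', 'b'. Removing the unknown code 'y' is a no-op.
```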

Can be even simpler with the same flexibility:

  • Adrian's original example: [ { add: "uls" }, { add: ["zxx", "und"] }, { remove: ["en", "fr", "de"] } ]
  • Daniel's example: [ { add: "uls" }, { add: ["a", "b"] }, { remove: ["x", "y"] } ]
  • [ { add: { a: { name: "Aaarg" }, b: { name: "Böörk", dir: "rtl" } } } ]

This is almost identical to Daniel's suggestion; the only difference is that I ditched the keys "op", "source" and "data". I can see that the longer form is a bit more readable, but technically the keys are not necessary. Strings describe a source, arrays and objects are data. The op is a key instead of a string.
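
To show that the two notations carry the same information, here is a small sketch that expands the shorthand into the explicit "op" / "source" / "data" form. The function name `expandShorthand` is hypothetical, and the rule it applies is just the one stated above: a string operand is a source, anything else is data.

```javascript
// Expands [ { add: "uls" }, { remove: [ ... ] } ] into
// [ { op: "add", source: "uls" }, { op: "remove", data: [ ... ] } ].
function expandShorthand( steps ) {
  return steps.map( function ( step ) {
    const keys = Object.keys( step );
    if ( keys.length !== 1 ) {
      throw new Error( 'Each step must have exactly one key' );
    }
    const op = keys[ 0 ];
    const operand = step[ op ];
    // Strings describe a source, arrays and objects are data.
    return typeof operand === 'string' ?
      { op: op, source: operand } :
      { op: op, data: operand };
  } );
}

const expandedSteps = expandShorthand( [
  { add: 'uls' },
  { add: [ 'zxx', 'und' ] },
  { remove: [ 'en', 'fr', 'de' ] }
] );
// expandedSteps[ 0 ] is { op: 'add', source: 'uls' }
```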

I'm strongly against putting information in keys, and I'm against data and source keys. They break extensibility and composability.

Seems like we all broadly agree on what information and structure is needed. I think we can leave the color of the bike shed to whoever builds it, now :)

  • "add" and "remove" are not more or less information than "type" or "data".
  • Not sure what's wrong with Daniel's suggestion. I'm fine with it, including "source" and "data", which by the way was also in the original proposal.

type should be the only key that's known to the factory/deserializer/whatever. data is internal to ListContentLanguages. The class responsible for performing a union does not know about sources and data literals, it just knows about two (or more) ContentLanguages. source was not in my original proposal.

Yellow! it should be yellow!

We decided in T124758 to not pass the complete list(s) of language codes to the UI, but instead implement a language suggestion API endpoint.