
Static language table should include aliasing and type information
Open, Low, Public

Description

AppLanguageLookupTable is a wrapper around prebaked language data with special casing for certain language codes (Norwegian and Chinese (zh) at time of writing). This specialization is necessary because the lookup table must handle two inputs (with special cases):

  1. Android
    a. Standard codes
    b. Legacy codes
  2. Wiki
    a. Standard codes
    b. Legacy codes
    c. Dialect codes

This class has one more responsibility to consider: (3) it has to offer clients the set of all supported languages.

1, 2, and 3 overlap heavily. I claim that 1a and 2a are identical (1a = 2a). For 3, we prune 1a to the supported subset.

I think the actual static data today is strictly a subset of 1a. This data *could* be 1a, or even 1a + 1b if we retained a little more information to weed out duplicate entries for 3. I think we just need an "alias" column for when you need to go from an Android legacy code to the standard code.

2 is a subset of 1a with the 2b exceptions. The 2c codes aren't exceptions; they're more like specifics and could be in 1a. As with the previous point, if we retained a little more aliasing information, I think we could keep the 2b special cases as part of the static data table. The aliases could also be thought of as the reverse map for when you need to go from the wiki legacy code to the standard code.

It would be a little troublesome to me to blur the distinction in the static data between 1 and 2, so I would suggest one more column beyond "alias": a "type" column, whose values would be:

  • Standard code (no translation needed)
  • Android legacy alias (map from this legacy code to standard code)
  • Wiki legacy alias (map from this legacy code to standard code)

This "type" column would ingrain a sense of distinction and hopefully eliminate confusion over what the data really is and how to use it.
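
To make the proposal concrete, here is a minimal sketch of what a table with "alias" and "type" columns might look like. Every name here (LanguageTableSketch, Row, CodeType, toStandardCode, supportedLanguages) is hypothetical and not taken from AppLanguageLookupTable, and the sample rows are illustrative assumptions (iw as an Android legacy code for he, no as a wiki legacy code for nb), not the real data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the app's code base.
class LanguageTableSketch {
    // The proposed "type" column.
    enum CodeType { STANDARD, ANDROID_LEGACY_ALIAS, WIKI_LEGACY_ALIAS }

    // One row of the static table: code, "alias" column, "type" column.
    static final class Row {
        final String code;
        final String alias; // standard code this row maps to; null for STANDARD rows
        final CodeType type;

        Row(String code, String alias, CodeType type) {
            this.code = code;
            this.alias = alias;
            this.type = type;
        }
    }

    // Insertion-ordered so the supported-language list keeps table order.
    private final Map<String, Row> rows = new LinkedHashMap<>();

    void add(String code, String alias, CodeType type) {
        rows.put(code, new Row(code, alias, type));
    }

    // Resolves any known code to its standard code; returns null for unknown codes.
    String toStandardCode(String code) {
        Row row = rows.get(code);
        if (row == null) {
            return null;
        }
        return row.type == CodeType.STANDARD ? row.code : row.alias;
    }

    // Item 3: alias rows are skipped, so each language is listed only once.
    List<String> supportedLanguages() {
        List<String> result = new ArrayList<>();
        for (Row row : rows.values()) {
            if (row.type == CodeType.STANDARD) {
                result.add(row.code);
            }
        }
        return result;
    }

    // Illustrative sample rows only (assumed), not the real table.
    static LanguageTableSketch sample() {
        LanguageTableSketch table = new LanguageTableSketch();
        table.add("he", null, CodeType.STANDARD);
        table.add("nb", null, CodeType.STANDARD);
        table.add("iw", "he", CodeType.ANDROID_LEGACY_ALIAS);
        table.add("no", "nb", CodeType.WIKI_LEGACY_ALIAS);
        return table;
    }
}
```

With a table like this, the legacy special cases become ordinary rows rather than code paths, and the supported-language set for 3 falls out of filtering on the type column.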

Although it's closely related, I think WikiSite's handling of subdomains is correct and should remain distinct from this static data for now. As mentioned in WikiSite, converting a language code to a URL is lossless, but the reverse is lossy for dialects.
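
The asymmetry can be illustrated with a small sketch. It assumes, for illustration only, that the dialect codes zh-hans and zh-hant are both served from the zh subdomain; the class and method names are hypothetical and are not WikiSite's actual API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of why subdomain -> language code loses dialect information.
class SubdomainSketch {
    // Assumed sample mapping: several dialect codes share one subdomain.
    private static final Map<String, String> DIALECT_TO_SUBDOMAIN = new HashMap<>();
    static {
        DIALECT_TO_SUBDOMAIN.put("zh-hans", "zh");
        DIALECT_TO_SUBDOMAIN.put("zh-hant", "zh");
    }

    // Every code has exactly one subdomain, so this direction is always well defined.
    static String languageCodeToSubdomain(String languageCode) {
        return DIALECT_TO_SUBDOMAIN.getOrDefault(languageCode, languageCode);
    }

    // The subdomain alone cannot say which dialect was meant, so only the
    // base code can be returned.
    static String subdomainToLanguageCode(String subdomain) {
        return subdomain;
    }
}
```

Round-tripping zh-hant through both methods yields zh, which is why the dialect has to be tracked separately from the URL.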

As an aside, I think the special None case can be removed OR the special no language dev setting could be removed. I don't think we need both but I could be mistaken.

Possibly related to T141053.

Event Timeline

Restricted Application added a subscriber: Aklapper. · Nov 9 2016, 8:46 PM
Restricted Application added a subscriber: jhsoby. · Jul 28 2017, 4:10 AM
RHo added subscribers: Dbrant, RHo. · Mar 29 2018, 6:21 PM

hey @Dbrant - is this something that (a) is still a thing, and (b) related to/affected by the multilingual epic T160567 ?

Restricted Application added a subscriber: jeblad. · Mar 29 2018, 6:21 PM