send an Accept-Language header and verify it is sending the correct one (iOS)
Closed, Resolved (Public)

Description

From Audiences Services sync: iOS should send an Accept-Language header and verify it is sending the correct one (mediawiki version of it).

What is iOS currently doing, and does anything need to change to do the above?

Event Timeline

We're following what's defined here, based on the ordered language preference list defined in the user's system preferences:

https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

Here's an example:

en, it-us;q=0.67, zh-hans;q=0.33
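
A minimal sketch of how a header like the one above could be assembled from an ordered preference list. The function name `build_accept_language` is hypothetical, and the evenly spaced q-values are only inferred from the example (1, 0.67, 0.33 for three languages); this is not taken from the iOS app's code.

```python
def build_accept_language(preferred):
    """Join ordered language tags into an Accept-Language header value.

    Assumption: q-values step down evenly from 1, which matches the
    three-language example in the thread. The first tag carries no
    explicit q because q=1 is the default.
    """
    count = len(preferred)
    parts = []
    for index, tag in enumerate(preferred):
        if index == 0:
            parts.append(tag)
        else:
            quality = (count - index) / count
            parts.append(f"{tag};q={quality:.2f}")
    return ", ".join(parts)


print(build_accept_language(["en", "it-us", "zh-hans"]))
# en, it-us;q=0.67, zh-hans;q=0.33
```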

Hmm, yeah, that's not going to work well for us right now. :-( MediaWiki has its own set of codes that sort-of follow BCP 47 but don't in quite a few cases related to languages with variants (Crimean Tatar, Gan, Inuktitut, Kazakh, Kurdish, Tachelhit, Serbian, Tajik, Uzbek, and Chinese).

The specific Accept-Language string you give will work, but there are lots of cases where similar ones won't.

For example, codes for Chinese are:

| Variant | MW code | BCP 47 code | Result if you ask for BCP 47 | Notes |
| --- | --- | --- | --- | --- |
| Mixed variants(!) | zh | N/A | Not possible | We don't want to ship this anyway |
| Simplified Chinese | zh-hans | zh-Hans | As requested | |
| Traditional Chinese | zh-hant | zh-Hant | As requested | |
| Mainland Simplified Chinese | zh-cn | zh-Hans-CN | Mixed zh variants | Problematic |
| Malaysia Simplified Chinese | zh-my | zh-Hans-MY | Mixed zh variants | Problematic |
| Singapore Simplified Chinese | zh-sg | zh-Hans-SG | Mixed zh variants | Problematic |
| Hong Kong Simplified Chinese | N/A | zh-Hans-HK | Mixed zh variants | Problematic |
| Macao Simplified Chinese | N/A | zh-Hans-MO | Mixed zh variants | Problematic |
| Taiwan Simplified Chinese | N/A | zh-Hans-TW | Mixed zh variants | Problematic |
| Mainland Traditional Chinese | N/A | zh-Hant-CN | Mixed zh variants | Problematic |
| Malaysia Traditional Chinese | N/A | zh-Hant-MY | Mixed zh variants | Problematic |
| Singapore Traditional Chinese | N/A | zh-Hant-SG | Mixed zh variants | Problematic |
| Hong Kong Traditional Chinese | zh-hk | zh-Hant-HK | Mixed zh variants | Problematic |
| Macao Traditional Chinese | zh-mo | zh-Hant-MO | Mixed zh variants | Problematic |
| Taiwan Traditional Chinese | zh-tw | zh-Hant-TW | Mixed zh variants | Problematic |
| Mandarin Chinese | N/A | zh-cmn | Mixed zh variants | Deprecated code, no need to worry |
| Mandarin Chinese (Simplified) | N/A | zh-cmn-Hans | Mixed zh variants | Deprecated code, no need to worry |
| Mandarin Chinese (Traditional) | N/A | zh-cmn-Hant | Mixed zh variants | Deprecated code, no need to worry |

There's also Gan, Wu, Classical and Cantonese, all of which have standards we don't follow fully. :-(

For Chinese we could just say "only ask for the script variant, not the country", but (a) I don't know if it's possible on iOS for you to go from user requested locale to variant, and (b) that rule definitely doesn't work in Serbian (where our codes are sr-el and sr-ec not sr-Latn and sr-Cyrl, at least for now).

LGoto renamed this task from "send an Accept-Language header and verify it is sending the correct one" to "send an Accept-Language header and verify it is sending the correct one (iOS)". Jul 11 2018, 7:15 PM

@Jdforrester-WMF looking more closely, we do special case the Chinese variants. Instead of zh-Hant-HK, we send zh-hk. The same is true for -cn, -tw, -sg, and -mo

It looks like we would need to add special cases for Crimean Tatar, Gan, Inuktitut, Kazakh, Kurdish, Tachelhit, Serbian, Tajik, and Uzbek.

> @Jdforrester-WMF looking more closely, we do special case the Chinese variants. Instead of zh-Hant-HK, we send zh-hk. The same is true for -cn, -tw, -sg, and -mo

Aha, awesome. (Adding -my which got enabled a few weeks ago might make sense?)

> It looks like we would need to add special cases for Crimean Tatar, Gan, Inuktitut, Kazakh, Kurdish, Tachelhit, Serbian, Tajik, and Uzbek.

Yeah. :-( I don't know to what extent iOS supports those locales; https://www.ibabbleon.com/iOS-Language-Codes-ISO-639.html looks unpromising.

Looks like we might fix this partially in core: https://gerrit.wikimedia.org/r/443687 and https://gerrit.wikimedia.org/r/442200

Yes, but the iOS and Android teams don't need to care, that's my job. :-)

> @Jdforrester-WMF looking more closely, we do special case the Chinese variants. Instead of zh-Hant-HK, we send zh-hk. The same is true for -cn, -tw, -sg, and -mo

^ It appears the apps are already doing some mappings; I'm saying they shouldn't need to do that. Let's keep the craziness confined to core and try not to let it infect services and apps...

@Jdforrester-WMF @Dbrant I'm putting together the mapping dictionary for overriding Accept-Language codes. So far I have a lookup that maps BCP 47 language, script, and region to MediaWiki codes. If the language code is in this dictionary and the script or region code is missing or not found in the mapping, the apps fall back to the "default" value. Open to suggestions for format. Also, for languages that don't fall into these overrides, should we be sending all the information we have (which sometimes leads to combinations like "ru-us" if the user has Russian language selected but US region) or just the language code?

Here's what I have so far:

{
    "zh": {
        "default": {
            "default": "zh-hans",
            "cn": "zh-cn",
            "hk": "zh-hk",
            "mo": "zh-mo",
            "my": "zh-my",
            "sg": "zh-sg",
            "tw": "zh-tw"
        },
        "hans": {
            "default": "zh-hans",
            "cn": "zh-cn",
            "hk": "zh-hk",
            "mo": "zh-mo",
            "my": "zh-my",
            "sg": "zh-sg",
            "tw": "zh-tw"
        },
        "hant": {
            "default": "zh-hant",
            "cn": "zh-cn",
            "hk": "zh-hk",
            "mo": "zh-mo",
            "my": "zh-my",
            "sg": "zh-sg",
            "tw": "zh-tw"
        }
    },
    "sr": {
        "default": {
            "default": "sr-ec"
        },
        "cyrl": {
            "default": "sr-ec"
        },
        "latn": {
            "default": "sr-el"
        }
    }
}

That looks great. You don't need to worry about ru-us etc. – when MW doesn't recognise a country variant, it'll fall back to the main language code which will work as expected (in that case, ru-us requests will get you an ru response).

[Edit: Having said that, this table will mean that people asking for Hans (Simplified) in Taiwan will instead get Hant (Traditional), and vice versa for the mainland, which is a product decision, but it will "work".]
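
A rough sketch of the lookup described for the mapping dictionary above, written in Python for illustration rather than as the apps' actual implementation. The name `mediawiki_code` and the simplified subtag parsing are assumptions; passing non-override tags through lowercased relies on the note that MediaWiki itself falls back when it doesn't recognise a country variant.

```python
# Region overrides shared by every Chinese script entry in the mapping.
ZH_REGIONS = {r: f"zh-{r}" for r in ("cn", "hk", "mo", "my", "sg", "tw")}

# language -> script -> region -> MediaWiki code, mirroring the JSON above.
OVERRIDES = {
    "zh": {
        "default": {"default": "zh-hans", **ZH_REGIONS},
        "hans": {"default": "zh-hans", **ZH_REGIONS},
        "hant": {"default": "zh-hant", **ZH_REGIONS},
    },
    "sr": {
        "default": {"default": "sr-ec"},
        "cyrl": {"default": "sr-ec"},
        "latn": {"default": "sr-el"},
    },
}


def mediawiki_code(tag):
    """Map a BCP 47 style tag to the code to send (hypothetical helper)."""
    subtags = tag.replace("_", "-").lower().split("-")
    language, script, region = subtags[0], None, None
    for sub in subtags[1:]:
        # Simplified parsing: 4 letters is a script, 2 letters is a region.
        if script is None and len(sub) == 4 and sub.isalpha():
            script = sub
        elif region is None and len(sub) == 2 and sub.isalpha():
            region = sub
    by_script = OVERRIDES.get(language)
    if by_script is None:
        # Not an override language: send the tag as-is (assumption).
        return "-".join(subtags)
    # Missing or unknown script/region falls back to the "default" entry.
    by_region = by_script.get(script, by_script["default"])
    return by_region.get(region, by_region["default"])


print(mediawiki_code("zh-Hant-HK"))  # zh-hk
print(mediawiki_code("sr-Latn-RS"))  # sr-el
print(mediawiki_code("zh-Hans-TW"))  # zh-tw
print(mediawiki_code("ru-US"))       # ru-us
```

With this table, `zh-Hans-TW` resolves to `zh-tw`, which is the product decision flagged in the edit above, while `ru-US` goes out as `ru-us` and is left for MediaWiki to resolve.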