From Audiences Services sync: iOS should send an Accept-Language header and verify that it is sending the correct one (the MediaWiki version of the codes).
What is iOS currently doing, and does anything need to change to do the above?
We're following what's defined here, based on the ordered language preference list defined in the user's system preferences:
https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
Here's an example:
en, it-us;q=0.67, zh-hans;q=0.33
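A minimal sketch of how a header like the one above could be built from an ordered preference list. The weighting rule (q = 1 - index/count, rounded to two decimals) is an assumption inferred from the example values, not a confirmed description of the iOS app's code, and `accept_language` is a hypothetical name.

```python
def accept_language(preferred):
    """Build an Accept-Language value from an ordered list of language tags.

    Assumption: weights fall off linearly with position, which reproduces
    the 1 / 0.67 / 0.33 pattern in the example above.
    """
    count = len(preferred)
    parts = []
    for index, tag in enumerate(preferred):
        tag = tag.lower()
        if index == 0:
            # The first choice carries the implicit default weight of q=1.
            parts.append(tag)
        else:
            weight = 1 - index / count
            parts.append(f"{tag};q={weight:.2f}")
    return ", ".join(parts)


print(accept_language(["en", "it-US", "zh-Hans"]))
# en, it-us;q=0.67, zh-hans;q=0.33
```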
Hmm, yeah, that's not going to work well for us right now. :-( MediaWiki has its own set of codes that sort-of follow BCP 47 but don't in quite a few cases related to languages with variants (Crimean Tatar, Gan, Inuktitut, Kazakh, Kurdish, Tachelhit, Serbian, Tajik, Uzbek, and Chinese).
The specific Accept-Language string you give will work, but there are lots of cases where similar ones won't.
For example, codes for Chinese are:
Variant | MW code | BCP code | Result if you ask for BCP 47 | Notes |
Mixed variants(!) | zh | N/A | Not possible | We don't want to ship this anyway |
Simplified Chinese | zh-hans | zh-Hans | As requested | |
Traditional Chinese | zh-hant | zh-Hant | As requested | |
Mainland Simplified Chinese | zh-cn | zh-Hans-CN | Mixed zh variants | Problematic |
Malaysia Simplified Chinese | zh-my | zh-Hans-MY | Mixed zh variants | Problematic |
Singapore Simplified Chinese | zh-sg | zh-Hans-SG | Mixed zh variants | Problematic |
Hong Kong Simplified Chinese | N/A | zh-Hans-HK | Mixed zh variants | Problematic |
Macao Simplified Chinese | N/A | zh-Hans-MO | Mixed zh variants | Problematic |
Taiwan Simplified Chinese | N/A | zh-Hans-TW | Mixed zh variants | Problematic |
Mainland Traditional Chinese | N/A | zh-Hant-CN | Mixed zh variants | Problematic |
Malaysia Traditional Chinese | N/A | zh-Hant-MY | Mixed zh variants | Problematic |
Singapore Traditional Chinese | N/A | zh-Hant-SG | Mixed zh variants | Problematic |
Hong Kong Traditional Chinese | zh-hk | zh-Hant-HK | Mixed zh variants | Problematic |
Macao Traditional Chinese | zh-mo | zh-Hant-MO | Mixed zh variants | Problematic |
Taiwan Traditional Chinese | zh-tw | zh-Hant-TW | Mixed zh variants | Problematic |
Mandarin Chinese | N/A | zh-cmn | Mixed zh variants | Deprecated code, no need to worry |
Mandarin Chinese (Simplified) | N/A | zh-cmn-Hans | Mixed zh variants | Deprecated code, no need to worry |
Mandarin Chinese (Traditional) | N/A | zh-cmn-Hant | Mixed zh variants | Deprecated code, no need to worry |
There's also Gan, Wu, Classical and Cantonese, all of which have standards we don't follow fully. :-(
For Chinese we could just say "only ask for the script variant, not the country", but (a) I don't know if it's possible on iOS for you to go from user requested locale to variant, and (b) that rule definitely doesn't work in Serbian (where our codes are sr-el and sr-ec not sr-Latn and sr-Cyrl, at least for now).
@Jdforrester-WMF looking more closely, we do special case the Chinese variants. Instead of zh-Hant-HK, we send zh-hk. The same is true for -cn, -tw, -sg, and -mo
It looks like we would need to add special cases for Crimean Tatar, Gan, Inuktitut, Kazakh, Kurdish, Tachelhit, Serbian, Tajik, and Uzbek.
Aha, awesome. (Adding -my which got enabled a few weeks ago might make sense?)
It looks like we would need to add special cases for Crimean Tatar, Gan, Inuktitut, Kazakh, Kurdish, Tachelhit, Serbian, Tajik, and Uzbek.
Yeah. :-( I don't know to what extent iOS supports those locales; https://www.ibabbleon.com/iOS-Language-Codes-ISO-639.html looks unpromising.
Looks like we might fix this partially in core: https://gerrit.wikimedia.org/r/443687 and https://gerrit.wikimedia.org/r/442200
^ It appears the apps are already doing some mappings; I'm saying they shouldn't need to do that. Let's keep the craziness confined to core and try not to let it infect services and apps...
@Jdforrester-WMF @Dbrant I'm putting together the mapping dictionary for overriding Accept-Language codes. So far I have a lookup that maps BCP 47 language, script, and region to MediaWiki codes. If the language code is in this dictionary, it's assumed that if the script or region codes aren't present or aren't found in the mapping, the apps fall back to the "default" value. Open to suggestions for format. Also, for languages that don't fall into these overrides, should we be sending all the information we have (sometimes leads to combinations like "ru-us" if the user has Russian language selected but US region) or just the language code?
Here's what I have so far:
{
  "zh": {
    "default": { "default": "zh-hans", "cn": "zh-cn", "hk": "zh-hk", "mo": "zh-mo", "my": "zh-my", "sg": "zh-sg", "tw": "zh-tw" },
    "hans": { "default": "zh-hans", "cn": "zh-cn", "hk": "zh-hk", "mo": "zh-mo", "my": "zh-my", "sg": "zh-sg", "tw": "zh-tw" },
    "hant": { "default": "zh-hant", "cn": "zh-cn", "hk": "zh-hk", "mo": "zh-mo", "my": "zh-my", "sg": "zh-sg", "tw": "zh-tw" }
  },
  "sr": {
    "default": { "default": "sr-ec" },
    "cyrl": { "default": "sr-ec" },
    "latn": { "default": "sr-el" }
  }
}
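For illustration, here is a rough sketch of how a lookup over this dictionary might behave, written in Python rather than the app's own language. The function name `mediawiki_code` is hypothetical, and sending language plus region for languages outside the overrides (giving codes like "ru-us") is an assumption, since that is the open question above.

```python
# Region overrides shared by every Chinese script entry in the dictionary.
ZH_REGIONS = {"cn": "zh-cn", "hk": "zh-hk", "mo": "zh-mo",
              "my": "zh-my", "sg": "zh-sg", "tw": "zh-tw"}

OVERRIDES = {
    "zh": {
        "default": {"default": "zh-hans", **ZH_REGIONS},
        "hans": {"default": "zh-hans", **ZH_REGIONS},
        "hant": {"default": "zh-hant", **ZH_REGIONS},
    },
    "sr": {
        "default": {"default": "sr-ec"},
        "cyrl": {"default": "sr-ec"},
        "latn": {"default": "sr-el"},
    },
}


def mediawiki_code(language, script=None, region=None):
    """Map BCP 47 language, script, and region subtags to a MediaWiki code."""
    language = language.lower()
    by_script = OVERRIDES.get(language)
    if by_script is None:
        # Assumption: languages without overrides pass through unchanged,
        # with the region appended when one is known.
        return language if region is None else f"{language}-{region.lower()}"
    # A missing or unmapped script falls back to the "default" script entry.
    script_key = script.lower() if script else "default"
    by_region = by_script.get(script_key, by_script["default"])
    # A missing or unmapped region falls back to that script's "default".
    region_key = region.lower() if region else "default"
    return by_region.get(region_key, by_region["default"])


print(mediawiki_code("zh", "Hant", "HK"))  # zh-hk
print(mediawiki_code("sr", "Latn", "RS"))  # sr-el
print(mediawiki_code("ru", None, "US"))    # ru-us
```

Note that with this table the region takes precedence over the script for Chinese, so `mediawiki_code("zh", "Hans", "TW")` returns zh-tw.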
That looks great. You don't need to worry about ru-us etc. – when MW doesn't recognise a country variant, it'll fall back to the main language code which will work as expected (in that case, ru-us requests will get you an ru response).
[Edit: Having said that, this table will mean that people asking for Hans (Simplified) in Taiwan will instead get Hant (Traditional) and vice versa for the mainland, which is a product decision, but it will "work".]
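The fallback for unrecognised country variants (ru-us getting an ru response) can be pictured roughly as follows. This is only an illustration of the rule as described in this thread, not MediaWiki's actual implementation; `resolve_language` and the `KNOWN_CODES` subset are hypothetical.

```python
# Hypothetical subset of the codes the server recognises.
KNOWN_CODES = {"ru", "zh", "zh-hans", "zh-hant", "zh-tw", "sr", "sr-ec", "sr-el"}


def resolve_language(requested, known_codes=KNOWN_CODES):
    """Return the code that would be served for a requested language tag."""
    code = requested.lower()
    if code in known_codes:
        return code
    # Unrecognised variant: fall back to the main language code.
    main = code.split("-", 1)[0]
    return main if main in known_codes else None


print(resolve_language("ru-US"))       # ru
print(resolve_language("zh-Hans-CN"))  # zh
print(resolve_language("zh-tw"))       # zh-tw
```

Under this rule ru-us degrades harmlessly to ru, while an unmapped tag such as zh-Hans-CN lands on plain zh, the mixed-variant case the table above marks as problematic.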