
Let API (and other entry points?) users to give MediaWiki a weighted list of languages and get the "best" response
Open, Low, Public

Description

For T197009 ("MCS should respect Accept-Language header for MW API requests"), we're writing some custom code in the service that takes each request's HTTP Accept-Language header and translates it into a single "best" value to pass to the uselang parameter.

However, this is not ideal, as whether or not MediaWiki can usefully respond in a given language depends on context. A request for fr-CH;q=0.9,fr;q=0.6,pt;q=0.4 to a single-language-content wiki set to fr would work fine (as fr-CH falls back to fr). It would also work for a single-language-content wiki set to de (fr-CH eventually falls back to en, which isn't fixable, so the response would come back in de).

On multi-content wikis, which right now means content variant wikis, a request for zh-Hant-MY would get a response in zh-Hant when zh-Hant-SG might be a better match for the user's requested language.

Something like a uselangs parameter would offload the processing and selection of languages to MediaWiki, which is the right place for this decision.
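To make the request concrete, here is a minimal sketch of the kind of selection described above: parse the weighted list, then walk each language's fallback chain until something the wiki can serve turns up. The FALLBACKS table, the function names, and the idea of a flat "available" set are all assumptions for illustration; MediaWiki's real fallback data and any eventual uselangs behaviour may well differ.

```python
# Hypothetical fallback chains, for illustration only; the real chains
# live in MediaWiki's language data and are not reproduced here.
FALLBACKS = {
    "fr-ch": ["fr", "en"],
    "fr": ["en"],
    "pt": ["en"],
    "de": ["en"],
}


def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Parse an Accept-Language style value into (tag, q) pairs, best first."""
    entries = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        params = params.strip()
        q = 1.0  # a missing q-value means full weight
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                continue  # skip malformed weights rather than guess
        if q > 0:
            entries.append((tag.strip().lower(), q))
    # sorted() is stable, so header order is kept among equal weights.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def pick_language(header: str, available: set[str], default: str) -> str:
    """Return the first requested tag (or one of its fallbacks) in `available`."""
    for tag, _q in parse_accept_language(header):
        for candidate in [tag, *FALLBACKS.get(tag, [])]:
            if candidate in available:
                return candidate
    return default
```

With the header from the description, a wiki whose only content language is fr resolves to fr through the fr-CH fallback, and a de wiki falls through to its own default, because nothing in the request or its fallbacks matches.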

Event Timeline

Jdforrester-WMF created this task.

Doesn't ULS already do this with the value of the Accept-Language header, if you set $wgULSAnonCanChangeLanguage to true on the wiki?

See also T193247: Create API module for language autodetection.

> Doesn't ULS already do this with the value of the Accept-Language header, if you set $wgULSAnonCanChangeLanguage to true on the wiki?

If wgULSAnonCanChangeLanguage and wgULSLanguageDetection are both true, yes. However, I'm proposing that this should be in core (and enabled).

> See also T193247: Create API module for language autodetection.

Yes, what @Tgr asked for, but as an ApiBase parameter available on every API call, not something each client needs to request, cache, and re-request each time. A separate detection call would also fail to represent the right response on multi-language wikis.

> If wgULSAnonCanChangeLanguage and wgULSLanguageDetection are both true, yes. However, I'm proposing that this should be in core (and enabled).

Should we retitle this bug to something like "RequestContext::getLanguage() in core should use Accept-Language when determining the language for anons", with it being used whenever "uselang" isn't provided, instead of requesting a "uselangs" parameter specific to the API?

If you want it for logged-in users too despite them having a user preference specifying the language, you might also propose "uselang=detect" or something like that to override the default core behavior of using the user preference.
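The precedence being proposed here can be sketched roughly as follows. This is only an illustration of the ordering under discussion, not how RequestContext::getLanguage() works today; the Request class, the helper names, and the literal "detect" value are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

DETECT = "detect"  # hypothetical special value; the final name is undecided


@dataclass
class Request:
    uselang: Optional[str] = None          # explicit uselang parameter, if any
    user_language: Optional[str] = None    # preference; None for anons
    accept_language: Optional[str] = None  # raw Accept-Language header


def best_from_header(header: Optional[str], supported: set[str]) -> Optional[str]:
    """Return the highest-weighted supported tag from the header, or None."""
    weighted = []
    for part in (header or "").split(","):
        tag, _, param = part.strip().partition(";")
        tag, param = tag.strip().lower(), param.strip()
        if not tag:
            continue
        try:
            q = float(param[2:]) if param.startswith("q=") else 1.0
        except ValueError:
            continue  # ignore entries with a malformed weight
        if q > 0 and tag in supported:
            weighted.append((q, tag))
    # max() returns the first maximal item, so header order breaks ties.
    return max(weighted, key=lambda pair: pair[0])[1] if weighted else None


def resolve_language(req: Request, supported: set[str], site_default: str) -> str:
    if req.uselang == DETECT:
        # Explicit opt-in: the header wins even over a stored preference.
        detected = best_from_header(req.accept_language, supported)
        return detected or req.user_language or site_default
    if req.uselang and req.uselang != "user":
        return req.uselang if req.uselang in supported else site_default
    if req.user_language:
        return req.user_language  # logged-in users keep their preference
    # Anons without uselang: fall back to the header, then the site default.
    return best_from_header(req.accept_language, supported) or site_default
```

The design question in the thread is mostly about the last branch (should anons get header-based detection by default?) and the first (should there be a way to force detection for logged-in users?).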

> If wgULSAnonCanChangeLanguage and wgULSLanguageDetection are both true, yes. However, I'm proposing that this should be in core (and enabled).

> Should we retitle this bug to something like "RequestContext::getLanguage() in core should use Accept-Language when determining the language for anons", with it being used whenever "uselang" isn't provided, instead of requesting a "uselangs" parameter specific to the API?

That would be fine for my purposes, but seems somewhat orthogonal to the general design philosophy of the API where HTTP request headers generally don't trigger different behaviour. Your call.

> If you want it for logged-in users too despite them having a user preference specifying the language, you might also propose "uselang=detect" or something like that to override the default core behavior of using the user preference.

That could work. Though we should use qxx-detect or something so that it doesn't take over a potential real future value?

> That would be fine for my purposes, but seems somewhat orthogonal to the general design philosophy of the API where HTTP request headers generally don't trigger different behaviour. Your call.

I think this issue is at an intersection of different concerns. While the API doesn't generally use HTTP request headers as behavior switches, it also wants uselang=user to match the behavior the web UI uses to determine the interface language when uselang isn't given. And here we're talking about putting Accept-Language parsing in core for that purpose.

Although the only example of a "the API doesn't use HTTP headers" policy I can think of is that the API has a format parameter instead of checking the Accept header.

  • The origin parameter isn't an example, as that exists to avoid having to serve Vary: Origin in every response per T22814#248552 rather than being part of a policy of avoiding HTTP headers.
  • The maxage and smaxage parameters aren't examples either because, as far as I can tell, they aren't replacements for the Cache-Control header. In a request, Cache-Control tells a proxy how stale a response the client is willing to accept, assuming the origin resource is already being cached. The maxage and smaxage parameters, on the other hand, tell the API that usually-uncacheable responses should be allowed to be cached.
  • Most other headers, such as Accept-Encoding, are handled at MediaWiki or webserver layers instead. Possible handling of Accept-Language would fall into this category.

There are also some counterexamples:

  • The API does have handling for If-None-Match and If-Modified-Since, although this depends on the action module supplying ETag and Last-Modified data, which as far as I know none do.
  • A custom Treat-As-Untrusted header forces the response to be treated as not having same-origin security.
  • A custom Promise-Non-Write-API-Action header makes the API raise an error if used with a write action, although this is intended for multi-DC routing rather than client use.
  • And there's API-User-Agent, although that only affects the agent logged for ApiFeatureUsage.

> If you want it for logged-in users too despite them having a user preference specifying the language, you might also propose "uselang=detect" or something like that to override the default core behavior of using the user preference.

> That could work. Though we should use qxx-detect or something so that it doesn't take over a potential real future value?

Hmm, that might be a good idea. On the other hand, the IETF language tag syntax that we mostly use for MediaWiki language codes doesn't seem to allow "detect" as a valid value anyway, and "qxx" isn't actually in the range reserved for local use (even though MediaWiki uses it).

I note that RequestContext and the API both currently make use of "user" to indicate the default processing, and the API uses "content" to specifically request the content language. If we do go with qxx-detect or something like that, both of those should probably be deprecated in favor of corresponding prefixed versions.
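For reference, the point about "qxx" can be checked mechanically. The sketch below assumes the ISO 639-2 local-use range is qaa through qtz, which is my understanding of the standard; the function name is made up for illustration.

```python
def is_iso639_local_use(code: str) -> bool:
    """True if `code` is a three-letter code in the qaa..qtz local-use range."""
    code = code.lower()
    return (
        len(code) == 3
        and code.isascii()
        and code.isalpha()
        and "qaa" <= code <= "qtz"  # plain string comparison covers the range
    )
```

Under that assumption, is_iso639_local_use("qxx") is False, so a qxx-detect prefix would sit outside the reserved range even though MediaWiki already uses qxx.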

Vvjjkkii renamed this task from Let API (and other entry points?) users to give MediaWiki a weighted list of languages and get the "best" response to 0zaaaaaaaa. Jul 1 2018, 1:03 AM
Vvjjkkii raised the priority of this task from Low to High.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed a subscriber: Aklapper.
CommunityTechBot renamed this task from 0zaaaaaaaa to Let API (and other entry points?) users to give MediaWiki a weighted list of languages and get the "best" response. Jul 2 2018, 12:11 PM
CommunityTechBot lowered the priority of this task from High to Low.
CommunityTechBot updated the task description. (Show Details)
CommunityTechBot added a subscriber: Aklapper.