As originally reported in r250686, we're not properly varying API responses depending on the language variants. The bug itself is quite complex, with several moving parts and edge cases, so the description below is going to be equally complex and long, and I apologize for that. In fact, this is a hybrid between a task description and an investigation write-up, adapted from my investigation notes.
Context
Skip at your leisure if you're already familiar with the relevant areas.
Variants
(Docs on mw.org)
MediaWiki supports language variants. For instance, suppose that you have a wiki whose content language is Serbian. You can write in Serbian using either the Latin or the Cyrillic alphabet, and you may have a personal preference for which one to use (for writing and/or reading). MediaWiki allows registered users to specify in their preferences which variant they would like to use (Latin or Cyrillic). Some elements in the page (e.g. page title and page content) are automatically converted to the variant you selected, so for instance you can write a page using the Cyrillic alphabet, and I will see it in the Latin version if I specify that variant as my preference.
Now, user preferences are not the only way to change your preferred variant. In fact, MediaWiki determines which variant to use for a given language according to the following flowchart. Note, the flowchart is a simplified version of the actual logic, which also includes validation and fallback for all input values, as well as $wgDefaultLanguageVariant as a site-wide default.
Note that it's possible to set variants for more than one language at the same time. For instance, if I'm a logged-out user and my Accept-Language header contains both sr-el (Serbian, Latin script) and zh-hant (Chinese Traditional), then any content in Serbian will be displayed to me using the Latin script, and any content in Chinese will use the Traditional characters.
To summarize, the following can be used to specify non-default variants:
- URL parameter
- User preference (logged-in only)
- Accept-Language header (logged-out only)
- GetLangPreferredVariant hook (very unpredictable, could do anything)
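To make the list above more concrete, here is a minimal sketch of how the variant for a single language could be resolved. The function name, the shape of the `req` and `lang` objects, and the exact precedence order are assumptions for illustration; the real logic also validates and falls back at every step.

```javascript
// Simplified sketch of variant resolution for ONE language.
// All names are hypothetical; validation is reduced to a membership check.
function resolveVariant(req, lang) {
  const isValid = (v) => typeof v === 'string' && lang.variants.includes(v);

  // 1. URL parameter, which a GetLangPreferredVariant-style hook may override.
  let variant = req.urlVariant;
  if (typeof req.hook === 'function') {
    const hooked = req.hook(variant);
    if (hooked !== undefined) {
      variant = hooked;
    }
  }

  // 2. User preference (logged-in) or Accept-Language (logged-out).
  if (!isValid(variant)) {
    variant = req.isLoggedIn
      ? req.userPreference
      : (req.acceptLanguage || []).find(isValid);
  }

  // 3. Site-wide default, then the base language code.
  if (!isValid(variant)) {
    variant = lang.defaultVariant;
  }
  return isValid(variant) ? variant : lang.code;
}

const serbian = { code: 'sr', variants: ['sr', 'sr-ec', 'sr-el'] };
const chinese = { code: 'zh', variants: ['zh', 'zh-hans', 'zh-hant'] };
const anon = { isLoggedIn: false, acceptLanguage: ['sr-el', 'zh-hant'] };
```

With the `anon` request above, each language picks its own entry from Accept-Language, which matches the Serbian/Chinese example given earlier.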
API response caching
(Docs on mw.org)
Caching for API responses is controlled by two factors: cache control and cache mode. The former reflects the max-age and s-maxage directives for the Cache-Control header, and clients can pass the maxage and smaxage parameters to set them. The cache mode is one of the following:
- public: Adds the public directive to the Cache-Control header, if a max age is specified
- private: Adds the private directive to the Cache-Control header, so other users won't get a cached response for the same request
- anon-public-user-private: Uses the public mode for anon requests, and private for logged-in requests (in practice, it adds Cookie to the Vary header)
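As a rough illustration of the three modes, the sketch below maps a mode plus the maxage/smaxage values to response headers. It follows the description above rather than the real ApiMain code; the helper name and the exact header strings are assumptions.

```javascript
// Hypothetical helper: cache mode + max ages -> response headers.
// Header strings are illustrative, not copied from MediaWiki.
function buildCacheHeaders(mode, { maxage = 0, smaxage = 0, isLoggedIn = false } = {}) {
  const headers = { 'Cache-Control': 'private, must-revalidate, max-age=0' };
  let effective = mode;

  if (mode === 'anon-public-user-private') {
    // Vary on Cookie so that logged-in and anon responses are kept apart.
    headers.Vary = 'Cookie';
    effective = isLoggedIn ? 'private' : 'public';
  }

  // "public" only has an effect when a max age was requested.
  if (effective === 'public' && (maxage > 0 || smaxage > 0)) {
    headers['Cache-Control'] = `s-maxage=${smaxage}, max-age=${maxage}, public`;
  }
  return headers;
}
```

For example, `buildCacheHeaders('public', { maxage: 120, smaxage: 120 })` yields a public Cache-Control header, while the same call without max ages stays private.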
Note that ApiMain has some internal logic to alter the cache mode, and it's not guaranteed to honour what the specific API module requested; for instance:
- On private wikis, only the private mode is allowed
- If the request has uselang=user (which is also the default) and the module sets the public mode, ApiMain downgrades that to anon-public-user-private
- If the mode is anon-public-user-private but the session is persistent (e.g. anonymous users who have edited), it forces the private mode
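The three adjustments listed above can be sketched as a small function. This is a simplified model with hypothetical names (`effectiveCacheMode`, `isPrivateWiki`, `sessionPersistent`), not the actual ApiMain implementation.

```javascript
// Sketch: which cache mode is actually used, given what the module requested.
function effectiveCacheMode(requested, ctx = {}) {
  const { isPrivateWiki = false, uselang = 'user', sessionPersistent = false } = ctx;

  // Private wikis never allow anything but "private".
  if (isPrivateWiki) {
    return 'private';
  }

  let mode = requested;
  // uselang=user (the default) makes the output user-dependent.
  if (mode === 'public' && uselang === 'user') {
    mode = 'anon-public-user-private';
  }
  // A persistent session (e.g. an anon who has edited) forces "private".
  if (mode === 'anon-public-user-private' && sessionPersistent) {
    mode = 'private';
  }
  return mode;
}
```

Note that a module requesting "public" only keeps that mode when the request sets uselang to something other than user, which is why the reproduction steps below pass uselang=content.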
Languages used in the API output
This might seem trivial, but it's actually very important so I believe it deserves its own section. The output of an API module can (in theory) have content in multiple languages, and we currently have no way to determine which languages were used for the current response. The closest thing that we can do is to get a list of all languages that were instantiated for the current request.
Short problem statement
Under certain circumstances (see below), if you force a non-default language variant for an API request (via user preference or Accept-Language), the response will be cached and served to users whose variant is different (as determined by user preference and Accept-Language). In r250686, the following conditions are described as necessary for this bug to happen:
- The request doesn't result in an error
- The cache mode is "public"
- The user is logged-in
- The 'maxage' or 'smaxage' parameter is set in the request
- The 'uselang' parameter is 'content'
- The user has a variant preference other than the default
- The content is in a language matching the base language of the user's variant.
This list is actually not 100% accurate, as I'll explain below, but it does give an accurate idea of how hard it is for this bug to happen.
Reproduction steps
- In LocalSettings.php, set $wgLanguageCode = 'en'; $wgUsePigLatinVariant = true;
- Make sure you're logged in
- On Special:Preferences, set variant to piglatin
- Go to OutputPage.php, in the getVaryHeader() method, and comment out the line which adds "Cookie" to the Vary header (reason explained in the next section)
- The API test URL is http://localhost/pedia/api.php?action=compare&format=json&fromtitle=TITLEHERE&fromrev=REV1HERE&totitle=TITLEHERE&torev=REV2HERE&uselang=content&maxage=120&smaxage=120
- Adjust as needed to match your local URL structure. The title and rev parameters can be anything, as long as the title and the revisions exist.
- You can run it conveniently with the following user script
    function doTestReq() {
        var api = new mw.Api();
        api.get( {
            action: 'compare',
            format: 'json',
            fromtitle: 'TITLEHERE',
            totitle: 'TITLEHERE',
            fromrev: REV1HERE,
            torev: REV2HERE,
            uselang: 'content',
            maxage: 120,
            smaxage: 120
        } )
            .done( function ( e ) { console.log( e ); } )
            .fail( function ( e ) { console.log( e ); } );
    }

    function addReqBtn() {
        // Only add the button on Special:Blankpage
        if (
            mw.config.get( 'wgNamespaceNumber' ) === -1 &&
            mw.config.get( 'wgCanonicalSpecialPageName' ) === 'Blankpage'
        ) {
            var $btn = $( '<button/>', { text: 'Do API req', click: doTestReq } );
            $( '#bodyContent' ).append( $btn );
        }
    }

    $( addReqBtn );
- Make sure to have another account whose variant preference is the default (English)
- With your main account, go to Special:Blankpage, click on the button a few times with the devtools open; make sure that the request is sent, and any response after the first one is served from disk cache
- Quickly log out, then log back in with the other account. Note: incognito won't work
- Go to Special:Blankpage with the second account, send a single request and notice that it's served from cache. Not good!
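If you prefer inspecting the response headers directly instead of relying on the devtools, something like the following may help. It is only a sketch: `buildCompareUrl` and `logCacheHeaders` are hypothetical helpers, and the base URL, title and revision IDs are placeholders that you need to adapt.

```javascript
// Builds the test URL from the reproduction steps (placeholders must be replaced).
function buildCompareUrl(apiBase, { title, fromRev, toRev, maxage = 120 }) {
  const params = new URLSearchParams({
    action: 'compare',
    format: 'json',
    fromtitle: title,
    fromrev: String(fromRev),
    totitle: title,
    torev: String(toRev),
    uselang: 'content',
    maxage: String(maxage),
    smaxage: String(maxage),
  });
  return `${apiBase}?${params}`;
}

// Sends the request and logs the headers that matter for this bug.
async function logCacheHeaders(url) {
  const response = await fetch(url, { credentials: 'same-origin' });
  console.log('Cache-Control:', response.headers.get('Cache-Control'));
  console.log('Vary:', response.headers.get('Vary'));
}
```

For instance, calling `logCacheHeaders( buildCompareUrl( 'http://localhost/pedia/api.php', { title: 'TITLEHERE', fromRev: 1, toRev: 2 } ) )` from the browser console on your local wiki should show whether the response is marked as public and which headers it varies on.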
Conditions that must be true for the bug to happen
I mentioned above that the conditions reported at r250686 are not entirely accurate. Here's why, as well as what's actually needed.
- The request doesn't result in an error
- The 'maxage' or 'smaxage' parameter is set in the request
- The 'uselang' parameter is 'content'
- The user has a variant preference other than the default
- The content is in a language matching the base language of the user's variant.
These are correct, and the reproduction steps I gave above ensure that they're all true.
- The cache mode is "public"
This is also correct, but remember that we also need a module whose output is variant-sensitive. There are a few API modules that use public caching (codesearch 1, codesearch 2); I used ApiComparePages because it's the first one I found and I know how to use it, but I haven't verified how many, if any, of those modules have variant-sensitive output.
- The user is logged-in
I'm not sure why this was included in the list of requirements, but AFAICS, the opposite is true, i.e. the user must be logged out. If the cache mode is public, ApiMain sets the Vary header by calling OutputPage::getVaryHeader(). The body of that method starts with
    if ( $this->getCacheVaryCookies() ) {
        $this->addVaryHeader( 'Cookie' );
    }
getCacheVaryCookies(), in turn, returns self::$cacheVaryCookies. The value of that property can change depending on the site config and current session, but it always includes "forceHTTPS". The GetCacheVaryCookies hook can in theory empty the array, but this is not currently the case in Wikimedia production. All in all, this means that "Cookie" is always added to the Vary header, hence all logged-in requests are only cached privately and won't be served to anyone else (either logged-in or anon). Hence, in order to reproduce this bug, you need to make the request without being logged-in. Repro step 4 removes "Cookie" from the Vary header, hence allowing you to test this bug while logged-in.
Obviously, the fact that getCacheVaryCookies() always returns a non-empty array, and thus that we always vary the cache on cookies, should not be relied upon. However, it reflects the status quo, so it should certainly be taken into account.
Possible solution
The what is easy: we need to properly vary the cache depending on which variants were used to build the API response. The how is anything but trivial.
For starters, we obviously cannot downgrade caching to "private" for all requests :) Nor can we do that only for requests that do not pass the "variant" parameter if T117549 is fixed, since I believe that'd still be the vast majority of requests.
In fact, maybe we don't need to use "private" at all: r250686 implemented a fix using "anon-public-user-private". At first glance this might seem a good idea: for logged-in users, the cache would be private, so there should be no risk of pollution; for logged-out users, the Vary header would take care of Accept-Language. Or would it? Turns out that Accept-Language is not included in the Vary header for API requests! [1] For non-API requests, OutputPage takes care of it in addAcceptLanguage() (link). However, that method is only called from OutputPage::sendCacheControl(), which is not called by ApiMain [2]. This means that even if we do "anon-public-user-private" unconditionally, logged-out users would still be affected by the caching bug. Additionally, as mentioned above, the status quo is that only anon requests are currently affected by the caching bug, because we happen to always Vary the cache on Cookie; and since "anon-public-user-private" is identical to "public" for anon requests, it wouldn't make any difference.
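To see why the missing Accept-Language matters, consider a toy model of a shared cache that keys entries on the URL plus the request headers named in Vary. This is not a model of any real CDN; the class, the origin function and the URL are made up for illustration.

```javascript
// Toy shared cache keyed by URL + the request headers listed in Vary.
class ToyCache {
  constructor() {
    this.entries = new Map(); // url -> [{ vary, key, body }]
  }

  static keyFor(vary, reqHeaders) {
    return vary.map((h) => `${h}=${reqHeaders[h] || ''}`).join('|');
  }

  fetch(url, reqHeaders, origin) {
    const candidates = this.entries.get(url) || [];
    for (const entry of candidates) {
      if (entry.key === ToyCache.keyFor(entry.vary, reqHeaders)) {
        return { body: entry.body, hit: true };
      }
    }
    const { body, vary } = origin(reqHeaders);
    candidates.push({ vary, key: ToyCache.keyFor(vary, reqHeaders), body });
    this.entries.set(url, candidates);
    return { body, hit: false };
  }
}

// Hypothetical origin whose output depends on Accept-Language.
const originWith = (vary) => (h) => ({
  body: `diff rendered in ${h['Accept-Language']}`,
  vary,
});

const testUrl = '/api.php?action=compare&uselang=content&smaxage=120';
const latinAnon = { 'Accept-Language': 'sr-el' };
const cyrillicAnon = { 'Accept-Language': 'sr-ec' };

// Status quo for API requests: Vary contains Cookie but not Accept-Language.
const buggy = new ToyCache();
buggy.fetch(testUrl, latinAnon, originWith(['Cookie']));
const polluted = buggy.fetch(testUrl, cyrillicAnon, originWith(['Cookie']));

// With Accept-Language in Vary, the two anons get separate entries.
const fixed = new ToyCache();
fixed.fetch(testUrl, latinAnon, originWith(['Cookie', 'Accept-Language']));
const separate = fixed.fetch(testUrl, cyrillicAnon, originWith(['Cookie', 'Accept-Language']));
```

In the first scenario both anons send no cookie, so they share a cache key and `polluted` is a cache hit carrying the Latin rendering; in the second, `separate` is a miss and gets its own Cyrillic rendering.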
So the first question would be: should we add "Accept-Language" to the Vary header for API requests, too? While in principle I don't think it would be wrong (we already do that for non-API requests), I fear that it might have unwanted side effects, like bloating the CDN cache or putting more pressure on the appservers (note [1] is related). So it could be an option, but probably not something to take lightly.
Now, if you've made it this far you're probably dazed and confused, so let me take you to a brave new world: imagine there's no Accept-Language that can change the variant, it's easy if you try. Would "anon-public-user-private" work now? Unfortunately, it wouldn't 100% work, because the GetLangPreferredVariant hook could still change the variant in unpredictable ways. It's a wild world! In my opinion, removing the hook would be beneficial: making the preferred variant more predictable would mean easier caching. If you used codesearch while reading this, you might have noticed something: the hook is actually unused! Right? Not right... According to T248651#6012660, which points to r367326, it's used in wikiHow code for varying the cache based on the value of a cookie. Which is exactly the kind of unpredictable change that I was mentioning. In theory, we could add some logic to check whether any handler of that hook changed/set the variant and set the cache mode to private in that case, but I'd rather keep this as a last resort since it'd be pretty hacky. Especially because it's unclear which class should know about all those hook calls, and how that information should translate into varying the cache.
Alright, snap back to reality. It seems that "anon-public-user-private" wouldn't suffice, unless we
- Add Accept-Language to the Vary header for API requests, and
- Remove the GetLangPreferredVariant hook (or put a huge hack in place, so huge that not even the bloated CDN cache from note [1] or this task description could be that huge)
So yes, "anon-public-user-private" wouldn't work unless we change a lot of things. But at this point, let's just not care™ and pretend that we're correctly varying on Accept-Language and that the darn hook will be burned. So, next question: how do we determine when to downgrade caching from "public" to "anon-public-user-private"? Well, when a non-default variant was used in the output, o' course. Right. And how do we know that? Remember what I wrote in the third subsection of "context" above? We just can't know. But bear with me, I also mentioned a workaround: we can build a list of all languages that were instantiated for the current request. And then, what do we do? We check if any instantiated language has variants, and if so, downgrade caching unconditionally. In practice, when the content language of a wiki has variants, most requests on that wiki wouldn't be publicly cached (because the content language has high chances of being instantiated). This approach was attempted in r732908.
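The workaround described above boils down to a check like the one below. This is only a sketch of the idea, not the code from r732908; the function name and the `variantsByLanguage` map are hypothetical.

```javascript
// Sketch: downgrade "public" whenever any instantiated language has variants.
// variantsByLanguage maps a language code to its variants (empty if none).
function cacheModeForLanguages(requestedMode, instantiatedLanguages, variantsByLanguage) {
  if (requestedMode !== 'public') {
    return requestedMode; // nothing to downgrade
  }
  const anyHasVariants = instantiatedLanguages.some(
    (code) => (variantsByLanguage[code] || []).length > 0
  );
  return anyHasVariants ? 'anon-public-user-private' : 'public';
}

const exampleVariants = {
  en: [],
  sr: ['sr-ec', 'sr-el'],
  zh: ['zh-hans', 'zh-hant'],
};
```

On a wiki whose content language is Serbian, "sr" would be instantiated for nearly every request, so almost nothing would stay in "public" mode; that is the trade-off mentioned above.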
Would it work? Yes. Would it be perfect? Not even close. Would it cause performance issues? Maybe. Unfortunately, we can't do much better. Accept-Language is ignored for logged-in users and user preferences don't exist for anons, so we cannot reliably cache the response for someone in either group and serve it to someone in the other.
One final word
r250686, which is where this bug was first discovered, is a patch for adding a variant parameter to the API, that would work the same as the variant URL parameter. The patch was blocked by code review because we'd be officially adding a parameter whose behaviour is known to show a caching bug. While I was personally convinced by that explanation, after the above investigation the opposite is closer to the truth: only requests that do not pass the variant parameter would be fully affected. Passing the new parameter can either fix the bug, or make it affect one less language [3]. And BTW, note that both r732908 and r250686 do one thing wrong: we still need to run that code even if the "variant" parameter was passed. In that case, what we could do is check whether any language instantiated in the current request except for the one whose variant is specified in the URL has variants.
As such, I believe that the variant parameter can be added regardless of the caching bug, as long as there aren't other reasons not to add it; see, for instance:
At any rate, any discussion about the new parameter should happen in T117549.
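For completeness, the adjusted check suggested above for requests that do pass the variant parameter could look roughly like this. Again, the names are hypothetical, and finding the base language by looking the variant up in a map is an assumption made to keep the sketch short.

```javascript
// Sketch: does the response still need to vary on the viewer's variants,
// given that the URL pins the variant of (at most) one language?
function needsDowngrade(instantiatedLanguages, variantsByLanguage, urlVariant) {
  // Language whose variant is fixed by the URL, e.g. 'sr-el' -> 'sr'.
  const pinned = Object.keys(variantsByLanguage).find(
    (code) => urlVariant !== undefined && variantsByLanguage[code].includes(urlVariant)
  );
  // Any OTHER instantiated language with variants is still affected.
  return instantiatedLanguages.some(
    (code) => code !== pinned && (variantsByLanguage[code] || []).length > 0
  );
}

const knownVariants = {
  en: [],
  sr: ['sr-ec', 'sr-el'],
  zh: ['zh-hans', 'zh-hant'],
};
```

This mirrors note [3]: `needsDowngrade( [ 'sr' ], knownVariants, 'sr-el' )` is false because the only language with variants is pinned by the URL, whereas adding 'zh' to the list makes it true again.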
[1] - If you're testing things locally, watch out: the UniversalLanguageSelector extension always adds Accept-Language to the Vary header if $wgULSLanguageDetection is true (and it is by default), ref. This setting is disabled in Wikimedia production because it would vastly increase the size of the CDN cache, and increase MW appserver load.
[2] - I'm putting this in a note just because this description is already long enough. OutputPage was created for building the HTML; part of its functionality makes no sense in the context of API requests. Headers handling should probably happen in a separate class, and it should be more consistent.
[3] - If the language that you specify a variant for is the only one used in the response, then the bug is gone. If it's only one of the languages used in the response, the other ones will still be affected by this bug.
