Page MenuHomePhabricator

Versioned load.php url for long caching gets short-cached instead
Closed, ResolvedPublic


Identified via, I was surprised to find a versioned load.php url among the list of urls with no expiry or a short expiry.

Given our current platform, I expect to only find thumbnails (no expiry), skin stylesheet (5 minutes) and the startup module (5 minutes) in that list, but instead also found:

  •…er.defaults&skin=minerva&version=15glgu9 (5 minutes)

For that test, the full url was:

Which currently responds with:

cache-control: public, max-age=300, s-maxage=300
etag: W/"1e9lqn3"

age: 292
backend-timing: D=95385 t=1519316833509087
content-encoding: gzip
content-length: 99653
content-type: text/javascript; charset=utf-8
date: Fri, 23 Feb 2018 08:27:10 GMT
expires: Fri, 23 Feb 2018 08:27:18 GMT
server: mw1242.eqiad.wmnet

This is expected behaviour when a url is constructed with a version parameter that doesn't match the current version by the server. The server only has the current versions of static assets and will respond with those as such, but we don't want to cache them under the "wrong" url for too long.

This was implemented to address T117587 (6fa1e56), which is the problem where during a deployment, a user can get module meta data from server A (with new code) and then construct its versioned url client-side and fetch (this time to server B; with old code still), at which point we don't want to popular the Varnish edge cache with the "old" content under the "new" url version.

However, I don't understand why the client is requesting the wrong version in this case. It doesn't appear to have changed recently.

Additionally, I did verify to see if this may be a bug with any of the individual modules (e.g. oscillating version, or unmatchable/non-deterministic version hash), but that wasn't the case. If I request the same url with the etag version in the query parameter, then I get the correct response headers.

cache-control: public, max-age=2592000, s-maxage=2592000

Event Timeline

Krinkle triaged this task as High priority.Feb 23 2018, 8:37 AM

Whatever it is, it not is a generic issue with all mobile load.php requests. It only happens for 1 module load.php request with lots of modules in the batch.

For example:

  • The request for jquery + mediawiki (url), has a matching E-Tag and correct long-term expiry.
  • request for mobile.toc.images, randomly saw it when logged-in (url), has matching E-Tag and long expiry.
  • request for ext.gadget.switcher (url) is also fine.
  • request for modules=user (with user query), is also fine.
  • request for mobile.toc%7Cmobile.toc.images%7Cskins.minerva.tablet.scripts is also fine.
  • request for mobile.loggingSchemas.mobileWebMainMenuClickTracking%7Cschema.MobileWebMainMenuClickTracking is also fine.

The only one failing for my random page view is for:

Full module lines:


When I hacked together some inline scripts to make the client-side JavaScript on call mw.loader#getCombinedVersion() with the above array of module names, it actually ends up producing the correct version string. The one that the server expects, instead of the one we see the client sending.

This means somehow the array of module names, or the order of module names is different between the point where we normally call mw.loader#getCombinedVersion() on the client, and the point where we call doRequest() on the client with the compressed module query parameter.

I can actually reproduce this bug locally in my mediawiki-docker-dev environment with regular desktop requests, so it's probably just bad luck that whatever bug we're facing happens to only happen on mobile. In my local environment, just like on mobile in prod, most versioned requests are fine, but the "main" one with lots of modules has a shortened expiry and a version query that doesn't match the E-Tag.

Here's a comparison between what the client-side has as module list, while executing mw.loader#getCombinedVersion, and what the server-side has as module list while executing ResourceLoader::getCombinedVersion.

  [console.log][] module list for request:  (24)
  > ["ext.eventLogging.subscriber", "ext.navigationTiming", "jquery.accessKeyLabel", "jquery.checkboxShiftClick", "jquery.client", "jquery.cookie", "jquery.getAttrs", "jquery.highlightText", "", "jquery.suggestions", "jquery.tabIndex", "jquery.throttle-debounce", "mediawiki.RegExp", "mediawiki.api",
- > "mediawiki.api.user", "mediawiki.notify", "", "", "mediawiki.searchSuggest", "", "mediawiki.user",
  > "mediawiki.util", "skins.vector.js", "user.defaults"]

  [PHP] ResourceLoader::getCombinedVersion: (24)
  > ["ext.eventLogging.subscriber", "ext.navigationTiming", "jquery.accessKeyLabel", "jquery.checkboxShiftClick", "jquery.client", "jquery.cookie", "jquery.getAttrs", "jquery.highlightText", "", "jquery.suggestions", "jquery.tabIndex", "jquery.throttle-debounce", "mediawiki.RegExp", "mediawiki.api",
-  > "mediawiki.notify", "mediawiki.searchSuggest", "", "mediawiki.user", "mediawiki.util", "mediawiki.api.user", "", "",
  > "skins.vector.js", "user.defaults"]

The lines marked with a red minus sign, are where the difference is. I suspect this might be the result of either sort() not behaving the same way between JS and PHP (quite probably in general, but unlikely to be relevant here given our simple ASCII strings), or somehow our expandModuleNames()/makePackedModulesString() implementations change the order of modules, which they're not supposed to.

I did find out that ResourceLoader::makePackedModulesString() in PHP does not preserve the original ordering. When it iterates through, it will append to any group, not just the current group. However, while that might be unexpected, it does lead to better compression and (I think) it shouldn't be a problem because:

  • We only sort before compression on the client, to improve consistency of urls and likelihood of Varnish caching. There is no expectation that sorts between browsers or between a browser and PHP is the same. We don't re-sort on the server (I think?), we only expand the input string and use it as-is. So difference in sorting shouldn't matter.
  • Assuming the not-in-order behaviour is only happening in compression, and not in decompression, why would we end up with a different array on the server after decompression?

The array marked [PHP] in the comment above is out of order. mediawiki.api.* and* appear too late in the array (they should appear right after their parents, as they do in the [console.log] one, but they instead appear after mediawiki.util). Either this is using a very strange sort function, or maybe the PHP code doesn't re-sort after dependency resolution is done, or maybe the decompression function on the server doesn't preserve order? I'm leaning towards the latter because the only module names affected are the ones that would use comma syntax in the compressed format.

The array marked [PHP] [..] or maybe the decompression function on the server doesn't preserve order? I'm leaning towards the latter because the only module names affected are the ones that would use comma syntax in the compressed format.

(Roan and I figured it out in-person, documenting here.)

The decompression function does preserve order and is fine.

We do sort the array client-side before compression, and (for what its worth) I did verify the sorting for these strings in particular would happen the same way in PHP as in JS. Although that doesn't matter in this case because there is no sorting involved when the server-side decompresses the module list and verifies the version hash. The server does not, and should not, apply any sorting here.

The contract between the client and server is as follows:

  • The client can request one or more modules in the same request. It ends this request with a version query parameter, which is a hash of all the requested modules' hashes, as combined in the same order as the list of modules in the request.
  • The server optionally decompresses the list, then combines its locally known hashes in the same order, hashes the result and verifies whether they match.

In order to increase the likelihood of a cache-hit in the browser and in Varnish, we do sort the list of modules client-side, but otherwise this is insignificant. None of the relevant methods do any sorting themselves, and are supposed to work with any order given.

In my previous comment, I pointed out that the only method involved here that effectively changes order is makePackedModulesString (and its JavaScript equivalent, inlined in mw.loader#batchRequest). The methods basically work as follows (simplified for previty, not complete):

pseudo code (like php/js)
function makePackedModulesString( string[] modules ) {
  groups = [];
  foreach ( modules as module ) {
    prefix = modules[ i ].substr( 0, lastDotIndex );
    suffix = modules[ i ].slice( lastDotIndex + 1 );
    groups[ prefix ][] =  suffix;
  pieces = [];
  foreach ( groups as prefix => suffixes ) {
    prieces[] = prefix . '.' . suffixes.join( ',' );
  return pieces.join( '|' );

For the most part, this preserved order. Except that when it uses the current prefix to consider a group to extend, it considers all groups, not just the group seen from the last iteration. Which means a list like a.a, a.b, z.a, a.c will end up packed as a.a,b,c|z.a. Now, granted, that edge case should be eliminated by the coincidence that we sorted the list ahead of time.

However even when sorted, this can still go wrong because sorting applies to the complete module name. So when the (sorted) list is:


This will be compressed as mw.a,b | mw.a.a,b, which expands to:


So the client is violating the earlier specified contract because its version hash is based on modules in an order sorted by full name, whereas the actual module list effectively ends up in an order where the groups are sorted, with their "children" sorted within. Thus leaving the overall list not in the original order.

As mentioned before, it doesn't matter whether either the modules or version parameter uses a sorted list. The only thing that matters is that they use the same order. Which right now they don't because the order of operations is:

  1. get module list, and sort it (as cache optimisation).
  2. build combined version has by combining hashes in order of the module list.
  3. build module string in compressed form, which ends up puts later modules like mw.b ahead of mw.a.a.

Unfortunately, I have to conclude this can't be a recent regression. The order discrepancy has been in ResourceLoader since at least the initial release in 2011. In terms of the order being used for anything, this was introduced about 2-3 years (2015) when we migrated ResourceLoader from using version timestamps to version hashing (f37cee996ee9). But even then it didn't matter as the contract with the server at that point was still only that the version query should differ from any other url with the same module query parameter when it needs to bypass the cache. Its value wasn't used server-side until about a 1-2 years ago (2016) when we added validation of the version parameter in order to fix T117587.

We just haven't noticed before as it just so happens to as often before now, I guess?

I've actually been able to reproduce this consistently on desktop as well. So the impact is potentially quite big.

Fortunately, once we fix this, we should see a significant improvement at Grafana dashboard: ResourceLoader, specifically I'd expect a big shift in "200 OK" responses from short expiry to long expiry, which would be nice :)

Change 415494 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] resourceloader: Clean up and better document module list (un)packing

Change 415497 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] resourceloader: Make load.mock.php standalone (no includes)

Change 415513 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] [WIP] resourceloader: Failing test for T188076

Change 415516 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] [WIP] resourceloader: Fix T188076

Change 415650 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] resourceloader: Fix mw.loader to compute version for current request only

Change 415516 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] [WIP] resourceloader: Fix client to combine versions in packed order

Change 415513 abandoned by Krinkle:
resourceloader: Add failing tests for T188076

Split and squashed into I00d6985c and I211523d47816.

ori added a subscriber: ori.

This is really impressive, Krinkle. Kudos.

Change 415494 merged by jenkins-bot:
[mediawiki/core@master] resourceloader: Clean up and better document module list (un)packing

Change 415497 merged by jenkins-bot:
[mediawiki/core@master] resourceloader: Simplify load.mock.php by removing includes dependency

Change 415650 merged by jenkins-bot:
[mediawiki/core@master] resourceloader: Fix mw.loader to compute version for current request only

Change 415516 merged by jenkins-bot:
[mediawiki/core@master] resourceloader: Fix mw.loader to compute combined version in packed order