TL;DR: The change from the bits to the text Varnish cache for static assets may be causing resources to be cached by Varnish for longer than previously assumed, and we need to find a solution.
@AndyRussG noticed on 2015-05-14 that ?debug=true for URLs related to CentralNotice JavaScript assets behaved differently from his past experience with newly updated resources. See #wikimedia-operations irc logs starting at 15:27:46Z.
Eventually @BBlack and @bd808 started discussing the possible implications of the recent changes that moved serving static assets (JS, CSS, images, fonts) from the bits.wikimedia.org vhost, which had its own Varnish cluster, to the same vhost as the visited wiki (e.g. en.wikipedia.org), which is fronted by the "text" Varnish cluster.
The text Varnish cluster has more cache space than the bits Varnish cluster did, so it is quite possible (and even likely) that assets which tended to fall out of the bits cache are held by the text cache for longer. This behavior should be mitigated by the use of versioned URLs (e.g. https://en.wikipedia.org/static/1.26wmf5/extensions/ImageMetrics/resources/head.js). This version, however, does not change for a non-branch deploy (i.e. SWAT or other non-release train update) like the one that @AndyRussG was attempting to validate.
The "stickier" cache behavior of the text Varnish cache is actually desirable, but we know that there will be assets that need to be updated more frequently than the current deploy train cadence supports. We need to find a solution that busts the fronting Varnish cache for these resources.
One possible solution would be to provide a tool that can issue PURGE multicast requests to the Varnish cluster to invalidate specific resource URLs. MediaWiki actually does this for articles and uploaded assets at the time those resources change on the backend in response to user activity.
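As a rough illustration of what such a tool might send, here is a minimal sketch in Python. It is not the multicast mechanism MediaWiki uses; it assumes a plain HTTP PURGE sent directly to a single cache host whose VCL has been configured to accept PURGE. The function names, the cache host and the port are hypothetical.

```python
import socket
from urllib.parse import urlsplit


def build_purge_request(url: str) -> bytes:
    """Build a raw HTTP/1.1 PURGE request for one resource URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"PURGE {path} HTTP/1.1",
        f"Host: {parts.netloc}",  # the cache keys on the wiki's hostname
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def send_purge(cache_host: str, url: str, port: int = 80, timeout: float = 5.0) -> str:
    """Send the PURGE to one cache host (hypothetical) and return the status line."""
    with socket.create_connection((cache_host, port), timeout=timeout) as sock:
        sock.sendall(build_purge_request(url))
        response = sock.recv(4096)
    return response.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
```

A real tool would need to reach every cache host in the cluster (and every cache layer), which is what the multicast approach is for; the sketch only shows the shape of a single request.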
Deployment strategy:
- [mediawiki/core] Change MediaWiki to use hashes in urls. – https://gerrit.wikimedia.org/r/265868
- [operations/puppet] Update Varnish VCL for /static (hostname neutral) to also cover these /w subdirectories (iff a hash query string is present). – https://gerrit.wikimedia.org/r/269149
- [operations/mediawiki-config] Implement and deploy new /w/static.php entry point. – https://gerrit.wikimedia.org/r/263566
- [operations/puppet] Change Apache configuration to rewrite /w/{skins,resources,extensions}/.* to /w/static.php on all wiki domains. – https://gerrit.wikimedia.org/r/268802, https://gerrit.wikimedia.org/r/271013, https://gerrit.wikimedia.org/r/271330
- [operations/mediawiki-config] Set $wgResourceBasePath to '/w' in wmf-config.
  - For beta cluster – https://gerrit.wikimedia.org/r/270446
  - For testwiki and test2wiki – https://gerrit.wikimedia.org/r/268715
  - For mediawikiwiki – https://gerrit.wikimedia.org/r/268715 (reverted per T99096#2032268), https://gerrit.wikimedia.org/r/271337
  - For non-Wikipedia wikis
  - For Wikipedia wikis
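
The idea behind the first step (hashes in URLs) can be sketched as follows. This is an illustration only, not the code in the linked change: the function name, the choice of MD5 and the 5-character truncation are assumptions made for the example. The point is that the query string changes whenever the file content changes, so a non-branch deploy produces a URL that Varnish has not cached yet.

```python
import hashlib


def hashed_resource_url(base_path: str, relative_path: str, content: bytes,
                        length: int = 5) -> str:
    """Return a resource URL whose query string is a short content hash.

    base_path     -- e.g. '/w' (the $wgResourceBasePath value from the plan)
    relative_path -- path of the file below base_path
    content       -- raw bytes of the file as deployed
    length        -- number of hex digits to keep (illustrative choice)
    """
    digest = hashlib.md5(content).hexdigest()[:length]
    return f"{base_path}/{relative_path}?{digest}"


# Same path, different content: the URL differs, so the old cache entry
# is simply never requested again.
before = hashed_resource_url("/w", "resources/example.js", b"var a = 1;")
after = hashed_resource_url("/w", "resources/example.js", b"var a = 2;")
print(before)
print(after)
```

Because the URL itself changes, no explicit purge is needed for resources referenced this way; stale entries age out of the cache on their own.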