Make Varnish cache for /static/$wmfbranch/ expire when resources change within branch lifetime
Closed, Resolved, Public

Description

TL;DR: The change from the bits to the text Varnish cache for static assets may be causing resources to be cached by Varnish for longer than previously assumed, and we need to find a solution.

@AndyRussG noticed on 2015-05-14 that the behavior of ?debug=true for URLs related to CentralNotice javascript assets was differing from his past experience for newly updated resources. See #wikimedia-operations irc logs starting at 15:27:46Z.

Eventually @BBlack and @bd808 started discussing the possible implications of the recent changes that moved serving of static assets (js, css, images, fonts) from the bits.wikimedia.org vhost, which had its own Varnish cluster, to the same vhost as the visited wiki (e.g. en.wikipedia.org), which is fronted by the "text" Varnish cluster.

The text Varnish cluster has more cache space than the bits Varnish cluster did so it is quite possible (and even likely) that assets which tended to fall out of the bits cache are held by the text cache for a greater time. This behavior should be mitigated by the use of versioned URLs (e.g. https://en.wikipedia.org/static/1.26wmf5/extensions/ImageMetrics/resources/head.js). This version however does not change for a non-branch deploy (i.e. SWAT or other non-release train update) like the one that @AndyRussG was attempting to validate.

The "stickier" cache behavior of the text Varnish cache is actually desirable, but we know that there will be assets that need to be updated more frequently than the current deploy train cadence supports. We need to find a solution that busts the fronting Varnish cache for these resources.

One possible solution would be to provide a tool that can issue PURGE multicast requests to the Varnish cluster to invalidate specific resource URLs. MediaWiki actually does this for articles and uploaded assets at the time those resources change on the backend in response to user activity.
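As a rough illustration of what invalidating one resource URL could look like, the sketch below builds a plain HTTP PURGE request for a single static asset. This is a simplification, not the multicast mechanism mentioned above: the function name `build_purge_request` is hypothetical, and Varnish only acts on the PURGE method if its VCL has been written to accept it.

```python
def build_purge_request(host: str, path: str) -> bytes:
    """Build the raw bytes of an HTTP/1.1 PURGE request for one URL.

    Assumption: the cache in front of `host` is configured (in VCL) to
    treat PURGE as an invalidation request from trusted clients.
    """
    lines = [
        f"PURGE {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_purge_request(
    "en.wikipedia.org",
    "/static/1.26wmf5/extensions/ImageMetrics/resources/head.js",
)
```

A deploy tool could send such a request for every file touched by a SWAT deploy, but it would need to know every hostname and URL variant under which the file may be cached.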

Deployment strategy:

  1. [mediawiki/core] Change MediaWiki to use hashes in urls. – https://gerrit.wikimedia.org/r/265868
  2. [operations/puppet] Update Varnish VCL for /static (hostname neutral) to also cover these /w subdirectories (iff a hash query string is present). – https://gerrit.wikimedia.org/r/269149
  3. [operations/mediawiki-config] Implement and deploy new /w/static.php entry point. – https://gerrit.wikimedia.org/r/263566
  4. [operations/puppet] Change Apache configuration to rewrite /w/{skins,resources,extensions}/.* to /w/static.php on all wiki domains. – https://gerrit.wikimedia.org/r/268802, https://gerrit.wikimedia.org/r/271013, https://gerrit.wikimedia.org/r/271330
  5. [operations/mediawiki-config] Set $wgResourceBasePath to '/w' in wmf-config.
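The idea behind step 1 can be sketched as follows: the URL of a static file carries a short hash of the file's content, so the URL changes whenever the file changes, even within one branch. This is only a sketch in Python, not the MediaWiki implementation; the function name `hashed_resource_url`, the use of SHA-1 here, and the 5-character truncation are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def hashed_resource_url(base_path: str, doc_root: Path, rel_path: str, length: int = 5) -> str:
    """Return base_path/rel_path with a short content hash as query string.

    Assumptions: SHA-1 as the hash function and `length` hex characters
    kept; both are illustrative choices, not taken from the task.
    """
    data = (doc_root / rel_path).read_bytes()
    digest = hashlib.sha1(data).hexdigest()[:length]
    return f"{base_path}/{rel_path}?{digest}"
```

Because the query string depends only on file content, an unchanged file keeps its URL (and its cache entry) across deploys, while a changed file gets a new URL that the cache has never seen.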

Event Timeline

I discussed the proposal in T99096#1583708 with @tstarling and @mark at the Dev Summit. The only issue we found is with serving requests for older versions of files when browsers view older (cached) versions of page HTML.

E.g.:

  • en.wikipedia.org at 1.25wmf1.
  • Page X is parsed and put in Varnish with reference to foo.png (en.wikipedia.org/static/foo.png).
  • en.wikipedia.org is updated to 1.25wmf2 in which foo.png has changed significantly or has even been deleted.
  • Page X remains in cache for another 30 days. Whilst the wmf1 branch is still on disk, static.php would not find the foo.png file using described logic at T99096#1583708 as it would be looking at enwiki's current mw version (wmf2).

However, we found an interesting solution that is more feasible than we initially anticipated. The proxy script can simply scan the various known multi-version directories for past versions of the file (if not found in the current directory). Given that an explicit file path is given, as well as a sha1 hash, it'd be limited to only a handful of file stat calls in the worst case scenario. In addition, resources are likely to stay in Varnish cache for those 30 days as well, so it's not a hot code path.
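The fallback lookup described above could look roughly like the sketch below. It is written in Python for illustration only; `find_static_file` and its parameters are hypothetical names, and the sketch reads and hashes each candidate file rather than relying on stat calls alone.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Optional


def find_static_file(branch_dirs: Iterable[Path], rel_path: str, want_hash: str) -> Optional[Path]:
    """Return the first copy of rel_path whose SHA-1 starts with want_hash.

    branch_dirs is assumed to list the wiki's current branch first,
    followed by older branches that are still on disk.
    """
    for branch in branch_dirs:
        candidate = branch / rel_path
        if not candidate.is_file():
            continue  # file does not exist in this branch
        digest = hashlib.sha1(candidate.read_bytes()).hexdigest()
        if digest.startswith(want_hash):
            return candidate
    return None  # no branch on disk has a matching version
```

With only a handful of branches on disk, the loop touches at most one file per branch, which matches the expectation that the worst case stays cheap.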

With this last amendment in place, the proposal was positively received. I'll bring it up in the next Architecture Committee meeting for additional confirmation (and subsequent RFC if necessary).

The alternative idea (of collecting files in a centralised directory by hash filename) was also considered viable, but somewhat less desirable due to needing to create and maintain an additional component in our deployment system.

I'm not opposed to building and maintaining the needed deployment system functionality, as Scap is under active development right now. Also, Release-Engineering-Team want to move forward with T89945: Merge to deployed branches instead of cutting a new deployment branch every week, which has implications for how this would work. Specifically, there would be only 2 (or maybe 3) branches deployed at any given time. There wouldn't be a collection of historical branches sitting around on web servers where we could search for historical versions of static resources.

awight removed a subscriber: awight. Jan 12 2016, 12:04 AM
Krinkle claimed this task. Jan 12 2016, 12:14 AM
Krinkle added a project: Performance-Team.
Krinkle moved this task from Inbox to Doing on the Performance-Team board.

Change 263566 had a related patch set uploaded (by Krinkle):
[WIP] Implement /w/static.php

https://gerrit.wikimedia.org/r/263566

Change 265868 had a related patch set uploaded (by Krinkle):
Centralise url handling for urls to static resources

https://gerrit.wikimedia.org/r/265868

Krinkle renamed this task from Varnish cache busting desired for /static/$VERSION/ resources which change within the lifetime of a branch to Varnish cache for /static/$wmfbranch/ doesn't expire when resources change within branch lifetime. Feb 2 2016, 2:01 AM
Krinkle renamed this task from Varnish cache for /static/$wmfbranch/ doesn't expire when resources change within branch lifetime to Make Varnish cache for /static/$wmfbranch/ expire when resources change within branch lifetime.

Change 265868 merged by jenkins-bot:
Centralise url handling for urls to static resources

https://gerrit.wikimedia.org/r/265868

Krinkle added a comment. Edited Feb 4 2016, 9:12 PM

(Added deployment strategy to the task description)

I'm currently thinking about an alternate way for point #3 because the path /w/static exists already (as a symlink to /static, containing $wmfbranch subdirectories). I don't think /w/static is used anywhere (aside from wgLocalStylePath for serving Vector IE6 csshover.htc), which we can easily update by changing its value from /w/static/$wmfbranch/skins to /w/static/skins. But if anything else is using /w/static/$wmfbranch/, that will become a 404 after this change, as it'll be a virtual directory that is one level less deep now.

Alternatively, we can make it even cleaner by rewriting /w/skins, /w/resources and /w/extensions to /w/static.php directly instead of using the URL prefix /w/static (as currently proposed). That would match what we do elsewhere with multiversion endpoints: we emulate the original path under the same name, not a different name. In that case we would use an Apache rewrite from /w/{skins,resources,extensions}/.* to /w/static.php, and we can keep $wgResourceBasePath set to /w (per MediaWiki default).

That requires we do not use /w/{skins,resources,extensions}/.* for non-static files. And after this change, that will be naturally enforced as e.g. PHP files will no longer execute (static.php will refuse to serve those).
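The routing rule and the refusal of non-static files could be sketched as below. This is a Python illustration of the policy only, not the Apache configuration or the static.php code; `route_static_request`, `STATIC_PREFIXES` and `REFUSED_SUFFIXES` are hypothetical names, and refusing by `.php` suffix is an assumed way to express "PHP files will no longer execute".

```python
import posixpath
from typing import Optional

# Directories under /w that would be rewritten to the static handler.
STATIC_PREFIXES = ("/w/skins/", "/w/resources/", "/w/extensions/")
# File types the handler declines to serve (assumption: matched by suffix).
REFUSED_SUFFIXES = (".php",)


def route_static_request(url_path: str) -> Optional[str]:
    """Return the path relative to /w that may be served, or None to refuse."""
    # Collapse "." and ".." segments so a prefix cannot be escaped.
    normalized = posixpath.normpath(url_path)
    if not normalized.startswith(STATIC_PREFIXES):
        return None  # not one of the rewritten directories
    if normalized.lower().endswith(REFUSED_SUFFIXES):
        return None  # executable source is never served as a static file
    return normalized[len("/w/"):]
```

For example, `/w/skins/Vector/csshover.htc` would be served, while `/w/extensions/Foo/api.php` and `/w/skins/../index.php` would both be refused under these assumptions.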

Change 268715 had a related patch set uploaded (by Krinkle):
[DONT MERGE] Set $wgResourceBasePath to "/w"

https://gerrit.wikimedia.org/r/268715

Krinkle updated the task description. Feb 5 2016, 7:25 PM
Krinkle updated the task description. Feb 5 2016, 7:31 PM

Change 268802 had a related patch set uploaded (by Krinkle):
[DONT MERGE] mediawiki: Rewrite /w/{skins,resources,extensions} to /w/static.php

https://gerrit.wikimedia.org/r/268802

Krinkle updated the task description. Feb 5 2016, 11:34 PM

Change 269149 had a related patch set uploaded (by Krinkle):
cache: Normalise hostname for /w/skins,resources,extensions

https://gerrit.wikimedia.org/r/269149

Krinkle updated the task description. Feb 8 2016, 3:57 PM

Change 269149 merged by BBlack:
cache: Normalise hostname for /w/skins,resources,extensions

https://gerrit.wikimedia.org/r/269149

Krinkle updated the task description. Feb 9 2016, 10:08 PM

Change 263566 merged by jenkins-bot:
Implement /w/static.php

https://gerrit.wikimedia.org/r/263566

Krinkle updated the task description. Feb 10 2016, 10:52 PM

Change 268802 merged by Giuseppe Lavagetto:
mediawiki: Rewrite /w/{skins,resources,extensions} to /w/static.php

https://gerrit.wikimedia.org/r/268802

Krinkle updated the task description. Feb 12 2016, 11:09 PM

Change 270446 had a related patch set uploaded (by Krinkle):
Set $wgResourceBasePath to "/w" for beta cluster wikis

https://gerrit.wikimedia.org/r/270446

Change 270446 merged by jenkins-bot:
Set $wgResourceBasePath to "/w" for beta cluster wikis

https://gerrit.wikimedia.org/r/270446

$wgLocalStylePath was using /w/static/{wmfbranch} instead of /static/{wmfbranch} (which meant it wasn't sharing the same optimised Varnish cache for /static). This probably originates from when /static didn't exist yet and it was served from bits, so we exposed a symlink under /w/static with the same content.

Now that /static is on the main domain, we don't need a second one at /w/static, too.

Keeping the symlink for back-compat (until HTML cache turns over), but this should reduce complexity and improve cache performance.

Change 268715 merged by jenkins-bot:
Set $wgResourceBasePath to "/w" for group0 wikis

https://gerrit.wikimedia.org/r/268715

Change 271002 had a related patch set uploaded (by Krinkle):
MimeMagic: Recognise .htc as text/x-component

https://gerrit.wikimedia.org/r/271002

Change 271003 had a related patch set uploaded (by Krinkle):
MimeMagic: Recognise .htc as text/x-component

https://gerrit.wikimedia.org/r/271003

Change 271003 merged by jenkins-bot:
MimeMagic: Recognise .htc as text/x-component

https://gerrit.wikimedia.org/r/271003

Krinkle added a comment. Edited Feb 16 2016, 6:01 PM

Rollout to group1 and group2 is blocked on Apache config being applied more widely first. Apparently various wiki domains are lacking the public-wiki-rewrites.incl configuration.

main.conf (puppet):

  • *.wikipedia.org: has public-wiki-rewrites.incl
  • *.wikinews.org: has public-wiki-rewrites.incl
  • test.wikidata.org: has public-wiki-rewrites.incl
  • *.wikidata.org: has public-wiki-rewrites.incl
  • *.wiktionary.org: has public-wiki-rewrites.incl
  • donate.wikimedia.org, donate.wikipedia.org: has public-wiki-rewrites.incl
  • vote.wikimedia.org: has public-wiki-rewrites.incl
  • www.mediawiki.org: missing
  • *.wikiquote.org: missing
  • *.wikibooks.org: missing
  • *.wikisource.org: missing
  • *.wikiversity.org: missing
  • *.wikivoyage.org: missing

remnant.conf (puppet):

  • meta.wikimedia.org: has public-wiki-rewrites.incl (via wikimedia-common)
  • wikisource.org: has public-wiki-rewrites.incl
  • commons.wikimedia.org: has public-wiki-rewrites.incl
  • grants.wikimedia.org and other fishbowls: has public-wiki-rewrites.incl (via wikimedia-common)
  • usability.wikimedia.org: missing

Which means https://www.mediawiki.org/w/resources/assets/poweredby_mediawiki_88x31.png is currently 404 Not Found because https://gerrit.wikimedia.org/r/263566 was ineffective for that domain.

Mentioned in SAL [2016-02-16T18:03:56Z] <krinkle@tin> Synchronized php-1.27.0-wmf.13/includes/mime.types: Fix .htc static (T99096) (duration: 00m 58s)

Change 271009 had a related patch set uploaded (by Krinkle):
MimeMagic: Recognise .htc as text/x-component

https://gerrit.wikimedia.org/r/271009

Change 271009 merged by jenkins-bot:
MimeMagic: Recognise .htc as text/x-component

https://gerrit.wikimedia.org/r/271009

Change 271013 had a related patch set uploaded (by Krinkle):
mediawiki: Apply public-wiki-rewrites.incl to www.mediawiki.org

https://gerrit.wikimedia.org/r/271013

Change 271013 merged by Ori.livneh:
mediawiki: Apply public-wiki-rewrites.incl to www.mediawiki.org

https://gerrit.wikimedia.org/r/271013

Krinkle updated the task description. Feb 17 2016, 7:34 PM

Change 271330 had a related patch set uploaded (by Krinkle):
mediawiki: Apply public-wiki-rewrites.incl to *.wikiquote.org

https://gerrit.wikimedia.org/r/271330

Change 271332 had a related patch set uploaded (by Krinkle):
mediawiki: Apply public-wiki-rewrites to wikiversity and wikivoyage

https://gerrit.wikimedia.org/r/271332

Change 271337 had a related patch set uploaded (by Krinkle):
Re-apply "Set $wgResourceBasePath to /w for www.mediawiki.org"

https://gerrit.wikimedia.org/r/271337

Change 271332 abandoned by Krinkle:
mediawiki: Apply public-wiki-rewrites to wikiversity and wikivoyage

Reason:
Squashed into Iadcc9c890f484.

https://gerrit.wikimedia.org/r/271332

Change 271337 merged by jenkins-bot:
Re-apply "Set $wgResourceBasePath to /w for www.mediawiki.org"

https://gerrit.wikimedia.org/r/271337

Change 271330 merged by Ori.livneh:
mediawiki: Apply public-wiki-rewrites to all remaining wiki domains

https://gerrit.wikimedia.org/r/271330

Mentioned in SAL [2016-02-17T20:25:38Z] <krinkle@tin> Synchronized wmf-config/CommonSettings.php: Re-enable T99096 for mediawiki.org (duration: 01m 29s)

Krinkle updated the task description. Feb 17 2016, 8:30 PM

Change 271002 merged by jenkins-bot:
MimeMagic: Recognise .htc as text/x-component

https://gerrit.wikimedia.org/r/271002

Change 271708 had a related patch set uploaded (by Krinkle):
Convert $wgResourceBasePath switch to InitialiseSettings

https://gerrit.wikimedia.org/r/271708

Change 271709 had a related patch set uploaded (by Krinkle):
Set $wgResourceBasePath to "/w" for small wikis

https://gerrit.wikimedia.org/r/271709

Change 271710 had a related patch set uploaded (by Krinkle):
Set $wgResourceBasePath to "/w" for medium wikis

https://gerrit.wikimedia.org/r/271710

Change 271711 had a related patch set uploaded (by Krinkle):
Set $wgResourceBasePath to "/w" for all wikis

https://gerrit.wikimedia.org/r/271711

Change 271708 merged by jenkins-bot:
Convert $wgResourceBasePath switch to InitialiseSettings

https://gerrit.wikimedia.org/r/271708

The missing public-wiki-rewrites.incl configuration described in the earlier comment, which left URLs such as https://www.mediawiki.org/w/resources/assets/poweredby_mediawiki_88x31.png returning 404 Not Found, has been fixed for all domains now with Apache config patches https://gerrit.wikimedia.org/r/271013 and https://gerrit.wikimedia.org/r/271330. We can continue the rollout of $wgResourceBasePath = "/w" to more wikis now.

Change 271709 merged by jenkins-bot:
Set $wgResourceBasePath to "/w" for small wikis

https://gerrit.wikimedia.org/r/271709

Change 271710 merged by jenkins-bot:
Set $wgResourceBasePath to "/w" for medium wikis

https://gerrit.wikimedia.org/r/271710

Change 273410 had a related patch set uploaded (by Alex Monk):
Add public-wiki-rewrites to wikitech

https://gerrit.wikimedia.org/r/273410

Change 271711 merged by jenkins-bot:
Set $wgResourceBasePath to "/w" for remaining wikis

https://gerrit.wikimedia.org/r/271711

Krinkle closed this task as Resolved. Feb 26 2016, 1:58 PM
Krinkle removed a project: Patch-For-Review.
Krinkle updated the task description.

Mentioned in SAL [2016-02-26T13:59:17Z] <krinkle@tin> Synchronized wmf-config/InitialiseSettings.php: T99096: Enable wmgUseWmfstatic on remaining wikis (duration: 00m 50s)

Change 273410 merged by Jcrespo:
Add public-wiki-rewrites to wikitech

https://gerrit.wikimedia.org/r/273410

Change 276383 had a related patch set uploaded (by Krinkle):
multiversion: Remove logic for branch pointers in /w/static

https://gerrit.wikimedia.org/r/276383

Change 276383 merged by jenkins-bot:
multiversion: Remove logic for branch pointers in /w/static

https://gerrit.wikimedia.org/r/276383

Change 276748 had a related patch set uploaded (by Krinkle):
Remove unused static symlinks for beta php-master

https://gerrit.wikimedia.org/r/276748

Change 276748 merged by jenkins-bot:
Remove unused static symlinks for beta php-master

https://gerrit.wikimedia.org/r/276748

Change 281379 had a related patch set uploaded (by Krinkle):
Remove inaccessible symlinks at /w/extensions and /w/skins

https://gerrit.wikimedia.org/r/281379

Change 281379 merged by jenkins-bot:
Remove inaccessible symlinks at /w/extensions and /w/skins

https://gerrit.wikimedia.org/r/281379