This follows up on T285232 (The restricted/mediawiki-webserver image should include skins and resources), and more specifically on the rough plan I outlined at T285232#7377397.
Status quo
For additional background and details, refer to https://wikitech.wikimedia.org/wiki/MediaWiki_at_WMF#Static_files.
Metrics at https://grafana.wikimedia.org/d/000000212/mediawiki-static.
In a nutshell:
We rewrite static file URLs under /w/ to /w/static.php at WMF. This currently responds in broadly two ways: 1) It serves the correct file based on the current wiki hostname and permits caching for up to 24h, or 2) if the query string contains a verifiable version hash, it tries to serve that version (in a way that is resilient to race conditions during deploys, per T47877), which is agnostic of the request hostname and allows immutable long-term caching.
And then we have a third category, /static/current, which is basically the same as serving from /w/ without a version hash, except that the caller is giving us permission to cache it long-term, the same as if it had a version hash. This is important for cases where we can't or don't want to propagate changes immediately, and instead let the browser use whichever version it has consistently across different pages. For example, footer icons and logos referenced in ParserCache or CDN-cached HTML: rather than have some pages with the new file and some with the old, they all reference the same stable URL, and it rolls over whenever it rolls over for a given browser and CDN region.
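To make the three categories easier to compare, here is a minimal Python sketch of the status quo as described above. It is not the actual static.php logic. The names CachePolicy and current_policy are hypothetical, and the 1-year TTL for hash-versioned requests is an assumption (the text above only says "long-term").

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60
YEAR = 365 * DAY


@dataclass(frozen=True)
class CachePolicy:
    max_age: int      # seconds the response may be cached
    selected_by: str  # "hostname" or "hash": how the file version is chosen


def current_policy(path: str, has_verified_hash: bool) -> CachePolicy:
    """Illustrative model of the status quo, not the real implementation."""
    if path.startswith("/static/current/"):
        # Third category: chosen like an unversioned /w/ request,
        # but the caller opts in to long-term caching.
        return CachePolicy(max_age=YEAR, selected_by="hostname")
    if path.startswith("/w/"):
        if has_verified_hash:
            # Second category: serve the requested version.
            # Assumption: "long-term" is modelled here as 1 year.
            return CachePolicy(max_age=YEAR, selected_by="hash")
        # First category: current file for this wiki, cached up to 24h.
        return CachePolicy(max_age=DAY, selected_by="hostname")
    raise ValueError(f"not a static file path: {path}")
```

In this model, the only difference between /static/current and an unversioned /w/ request is the max_age value, which is the gap the proposal below would close.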
Opportunity
- For WMF, simpler webserver and docker image configuration without /static/current.
- For WMF, simpler Varnish configuration without needing to vary on the presence of a hash-like query string for traffic via /w/* => /w/static.php. It can all be host-agnostic, instead of only some of it.
- For MediaWiki, support formatting this third category of URLs without hardcoding WMF-specifics. Right now we can only use /static/current in wmf-config, which greatly limits its use. As a result, people either use the first category instead (shortening the cache needlessly), or make things configurable that shouldn't have to be configurable. Even then the defaults are suboptimal out of the box, so basically only WMF bothers changing them.
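The Varnish point above can be sketched roughly as follows, in Python rather than VCL. This is only a model under stated assumptions: that being "host-agnostic" amounts to leaving the hostname out of the cache lookup key, and that a "hash-like" query string is a short run of hex characters. The function names and the regex are hypothetical and not taken from the real VCL.

```python
import re

# Assumption: what counts as "hash-like" is modelled as 5 or more hex characters.
HASH_LIKE = re.compile(r"[0-9a-f]{5,}")


def lookup_key_today(host: str, path: str, query: str) -> tuple:
    """Today: only requests with a hash-like query string are host-agnostic."""
    if HASH_LIKE.fullmatch(query):
        return (path, query)
    return (host, path, query)


def lookup_key_proposed(host: str, path: str, query: str) -> tuple:
    """Proposed: every request routed to /w/static.php is host-agnostic."""
    return (path, query)
```

Under this model, an unversioned request for the same file on two wikis produces two cache objects today and one under the proposal, and the branch on the query string disappears.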
Proposal
Instead of considering unversioned asset URLs under /w as cacheable for only 24h (which we bumped up from 1h not so long ago), bump this all the way to 1 year, making it identical to what /static/current is today. This means we can phase out use of that WMF-specific concept.
For MediaWiki we can then consider that to make a URL stable without version and long-cached, simply reference it directly from /w/. That seems like a fairly natural thing to do and is what some codepaths may already do today, except they'd no longer have the consequence of a shortened cache at WMF.
This category of unversioned /w URLs is also sometimes used by gadgets and site scripts where figuring out the file hash is difficult or impossible, and thus these also get unfortunately short caching currently. This would be improved as a result.
So why didn't we do this 5 years ago when we created this after T99096? Well, I think it's just that at the time we didn't have the confidence that we could transition all file references where we want change propagation to be formatted with a version hash based on access to the file system.
We've since accomplished that: virtually everything that needs change propagation now uses a version hash (including background images referenced in CSS and LESS files, as automatically done by ResourceLoader). And in the handful of cases where we can't do that, we either don't want change propagation anyway (where we use /static/current today) or we don't mind caching longer (where gadgets use /w/ and get a shortened TTL today).
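For illustration, the two kinds of references that remain under this proposal could be formatted along these lines. This is a sketch in Python, not MediaWiki code. The function names, the SHA-256 digest, and the 5-character hash length are all assumptions made for the example.

```python
import hashlib


def versioned_url(url_path: str, content: bytes, hash_len: int = 5) -> str:
    """URL for a file that needs change propagation.

    The query string changes whenever the file content changes, so a
    long-cached old URL is never reused for new content.
    Assumption: a truncated SHA-256 hex digest stands in for the version hash.
    """
    digest = hashlib.sha256(content).hexdigest()
    return f"{url_path}?{digest[:hash_len]}"


def stable_url(url_path: str) -> str:
    """URL for a file where a slow rollover is acceptable or desired.

    Under the proposal this is simply the unversioned /w/ path.
    """
    return url_path
```

For example, versioned_url("/w/resources/assets/logo.png", data) would be used where a changed logo must show up promptly, and stable_url("/w/resources/assets/logo.png") where HTML cached in ParserCache or the CDN should keep pointing at one URL.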
Implementation
Two approaches come to mind.
Plan 1: Top down:
No backend or VCL changes until after we're done.
CDN cache will effectively be reduced from 1y to 24h during the transition.
- Switch remaining use of /static/current to /w. Accept the temporary reduction in cache TTL for new clients (24h is still pretty good).
- Remove or disable health checks for /static/current.
- Remove Apache route for /static/current (remove symlink in operations/mediawiki-config.git, remove special rewrite rule in operations/deployment-charts.git).
- Remove remnants such as the now-unused code in /w/static.php, and any disabled health checks.
- Change static.php to make /w/ host-agnostic and have the same 1-year cache as today for /static/current.
- Change Varnish VCL to simplify /w/static.php routing to be fully host-agnostic, regardless of query string.
Plan 2: Bottom up:
This makes more changes upfront, which makes them more visible and exposed to caching, but it also lets us learn quickly about any edge cases or mistakes, and thereby increases confidence. One downside is that it will temporarily increase complexity in the /w/static.php code.
- Change static.php to make /w/ host-agnostic and have the same 1-year cache as today for /static/current.
- Change Varnish VCL to simplify /w/static.php routing to be fully host-agnostic, regardless of query string.
- Switch remaining use of /static/current to /w.
- Remove or disable health checks for /static/current.
- Remove prod route for /static/current (remove symlink in operations/mediawiki-config.git, remove special rewrite rule in operations/deployment-charts.git).
- Remove k8s route for /static/current (remove rewrite rule in operations/deployment-charts.git).
- Remove remnants such as the now-unused code in /w/static.php, and any disabled health checks.