Deprecate "/static/current" at WMF in favour of similar long-cache unversioned /w/ URLs
Closed, ResolvedPublic

Description

This follows up from T285232 (The restricted/mediawiki-webserver image should include skins and resources), and more specifically from the rough plan I outlined at T285232#7377397.

Status quo

For additional background and details, refer to https://wikitech.wikimedia.org/wiki/MediaWiki_at_WMF#Static_files.

Metrics at https://grafana.wikimedia.org/d/000000212/mediawiki-static.

In a nutshell:

We rewrite static file URLs under /w/ to /w/static.php at WMF. This can currently respond in broadly two ways: 1) it serves the correct file based on the current wiki hostname and permits caching for up to 24h, or 2) if the query string contains a verifiable version hash, it tries to serve that version (in a way that is resilient to race conditions during deploys, per T47877), which is agnostic of the request hostname and allows immutable long-term caching.

And then we have a third category, /static/current, which is basically the same as serving from /w/ without a version hash, except that the caller is giving us permission to cache it long-term, the same as if it had a version hash. This is important for cases where we can't or don't want to propagate changes immediately, and instead let the browser use whichever version it has consistently across different pages. For example, footer icons and logos referenced in ParserCache or CDN-cached HTML: rather than have some pages with the new file and some with the old, they all reference the same stable URL, and it rolls over whenever it rolls over for a given browser and CDN region.
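The three categories above can be summarised as a small decision function. This is only an illustrative sketch, not the actual static.php logic: the function name, the `has_verified_hash` parameter and the constants are hypothetical, and a request whose hash fails verification is deliberately not modelled.

```python
from urllib.parse import urlsplit

ONE_DAY = 24 * 60 * 60
ONE_YEAR = 365 * ONE_DAY


def status_quo_max_age(url: str, has_verified_hash: bool = False) -> int:
    """Return the cache lifetime (seconds) described in the status quo.

    `has_verified_hash` is a hypothetical flag standing in for "the query
    string carries a version hash that was verified against a file on disk".
    """
    path = urlsplit(url).path
    if path.startswith("/static/current/"):
        # Third category: unversioned, but the caller opts in to long caching.
        return ONE_YEAR
    if path.startswith("/w/"):
        if has_verified_hash:
            # Second category: versioned and host-agnostic, cache long-term.
            return ONE_YEAR
        # First category: unversioned, served per wiki hostname, up to 24h.
        return ONE_DAY
    raise ValueError(f"not a static file URL covered by this sketch: {url}")
```

Under the proposal in this task, the first and third branches would return the same value, which is what makes /static/current redundant.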

Opportunity

  • For WMF, simpler webserver and docker image configuration without /static/current.
  • For WMF, simpler Varnish configuration without needing to vary on the presence of a hash-like query string for traffic via /w/* => /w/static.php. It can all be host-agnostic, instead of only some of it.
  • For MediaWiki, support formatting this third category of URLs without hardcoding WMF-specifics. Right now we can only use /static/current in wmf-config, which greatly limits its use. This means people either use the first category instead (shortening the cache needlessly), or have to make things configurable that shouldn't need to be configurable. Even then the defaults are suboptimal out of the box, so basically only WMF bothers changing them.
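To picture the Varnish simplification from the second bullet, here is a rough Python model of the cache key for requests that are routed to /w/static.php. It is not VCL and not the production configuration: the `HASH_LIKE` pattern, the `SHARED_HOST` placeholder and both function names are assumptions made for illustration.

```python
import re
from urllib.parse import urlsplit

# Assumption: a "hash-like" query string is a short run of hex digits.
HASH_LIKE = re.compile(r"^[0-9a-f]{5,}$")
# Hypothetical placeholder for "one shared cache entry for all wikis".
SHARED_HOST = "static.invalid"


def cache_key_today(host: str, url: str) -> tuple:
    """Today: only versioned static URLs are cached host-agnostically."""
    if HASH_LIKE.match(urlsplit(url).query):
        return (SHARED_HOST, url)
    return (host, url)


def cache_key_proposed(host: str, url: str) -> tuple:
    """Proposed: every static file URL shares one cache entry across hosts."""
    return (SHARED_HOST, url)
```

In this model the proposed function has no branch at all, which is the simplification being argued for: the cache no longer needs to inspect the query string.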

Proposal

Instead of considering unversioned asset URLs under /w as cacheable for only 24h (which we bumped up from 1h not so long ago), bump this all the way to 1 year, making it identical to what /static/current is today. This means we can phase out use of that WMF-specific concept.

For MediaWiki we can then consider that to make a URL stable without version and long-cached, simply reference it directly from /w/. That seems like a fairly natural thing to do and is what some codepaths may already do today, except they'd no longer have the consequence of a shortened cache at WMF.

This category of unversioned /w URLs is also sometimes used by gadgets and site scripts where figuring out the file hash is difficult or impossible, and thus these also get unfortunately short caching currently. This would be improved as a result.

So why didn't we do this 5 years ago when we created this after T99096? Well, I think it's just that at the time we didn't have the confidence that we could transition all file references where we want change propagation to be formatted with a version hash based on access to the file system.

We've since accomplished that: virtually everything that needs change propagation (including background images referenced in CSS and LESS files, as automatically done by ResourceLoader) now uses a version hash. And for the handful of cases where we can't do that, we either don't want to anyway (where we use /static/current today) or we don't mind caching longer (where gadgets use /w/ and get a shortened TTL today).

Implementation

Two approaches come to mind.

Plan 1: Top down:

No backend or VCL changes until after we're done.
CDN cache will effectively be reduced from 1y to 24h during the transition.

  • Switch remaining use of /static/current to /w. Accept the temporary reduction in cache TTL for new clients (24h is still pretty good).
  • Remove or disable health checks for /static/current.
  • Remove Apache route for /static/current (remove symlink in operations/mediawiki-config.git, remove special rewrite rule in operations/deployment-charts.git).
  • Remove remnants such as the now-unused code in /w/static.php, and any disabled health checks.
  • Change static.php to make /w/ host-agnostic and have the same 1-year cache as today for /static/current.
  • Change Varnish VCL to simplify /w/static.php routing to be fully host-agnostic, regardless of query string.

Plan 2: Bottom up:

This will make more changes upfront, which makes them more visible and more exposed to caching, but it will also let us learn quickly about any edge cases or mistakes, which increases confidence. One downside is that it will temporarily increase complexity in the /w/static.php code.

  • Change static.php to make /w/ host-agnostic and have the same 1-year cache as today for /static/current.
  • Change Varnish VCL to simplify /w/static.php routing to be fully host-agnostic, regardless of query string.
  • Switch remaining use of /static/current to /w.
  • Remove or disable health checks for /static/current.
  • Remove prod route for /static/current (remove symlink in operations/mediawiki-config.git, remove special rewrite rule in operations/deployment-charts.git).
  • Remove k8s route for /static/current (remove rewrite rule in operations/deployment-charts.git).
  • Remove remnants such as the now-unused code in /w/static.php, and any disabled health checks.

Event Timeline

Change 765355 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/mediawiki-config@master] static.php: Improve docs and simplify/clarify some code

https://gerrit.wikimedia.org/r/765355

Krinkle renamed this task from Deprecate /static/current at WMF in favour of similar long-cache unversioned /w/ URLs to Deprecate "/static/current" at WMF in favour of similar long-cache unversioned /w/ URLs. Feb 24 2022, 12:59 PM
Krinkle triaged this task as Medium priority. Feb 24 2022, 1:29 PM
Krinkle updated the task description.
Krinkle added a project: Technical-Debt.

@Joe Would be interested in your thoughts on this proposal, as well as whether you consider one implementation as more favourable or lower risk than the other.

@Krinkle If I understood the proposal correctly, I would think that the safest way to make changes is to start at the edge, and remove support for the old system last, so I like the second approach slightly more. Both seem overall valid to me though.

Change 765355 merged by jenkins-bot:

[operations/mediawiki-config@master] static.php: Improve docs and simplify/clarify some code

https://gerrit.wikimedia.org/r/765355

@dancy @Joe I'd like to run a thought by you. For much of our frontend handling in ResourceLoader, we have a built-in assumption that deployments are not atomic, and thus that we cannot tell apart a "bad" URL from a URL that is simply very new (sent to a browser by a server with new code, and then responded to by a server with old code). So for version validation we fall back to a short 1-5min cache, so that things self-correct instead of poisoning long-term client and CDN caches with old/incompatible assets under a new URL.

For static.php, however, we don't do this, because it's mostly a standalone script that looks through the available branches and tries to serve the correct file by file path and md5 hash. This means that, so long as we sync out new branches before version promotion, and so long as webservers each have multiple MW versions present on disk, we can in fact tell the difference between a bad URL and a new URL.

In recent years (in the context of fpm restarts T266055, opcache size T99740, and mw k8s) I've been saying that we can probably get mw images to run only one version, and thus operate multiversion at a routing layer instead. This isn't a goal of mine, but rather a potential improvement that I imagine might help with some of the aspects SRE cares more about. However, this little thing with static.php is an example where we do depend on awareness of other versions on disk (even if none of the PHP code would be used, and thus no doubling of php-opcache or mw-l10ncache size). I can keep this as-is for now, but I'm curious what your view on this is. Right now we get to have almost no backend traffic for garbage URLs because we discard them by caching them as strongly as possible. There is quite a lot of garbage traffic, and until we did this the majority of backend traffic for static.php was garbage, with only a small percentage being genuine traffic (akin to Thumbor as "the 404 generator that occasionally makes a thumbnail", similarly due to strong cacheability of good responses). It seems like a desirable quality to keep, but it's also not exactly a high risk factor since the responses are stateless and cacheable either way. It's just a matter of whether we give it a 1-min TTL or a 1-year TTL. Right now it's 1-year because we know the URL is invalid, thanks to having the other MW branches on disk everywhere ahead of activation.
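As a rough illustration of "looks through the available branches and serves the correct file by file path and md5 hash", the lookup might be sketched like this in Python. The function name and its parameters are hypothetical, the hash is assumed to be a prefix of the file's md5 hex digest, and the sketch does not sanitise `rel_path` the way a real entry point would have to.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Optional


def find_file_by_hash(
    branch_dirs: Iterable[Path], rel_path: str, url_hash: str
) -> Optional[Path]:
    """Return the first branch copy of rel_path whose md5 starts with url_hash.

    Returns None when no branch on disk has a matching file. With every
    deployed branch present on disk, None means the URL is bad rather than
    merely newer than this server.
    """
    if not url_hash:
        return None
    for branch in branch_dirs:
        candidate = Path(branch) / rel_path
        if not candidate.is_file():
            continue
        digest = hashlib.md5(candidate.read_bytes()).hexdigest()
        if digest.startswith(url_hash):
            return candidate
    return None
```

The point of the comment above is the `None` case: a server that only had one branch on disk could not distinguish "wrong hash" from "hash of a version I have not received yet".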

Change 771357 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/mediawiki-config@master] static.php: Fold "current" handling into "nohash" and extend TTL to 1y

https://gerrit.wikimedia.org/r/771357

Krinkle updated the task description.

Change 771357 merged by jenkins-bot:

[operations/mediawiki-config@master] static.php: Fold "current" handling into "nohash" and extend TTL to 1y

https://gerrit.wikimedia.org/r/771357

Change 777893 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/mediawiki-config@master] static.php: Remove peeking at current-wiki $IP

https://gerrit.wikimedia.org/r/777893

Change 777893 merged by jenkins-bot:

[operations/mediawiki-config@master] static.php: Remove peeking at current-wiki $IP

https://gerrit.wikimedia.org/r/777893

Change 777900 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/mediawiki-config@master] static.php: Fold "unknown" handling into "nohash"

https://gerrit.wikimedia.org/r/777900

Task description:
  • Change static.php to make /w/ host-agnostic.

This is now done.

Change 777904 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/puppet@production] varnish: Expand static.php optimisation regarless of query string

https://gerrit.wikimedia.org/r/777904

Change 777901 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/mediawiki-config@master] static.php: Restore short cache for temporary 'mismatch' response

https://gerrit.wikimedia.org/r/777901

Task description:
  • Switch remaining use of /static/current to /w.

There are now 0 matches in Codesearch Everywhere (other than health checks and static.php source).

There were 115 matches on-wiki via Global Search (query). I've added a swap pattern to Tourbot (commit), and ran it in interactive mode over the results from Global Search using my interface admin rights. In practice though, most of the matching URLs were (and remain) a 404 error as the underlying files no longer existed in our software under that name for unrelated reasons, e.g. url(/w/skins/Vector/images/bullet-icon.png) doesn't exist anymore.
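For readers curious what such a swap looks like, here is a minimal Python sketch. It is not the pattern that was added to Tourbot: the regular expression, the function name and the assumption that only the skins, extensions and resources directories need rewriting are all illustrative.

```python
import re

# Assumption: only these top-level directories were referenced via
# /static/current/ in on-wiki CSS and JavaScript.
STATIC_CURRENT = re.compile(
    r"/static/current/(?=(?:skins|extensions|resources)/)"
)


def swap_static_current(text: str) -> str:
    """Rewrite /static/current/... references to the equivalent /w/... URL."""
    return STATIC_CURRENT.sub("/w/", text)
```

The lookahead keeps the directory name in place, so only the /static/current/ prefix is replaced and unrelated paths are left untouched.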

Change 778295 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/puppet@production] mediawiki: Update httpbb tests for /static/current going away

https://gerrit.wikimedia.org/r/778295

Change 778295 merged by Dzahn:

[operations/puppet@production] mediawiki: Update httpbb tests for /static/current going away

https://gerrit.wikimedia.org/r/778295

Change 778601 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/deployment-charts@master] mediawiki: Remove route for /static/current/* (rewrite_static_assets)

https://gerrit.wikimedia.org/r/778601

Change 778602 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/puppet@production] mediawiki: Remove unused rewrite_static_assets param

https://gerrit.wikimedia.org/r/778602

Change 779944 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/mediawiki-config@master] static: Remove `/static/current` symlink

https://gerrit.wikimedia.org/r/779944

Change 779944 merged by jenkins-bot:

[operations/mediawiki-config@master] static: Remove `/static/current` symlink

https://gerrit.wikimedia.org/r/779944

Change 777900 merged by jenkins-bot:

[operations/mediawiki-config@master] static.php: Fold "unknown" handling into "nohash"

https://gerrit.wikimedia.org/r/777900

Change 777901 merged by jenkins-bot:

[operations/mediawiki-config@master] static.php: Restore short cache for temporary 'mismatch' response

https://gerrit.wikimedia.org/r/777901

Change 789863 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/mediawiki-config@master] static.php: Remove unused handling of /static/current/ routes

https://gerrit.wikimedia.org/r/789863

Change 789863 merged by jenkins-bot:

[operations/mediawiki-config@master] static.php: Remove unused handling of /static/current/ routes

https://gerrit.wikimedia.org/r/789863

This was tested with a url like https://test2.wikipedia.org/static/current/skins/Vector/resources/skins.vector.styles.legacy/images/user-avatar.svg which with WikimediaDebug enabled and set to k8s-experimental responded with 200 OK and a png file before this change, and with 404 afterward – matching what production does as of https://gerrit.wikimedia.org/r/779944.

Signing over to @Joe for review, and (if indeed useful and approved) deployment, of the three remaining patches I submitted to VCL config (puppet.git) and mw-on-k8s Apache config (deployment-charts.git). Feel free to forward/assign to someone else as appropriate.

Change 778602 merged by Giuseppe Lavagetto:

[operations/puppet@production] mediawiki: Remove unused rewrite_static_assets param

https://gerrit.wikimedia.org/r/778602

Change 798395 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/deployment-charts@master] mediawiki: remove static assets rewrite clause.

https://gerrit.wikimedia.org/r/798395

Change 778601 merged by jenkins-bot:

[operations/deployment-charts@master] mediawiki: Remove route for /static/current/* (rewrite_static_assets)

https://gerrit.wikimedia.org/r/778601

Change 798395 abandoned by Giuseppe Lavagetto:

[operations/deployment-charts@master] mediawiki: remove static assets rewrite clause.

Reason:

Already implemented in another patch.

https://gerrit.wikimedia.org/r/798395

Change 777904 merged by Giuseppe Lavagetto:

[operations/puppet@production] varnish: Expand static.php optimisation regarless of query string

https://gerrit.wikimedia.org/r/777904

Joe updated the task description.
Joe removed a project: Patch-For-Review.

Everything is done as far as I can tell.

Krinkle closed this task as Resolved.
Krinkle claimed this task.