Use a query param for cache-busting instead of suffixing the filename
Closed, Declined, Public

Description

The current cache-busting strategy for static assets on the portal appends a file hash to the end of the filename, such as index-309ffl4.js. If the page referencing this file is improperly cached, it can request a hashed filename that no longer exists, and the request returns a 404 instead.

Using a query parameter for cache busting instead, such as index.js?309ffl4, would prevent this issue.
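For illustration, the two URL schemes can be sketched as below. The hash function (a truncated SHA-1) and the helper names are hypothetical; the portal's real build tooling may compute and insert the hash differently.

```python
import hashlib


def content_hash(data: bytes, length: int = 7) -> str:
    """Hypothetical version string: a short hex digest of the file contents."""
    return hashlib.sha1(data).hexdigest()[:length]


def suffixed_name(filename: str, digest: str) -> str:
    """Current scheme: embed the hash in the filename, e.g. index-<hash>.js."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no extension: just append the hash
        return f"{filename}-{digest}"
    return f"{stem}-{digest}.{ext}"


def query_param_url(filename: str, digest: str) -> str:
    """Proposed scheme: keep the filename and append the hash as a query string."""
    return f"{filename}?{digest}"


js = b"console.log('portal');"
h = content_hash(js)
print(suffixed_name("index.js", h))
print(query_param_url("index.js", h))
```

With the suffix, every build produces a new path on disk; with the query parameter, the path stays index.js and only the URL changes.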

It should be noted that this could lead to a situation where the JS file is expired but the HTML is not, causing the HTML to load a newer version of the JS file than intended. This, however, is still preferable to having the JS file return a 404.

Event Timeline

Switching from a hash suffix to a query param reduces the problem, BUT does not solve it.

  • In the hash suffix option, one can get a 404 on the .js files. However, this would not be a problem if the page were fully functional without its JavaScript assets.
  • In the query param option, one can have a JS file that is incompatible with the .html file. This can lead to conflicts (errors and exceptions) that can prevent the entire page from working properly.
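To make the two failure modes concrete, here is a toy model (not the portal's actual serving code) in which the server holds a single build of index.js and a stale HTML page references an older version. The function names and version strings are hypothetical, and the model assumes old hashed files are removed on deploy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    status: int
    js_version: Optional[str]  # which build of index.js was served, if any


def fetch_suffixed(referenced: str, deployed: str) -> Response:
    # Hash-suffix scheme: the URL names one exact build (index-<version>.js).
    # Assumes only the currently deployed build exists on the server.
    if referenced == deployed:
        return Response(200, deployed)
    return Response(404, None)


def fetch_query_param(referenced: str, deployed: str) -> Response:
    # Query-param scheme: the server ignores "?<version>" and always
    # serves whatever index.js is currently deployed.
    return Response(200, deployed)


def outcome(html_version: str, resp: Response) -> str:
    if resp.status == 404:
        return "JS missing: page must still work without JavaScript"
    if resp.js_version != html_version:
        return "JS/HTML mismatch: possible errors and exceptions"
    return "OK"


# Stale HTML (v1) is served while build v2 of the JS is deployed.
for fetch in (fetch_suffixed, fetch_query_param):
    print(fetch.__name__, "->", outcome("v1", fetch("v1", "v2")))
```

The hash suffix fails loudly (a missing file), while the query parameter fails silently (a file that exists but does not match the HTML).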

In my humble opinion, the hash suffix is still the better option (more predictable: either it works, or we know how it fails), but we need to improve the fallback mechanism with T158809: Prevent javascript from hiding page content indefinitely


There is an option that works in both scenarios though:

But it wouldn't be so easy with our build setup.


In conclusion, I'm still in favor of the hash suffix.

A query parameter would also make changes easier to review in gerrit, if I'm not mistaken.

Not really: these JS files are minified, so their diffs would not be readable in gerrit either way.

Per T158782: in case of a bad deployment, clients may request the new URL before the new file is deployed. Using query strings would make things worse: the server ignores the query string and returns the old file with an HTTP 200 response, which is then cached indefinitely under the new URL. Recovering would require manual purging, and even then it won't correct any downstream proxies or browser caches.
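The cache-poisoning argument can be reproduced with a toy URL-keyed cache. Everything here is a simplified assumption: the TTL values, the file contents, and the idea that a 404 is cached only briefly; the cache stands in for CDN, proxy, and browser caches alike.

```python
from typing import Callable, Dict, Optional, Tuple

LONG_TTL = 365 * 24 * 3600  # assumed TTL for versioned URLs ("cached indefinitely")
SHORT_TTL = 60              # assumed TTL for 404 responses

Result = Tuple[int, Optional[str]]                        # (status, body)
Origin = Callable[[str], Tuple[int, Optional[str], int]]  # url -> (status, body, ttl)


def make_origin(disk: Dict[str, str]) -> Origin:
    def origin(url: str) -> Tuple[int, Optional[str], int]:
        path = url.split("?", 1)[0]  # a static file server ignores the query string
        if path in disk:
            return 200, disk[path], LONG_TTL
        return 404, None, SHORT_TTL
    return origin


class EdgeCache:
    """Toy URL-keyed cache; entries map url -> (status, body, expires_at)."""

    def __init__(self) -> None:
        self.entries: Dict[str, Tuple[int, Optional[str], int]] = {}

    def get(self, url: str, now: int, origin: Origin) -> Result:
        hit = self.entries.get(url)
        if hit is not None and hit[2] > now:
            return hit[0], hit[1]        # still fresh: served from cache
        status, body, ttl = origin(url)  # miss or expired: ask the origin
        self.entries[url] = (status, body, now + ttl)
        return status, body


def bad_deploy(url_v2: str, disk_before: Dict[str, str],
               disk_after: Dict[str, str]) -> Tuple[Result, Result]:
    cache = EdgeCache()
    # t=0: the new HTML already points at v2, but the v2 JS is not on disk yet.
    first = cache.get(url_v2, now=0, origin=make_origin(disk_before))
    # t=300: the v2 JS has now been deployed.
    later = cache.get(url_v2, now=300, origin=make_origin(disk_after))
    return first, later


# Query-param scheme: a single index.js is overwritten by the deploy.
query = bad_deploy("index.js?v2", {"index.js": "js-v1"}, {"index.js": "js-v2"})
# Hash-suffix scheme: the deploy adds a new file name.
hashed = bad_deploy("index-v2.js", {"index-v1.js": "js-v1"},
                    {"index-v1.js": "js-v1", "index-v2.js": "js-v2"})
print("query string:", query)
print("hash suffix: ", hashed)
```

In this model the query-string URL keeps serving the old build under the new URL until its long TTL runs out or it is purged, while the hashed URL recovers as soon as the short-lived 404 expires.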

There is a way to make it work, but it requires proxying the request for the static files through a service that will verify the query string with the file hash and, in case of a mismatch, it would have to return a custom Cache-Control header with a shorter max-age (e.g. 1 minute) to avoid cache poisoning.
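A minimal sketch of such a verifying handler follows. It assumes the version string is a 7-character SHA-1 prefix of the file contents and that verified responses get a one-year max-age; both are assumptions, and this is not the actual wmfstatic code.

```python
import hashlib
from typing import Dict, Optional, Tuple

LONG_CACHE = "public, max-age=31536000"  # assumed: one year for verified responses
SHORT_CACHE = "public, max-age=60"       # short max-age (1 minute) on mismatch


def file_hash(data: bytes, length: int = 7) -> str:
    """Hypothetical version string: a truncated SHA-1 of the file contents."""
    return hashlib.sha1(data).hexdigest()[:length]


def serve_static(url: str,
                 disk: Dict[str, bytes]) -> Tuple[int, Dict[str, str], Optional[bytes]]:
    path, _, query = url.partition("?")
    data = disk.get(path)
    if data is None:
        return 404, {"Cache-Control": SHORT_CACHE}, None
    if query == file_hash(data):
        # Query string matches the file on disk: safe to cache for a long time.
        return 200, {"Cache-Control": LONG_CACHE}, data
    # Mismatch (e.g. the HTML was deployed before the file): serve what we
    # have, but let caches keep it only briefly so they are not poisoned.
    return 200, {"Cache-Control": SHORT_CACHE}, data


disk = {"index.js": b"console.log('portal');"}
ok_url = "index.js?" + file_hash(disk["index.js"])
print(serve_static(ok_url, disk)[1])
print(serve_static("index.js?stale00", disk)[1])
```

The cost of this design is that every static request has to pass through a service that hashes (or looks up the hash of) the file, instead of being served directly by a plain file server.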

This is the approach I ended up taking to serve static files for MediaWiki in a "wmf-config-multiversion" compatible way (see wmfstatic for details). However, the circumstances in ResourceLoader and MediaWiki make it difficult to create new file paths in a scalable way.

For the portals I agree with @JGirault and would also recommend against query strings.

Let's go ahead and get this fixed as best we can, knowing that changing this might have unexpected consequences. I think the approach described in T158808#3048716 is quite reasonable.

@Gehel can you take a look and let us know your thoughts on it?

I'm also in support of unique filenames as opposed to query strings. The solution to transient 404s would be to restrict their cache lifetime. Considering this is static content only, and that the number of possible cache objects is very small, I would say that even 15 seconds is acceptable.
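A sketch of that policy: long-lived caching for unique (hashed) filenames and a 15-second lifetime for 404s. The filename pattern and the TTL for non-hashed files are hypothetical choices for illustration; `immutable` is a standard Cache-Control extension.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical naming rule for hashed assets, e.g. index-309ffl4.js
HASHED_ASSET = re.compile(r"[\w.-]+-[0-9a-z]{7}\.(?:js|css)")
NOT_FOUND_TTL = 15  # seconds, as suggested above
DEFAULT_TTL = 300   # assumed TTL for files without a hash in their name


def respond(filename: str, disk: Dict[str, bytes]) -> Tuple[int, str, Optional[bytes]]:
    data = disk.get(filename)
    if data is None:
        # Possibly a transient 404 (file not deployed yet): cache it only
        # briefly so clients recover soon after the file appears.
        return 404, f"public, max-age={NOT_FOUND_TTL}", None
    if HASHED_ASSET.fullmatch(filename):
        # A unique filename never changes content, so cache it for a long time.
        return 200, "public, max-age=31536000, immutable", data
    return 200, f"public, max-age={DEFAULT_TTL}", data
```

Because each build gets a new filename, the only response that can go stale is the 404, and its lifetime is bounded by `NOT_FOUND_TTL`.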

Given all the rationale above, it seems best to leave the cache-busting filenames as is, therefore marking this task declined.