Use a query param for cache-busting instead of suffixing the filename
Closed, Declined · Public

Description

The current cache-busting strategy for static assets on the portal appends a file hash to the base filename, such as index-309ffl4.js. If this file is improperly cached, it could lead to a 404 page instead.

Using a cache-busting solution such as index.js?309ffl4 would prevent this issue.

Note that this could lead to a situation where the cached JS file has expired but the cached HTML has not, so the HTML loads a newer version of the JS file than was intended. This, however, is still preferable to having the JS file return a 404.
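For illustration, the two naming schemes compared in this task can be sketched as follows. This is a minimal sketch, not the portal's actual build code: the helper names, the use of SHA-1, and the 7-character hash length are all assumptions.

```python
import hashlib
from pathlib import PurePosixPath


def content_hash(data: bytes, length: int = 7) -> str:
    """Short hex digest of the file contents (algorithm and length are assumed)."""
    return hashlib.sha1(data).hexdigest()[:length]


def suffixed_name(filename: str, data: bytes) -> str:
    """Hash-suffix scheme: index.js -> index-<hash>.js (a new path for each build)."""
    path = PurePosixPath(filename)
    return f"{path.stem}-{content_hash(data)}{path.suffix}"


def query_name(filename: str, data: bytes) -> str:
    """Query-param scheme: index.js -> index.js?<hash> (the path never changes)."""
    return f"{filename}?{content_hash(data)}"
```

With the suffix scheme, a stale reference points at a path that may no longer exist (hence the 404). With the query scheme, the path always resolves, but to whatever version of the file is currently deployed.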

Event Timeline

Restricted Application added a subscriber: Aklapper. · Feb 22 2017, 8:49 PM

Switching from a hash suffix to a query param reduces the problem, BUT does not solve it.

  • In the hash suffix option, one can get a 404 on the .js files. However, this would not be a problem if the page were able to work fully without its JavaScript assets.
  • In the query param option, one can have a JS file that is incompatible with the .html file. This can lead to conflicts (errors and exceptions) that can prevent the entire page from working properly.

In my humble opinion, the hash suffix is still a better option (more predictability: we know it works, otherwise we know how it fails), but we need to improve the fallback mechanism with T158809: Prevent javascript from hiding page content indefinitely


There is an option that works in both scenarios though:

But it wouldn't be so easy with our build setup.


In conclusion, I'm still in favor of the hash suffix.

mxn added a subscriber: mxn. · Edited · Feb 23 2017, 12:39 AM

A query parameter would also make changes easier to review in gerrit, if I'm not mistaken.

> A query parameter would also make changes easier to review in gerrit, if I'm not mistaken.

Not really, these JS files are minified.

Per T158782, given that in case of bad deployments, the new URL may be accessed by clients before the new file is deployed, using query strings would make things worse as it would result in an HTTP 200 response that will be cached indefinitely. It would require manual purging to recover. And even then, it won't correct any downstream proxies or browser caches.

There is a way to make it work, but it requires proxying the request for the static files through a service that will verify the query string with the file hash and, in case of a mismatch, it would have to return a custom Cache-Control header with a shorter max-age (e.g. 1 minute) to avoid cache poisoning.
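The verification step described above could look roughly like this. It is a minimal sketch under stated assumptions, not the wmfstatic implementation: the function name, the SHA-1 prefix comparison, and the one-year max-age for a verified match are all hypothetical. Only the 1-minute max-age on mismatch comes from the description above.

```python
import hashlib

LONG_MAX_AGE = 31536000  # one year; assumed value for a verified match
SHORT_MAX_AGE = 60       # 1 minute on mismatch, to avoid cache poisoning


def cache_control_for(requested_hash: str, file_bytes: bytes) -> str:
    """Choose a Cache-Control value by checking the query-string hash
    against the hash of the file that is actually deployed."""
    actual = hashlib.sha1(file_bytes).hexdigest()
    # Assumption: the query string carries a prefix of the hex digest.
    if requested_hash and actual.startswith(requested_hash.lower()):
        return f"public, max-age={LONG_MAX_AGE}"
    # The deployed file is not the version the client asked for,
    # so let caches keep this response only briefly.
    return f"public, max-age={SHORT_MAX_AGE}"
```

A request that arrives before the new file is deployed would then be cached for a minute rather than indefinitely.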

This is the approach I ended up taking for a "wmf-config-multiversion" compatible approach to serving static files for MediaWiki. (See wmfstatic for details). However, the circumstances in ResourceLoader and MediaWiki make it difficult to create new file paths in a scalable way.

For the portals I agree with @JGirault and would also recommend against query strings.

Let's go ahead and get this fixed as best we can - knowing that changing this might have unexpected consequences. I think the approach written here seems quite reasonable: T158808#3048716.

@Gehel can you take a look and let us know your thoughts on it?

MaxSem added a subscriber: MaxSem. · Feb 23 2017, 7:35 PM

I'm also in support of unique filenames as opposed to query strings. The solution to transient 404s would be to restrict their cache lifetime. Considering this is static content only, and that the number of possible cache objects is very small, I would say that even 15 seconds is acceptable.
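As a rough sketch of that policy, assuming the cache lifetime is chosen from the response status: the function name and the one-year lifetime for successful responses are assumptions, and in practice this would live in the cache layer's own configuration rather than in Python.

```python
def cache_ttl_seconds(status: int) -> int:
    """Return how long a cache may keep a static-asset response."""
    if status == 404:
        return 15          # short lifetime so a transient 404 clears quickly
    if 200 <= status < 300:
        return 31536000    # assumed: hashed filenames are safe to cache for a year
    return 0               # assumed: do not cache any other status
```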

Jdrewniak closed this task as Declined. · Feb 24 2017, 12:34 PM

Given all the rationale above, it seems best to leave the cache-busting filenames as is, therefore marking this task declined.

debt moved this task from Backlog to Done on the Discovery-Portal-Sprint board. · Mar 22 2017, 3:44 PM