Page MenuHomePhabricator

ResourceLoader should shorten cache max-age if version hash does not match
Closed, ResolvedPublic

Description

Due to race conditions such as T47877, ResourceLoader should validate the version hash computed by the client and respond with a short max-age if its own version discovered at run-time is different.

This way, during a deployment, we will avoid polluting cache indefinitely when a not-yet-updated server responds to a new url (provided to a client by an already-updated server).

It will need to reverse engineer how the loader client concatenates and combines the version hashes (e.g. sort sha1-base64 versions by module name, and run sha1-hex).

Event Timeline

Krinkle created this task.Nov 3 2015, 7:40 PM
Krinkle raised the priority of this task from to Normal.
Krinkle raised the priority of this task from Normal to Needs Triage.
Krinkle updated the task description. (Show Details)
Krinkle set Security to None.
Restricted Application added a subscriber: StudiesWorld. · View Herald TranscriptNov 3 2015, 7:40 PM
Krinkle triaged this task as High priority.Nov 19 2015, 10:06 PM
Krinkle moved this task from Inbox to Accepted: Enhancement on the MediaWiki-ResourceLoader board.
Krinkle claimed this task.Dec 3 2015, 6:26 PM
Krinkle lowered the priority of this task from High to Normal.
Krinkle removed Krinkle as the assignee of this task.Feb 23 2016, 7:47 PM
BBlack raised the priority of this task from Normal to High.Apr 27 2016, 10:25 PM
BBlack added a subscriber: BBlack.

IMHO, we need a fix for this. The race condition does happen in the real-world with frequently-fetched ResourceLoader objects. Our only recourse after the race condition is to do a one-off ban in varnish of some complex regex-matched subset of RL URLs. This bit us today in T133765 , where 30-day-long cache objects had race-losing outputs.

The situation we're in today without this fix (or broader design fixes would be good, too, but at least this fix) is basically this:

  1. We hope we don't hit the race condition on random deploys
  2. If we do, we hope we notice (in this case, a ticket popped up something like 4 days later complaining about JS errors), but don't have any way to really verify it.
  3. When we notice it, we have to do a manual varnish ban, which doesn't scale and shouldn't be a routine operation.
Krinkle claimed this task.Aug 8 2016, 8:55 PM

Change 308290 had a related patch set uploaded (by Krinkle):
resourceloader: Shorten cache expirty if 'version' query doesn't match

https://gerrit.wikimedia.org/r/308290

Change 308290 merged by jenkins-bot:
resourceloader: Shorten cache expiry if 'version' query doesn't match

https://gerrit.wikimedia.org/r/308290

Krinkle closed this task as Resolved.Sep 3 2016, 12:14 AM
Krinkle removed a project: Patch-For-Review.
Krinkle reopened this task as Open.Sep 6 2016, 7:55 PM

This still needs a solution for localStorage since Cache-Control only affects Varnish cache and the browser's HTTP cache.

When the client stores it in localStorage under the module name + version, there is no longer any ttl that applies. We should probably change the load.php response in some way that indicates client-side cache should be disabled or shortened. (I'm suggesting disabled since localStorage has no concept of ttl. And due to the way localStorage is implemented in browsers, it is rather hard to implement in a way that doesn't kill performance.)

Change 310087 had a related patch set uploaded (by Krinkle):
[WIP] mw.loader.store: Don't cache stale module responses

https://gerrit.wikimedia.org/r/310087

Change 310087 merged by jenkins-bot:
resourceloader: Don't cache stale responses in mw.loader.store

https://gerrit.wikimedia.org/r/310087

Krinkle closed this task as Resolved.Oct 11 2016, 11:17 PM