Page MenuHomePhabricator

Allow browser to cache /portal static files
Closed, ResolvedPublic5 Estimated Story Points

Description

Since the switch to Git average request count has gone up. Previously images, etc. were allowed to be cached by the browser. But the associated Apache config change (56810fdef9) added a restriction of must-revalidate and max-age=0 for no apparent reason.

<Location ~ "^/$|^/portal/">
  Header set Cache-Control "s-maxage=3600, must-revalidate, max-age=0"
</Location>

I'd recommend a value there of 24 hours or more.

At the very least the current value of server max-age 3600 (1 hour) should be set for the client max-age as well, and remove the must-revalidate instruction. This is currently forcing browsers to make new requests for all sub resources (images, js) every repeated visit.

If you want to maintain that all clients get the same cache entry form the server for the page HTML (which is a good reason, we do the same for article content), then split the <Location> tag to be separate for ^/$ (the portal entry) and ^/portal/` (the sub resources). Unique unique file names already ensure that backward-incompatible changes don't affect cached resources.

Event Timeline

Krinkle raised the priority of this task from to Needs Triage.
Krinkle updated the task description. (Show Details)
Krinkle moved this task to Untriaged on the Wikimedia-Portals board.
Krinkle subscribed.
Restricted Application added subscribers: StudiesWorld, Aklapper. · View Herald Transcript
Krinkle set Security to None.
Krinkle added a subscriber: MaxSem.

recent work has been done to give static javascript assets a hash, so /portals/*/js/*.js can be cashed indefinitely (within reason). If img assets were also adjusted to include hashes the entirety of /portals/ could have a reasonably long cache time without having any negative effect on the ability to update the portals.

Thanks @Krinkle for catching this.

@EBernhardson is right, we can cache the JS file(s) indefinitely now they include a hash.

s-maxage=86400, max-age=86400 for images sounds fine to me, and we can work on including hashes to the images asap.

i suppose an alternative to hashes for images, is just that they in some way need a unique id. Since they are not auto generated there could simply be a version number in the filename

debt triaged this task as Low priority.Feb 9 2016, 8:08 PM
debt edited a custom field.
debt moved this task from Backlog to To Discuss on the Discovery-Portal-Sprint board.

@EBernhardson actually most of the images on the portal will be auto-generated :)
We've introduced CSS background sprites into our build process to combine all the non-critical images into a single files.

The tool we use for generating sprites is called sprity and I just discovered it has a 'cachebuster' option. This option doesn't change the name of the images, what it does instead is append a sha1 hash to reference to that file in the CSS like this:

.sprite-bookshelf_icons {
  background-image: url('...sprite-bookshelf_icons.png?b74d252ad81aced10cf55d25c46ac3a038bfc5f7');
}

The CSS file is then inlined into the HTML, which is purged with this script T125478.

@Krinkle Would this option do a good job of caching the images and busting it when necessary?

@Jdrewniak: Yes.

But even in current prod, it'll be harmless to make the max-age match the s-maxage and remove must-revalidate. The currently used images are pretty stable and haven't changed in a long time. And when they do, it's fine to slowly roll out over the course of an hour. It's currently cached either way. Just only on the server instead of server and client.

@MaxSem could you help us make the following changes?

Ping...is there any action we need to make on this ticket?

Yep it is still totally valid... We need to ping @Gehel :)

I'm adding that to my personal queue. I will start working on it as soon as I have a slot open.

Thanks, @Gehel, please let us know if you have any questions!

Krinkle moved this task from Accepted to Bugs on the Wikimedia-Portals board.

Change 280204 had a related patch set uploaded (by Gehel):
Allow browser to cache /portal static files.

https://gerrit.wikimedia.org/r/280204

I'm having a look into this (I'm brand new to portals, so as usual, I'm a bit lost). The change above should be sufficient. Still some questions to make sure I understood the context correctly:

  1. I applied the new more aggressive caching rules to all portals defined in wwwportals.conf. My understanding is that all those portals are managed in the same way, so there should not be any exception on the caching configuration. Is that correct?
  2. It is unclear to me seeing the discussion above if having the same caching rules for HTML and resources is what we want or not. Current implementation ensures the same rules for both.
  3. A quick look at wikipedia.org show that we still have images which are not content addressed (Wikipedia_wordmark.png and Wikipedia-logo-v2.png). I'm not sure how often those images change (but a 1 day cache is probably not an issue).
  4. I'll probably need a hand to test and deploy this (just to make sure I do not forget something obvious for my first portal deploy...)

Change 280204 merged by Gehel:
Allow browser to cache /portal static files.

https://gerrit.wikimedia.org/r/280204

This is now deployed. Tested on all test servers (mw1017, 1099, 2017 and 2099). I will wait for the 1h varnish cache to expire and run some more tests.

It would be nice to have image resources (not sprites) to also be content addressed and increase cache still a bit more, but that's outside the scope of this task.

Varnish has started invalidating, all looks good:

gehel@durin:~$ curl -s -v https://www.wikipedia.org/ 2>&1 | grep -i cache-control
< Cache-Control: s-maxage=86400, must-revalidate, max-age=3600
gehel@durin:~$ curl -s -v https://www.wikipedia.org/portal/wikipedia.org/assets/js/abtesting-c8078be6c2.js 2>&1 | grep -i cache-control
< Cache-Control: s-maxage=86400, max-age=86400
gehel@durin:~$ curl -s -v https://www.wikipedia.org/portal/wikipedia.org/assets/img/Wikipedia_wordmark.png 2>&1 | grep -i cache-control
< Cache-Control: s-maxage=86400, max-age=86400