Page MenuHomePhabricator

Static image files from en.m.wikipedia.org are served with cache-suppressing headers
Closed, ResolvedPublic

Description

Every time you navigate from one page to another on the mobile site, the following URL is revalidated or re-requested:

https://en.m.wikipedia.org/images/mobile/wikipedia-wordmark-en.png

That is, the client sends a request with IMS header, and the server responds with a 304. This is a waste of client resources.

Apache sends this file with an Expires header for one month in the future, and "Cache-Control: max-age=2592000" . Then the mobile varnish configuration replaces the CC header with "Cache-Control: private, s-maxage=0, max-age=0, must-revalidate". Here is the relevant varnish configuration:

	if (resp.http.Cache-Control ~ "s-maxage=[1-9]") {
		set resp.http.Cache-Control = "s-maxage=300, must-revalidate, max-age=0";
	} else {
		set resp.http.Cache-Control = "private, s-maxage=0, max-age=0, must-revalidate";
	}

The Cache-Control header from mod_expires does not match the regex, so Varnish replaces the header with the cache-suppressing one.

This fragment of VCL code dates from the initial import into git in 2011. It was moved from the mobile-backend.inc.vcl.erb to mobile-frontend.inc.vcl.erb in a8e0b4a55b302f8feb5ac89dfaf960bc7284ccf1 , prior to that, it had the comment "Cachable files delivered to the mobile frontends should be cached for a short time only." So it may be that the point of this VCL fragment was to cache mobile views for 5 minutes, avoiding the need to purge them on update, and this was combined with a broken workalike of the CC override in text-frontend.inc.vcl.erb. That CC override only operates on wiki pages, it does not override CC headers from static image requests:

	// Override the Cache-Control header for wiki content pages
	if (req.url ~ "(?i)^/w(iki)?/.*"
		&& req.url !~ "^/wiki/Special\:Banner(Controller|ListLoader)"
		&& req.url !~ "(?i)^/w/(extensions/.*|api\.php)"
		&& req.url !~ "(?i)bcache=1") {
		set resp.http.Cache-Control = "private, s-maxage=0, max-age=0, must-revalidate";
	}

But max age in the mobile-frontend Varnish is determined by beresp.ttl, not by CC headers sent from the backend. The current situation seems to be that we cache Mobile views for a period determined by beresp.ttl, and we also send CC:s-maxage for the benefit of external caches, but this does not appear to have been an intention of the 2011 code before Mark's change.

We also have the very charming

	// Temp test
	if (req.url == "/favicon.ico" || req.url ~ "^/apple-touch-icon(-precomposed)?\.png$" || req.url ~ "^/w/index\.php\?title=.*:Gadget-.*&[0-9]+$") {
		set resp.http.Cache-Control = "s-maxage=3600, max-age=3600";
	}

I guess Mark thought that it might be good to temporarily unbreak a special case of something that was broken generally two lines up. The desktop site sends appropriate CC/Expires headers on favicon.ico without the need for such VCL hacks, I'm not sure why the same thing can't be done here. Hopefully the testing is complete now, almost two years on, and all 10 lines can be removed and replaced with something that allows client-side caching on mobile to be at least as aggressive as on desktop.

Related Objects

Event Timeline

tstarling raised the priority of this task from to Needs Triage.
tstarling updated the task description. (Show Details)
tstarling subscribed.
faidon triaged this task as Medium priority.
faidon set Security to None.

Note that while Cache-Control may wrongly include private and max-age=0, it does have an Expires header still. Which seems to be enough for web browsers to cache it regardless.

https://en.m.wikipedia.org/images/mobile/wikipedia-wordmark-en.png in (tested in Chrome 45 canary) properly becomes 304 Not Modified with no body content sent, from the second visit onward.

It'd be nice to have Varnish cache it as well though, not just web browsers. As presumably now new users hit the backends directly to retrieve the file instead of Varnish caching it.

BBlack added a subscriber: ema.

This was resolved incidentally during the mobile VCL refactoring that happened during the course of T109286 . The "questionable" VCL blocks in the mobile VCL were all removed or refactored significantly to align them with text VCL in the process, and currently both clusters run nearly-identical VCL. @ema confirmed image headers look right:

15:56 <ema> curl -s -I https://en.m.wikipedia.org/images/mobile/wikipedia-wordmark-en.png | grep Cache-Control
Cache-Control: max-age=31536000
15:57 <ema> I get a good max-age, does it mean that https://phabricator.wikimedia.org/T86993 can be closed?