
thumb.php should not set CC:no-cache on renderer 404 responses?
Open, MediumPublic

Description

We've found that when the renderers emit a 404 (e.g. one with content like: Error generating thumbnail The source file 'foo.jpg' does not exist.), they tack on a Cache-Control: no-cache header before emitting it to Varnish. This prevents Varnish from caching the 404 at all, which means all such traffic passes through every cache layer to the backend. Varnish already has generic code which limits all 4xx TTLs to 10 minutes if they're longer than that. In other such scenarios, we've considered 10-minute 404s to be an acceptable tradeoff (e.g. on creation of new resources) so that we don't spam the backend so hard if a 404 becomes popular. Is there a reason caching them is bad in the upload case?
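To make the caching behaviour described above concrete, here is a minimal Python sketch of the decision. It is a simplified model only, not the actual Varnish VCL: the function names and the constant `FOUR_XX_TTL_CAP` are hypothetical, and treating a response without freshness information as uncacheable is an assumption made for brevity.

```python
from typing import Dict, Optional

# The 10-minute cap on 4xx TTLs mentioned in the description (name is hypothetical).
FOUR_XX_TTL_CAP = 600  # seconds


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Parse a Cache-Control header into {directive: argument-or-None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip() or None
    return directives


def effective_ttl(status: int, cache_control: Optional[str]) -> int:
    """Return a modelled cache TTL in seconds; 0 means the response is not cached."""
    directives = parse_cache_control(cache_control or "")
    # Modelled as in the description: no-cache keeps the response out of the cache.
    if any(d in directives for d in ("no-cache", "no-store", "private")):
        return 0
    raw = directives.get("s-maxage") or directives.get("max-age")
    # Assumption: no usable freshness information is treated as uncacheable.
    ttl = int(raw) if raw and raw.isdigit() else 0
    if 400 <= status < 500:
        ttl = min(ttl, FOUR_XX_TTL_CAP)
    return ttl
```

Under this model, a 404 carrying `no-cache` gets a TTL of 0 and every request reaches the backend, while a 404 carrying a long `s-maxage` would be held for at most 600 seconds.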

Event Timeline

Restricted Application added a subscriber: Aklapper.

AFAICS it is thumb_handler.php from MW that generates CC: no-cache on 404s, and the response is then proxied back to Varnish by Swift's rewrite.py, e.g. on mw1293 for http://commons.wikimedia.org/w/thumb_handler.php/9/91/1235552102_KBS_World_-_Logotype_%28JPG%29.jpg/260px-1235552102_KBS_World_-_Logotype_%28JPG%29.jpg

HTTP/1.1 404 Not Found
Date: Fri, 04 Nov 2016 15:43:26 GMT
Server: mw1293.eqiad.wmnet
X-Powered-By: HHVM/3.12.7
X-Content-Type-Options: nosniff
Cache-control: no-cache
X-MW-Thumbnail-Renderer: mw1293
P3P: CP="This is not a P3P policy! See https://commons.wikimedia.org/wiki/Special:CentralAutoLogin/P3P for more info."
Content-Length: 640
Backend-Timing: D=27932 t=1478274206670447
Connection: close
Content-Type: text/html; charset=utf-8

I'm not sure of the reason for that, though. I noticed it is also emitted on 400s for generic thumbnailing errors, so it might be the same code path.

When I hit a renderer directly, I get:

bblack@cp1099:~$ curl "http://rendering.svc.eqiad.wmnet/wikipedia/commons/thumb/6/63/Taissa-Farmiga--2014-Primetime-Emmy-Awards--06_%281%29.jpg/720px-Taissa-Farmiga--2014-Primetime-Emmy-Awards--06_%281%29.jpg" --header "Host: upload.wikimedia.org" --header 'X-Forwarded-Proto: https' -i 
HTTP/1.1 404 Not Found
Date: Fri, 04 Nov 2016 16:37:07 GMT
Server: mw1294.eqiad.wmnet
X-Powered-By: HHVM/3.12.7
Cache-Control: s-maxage=2678400, max-age=2678400
Backend-Timing: D=1393 t=1478277427897345
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8

<!DOCTYPE html>
<html>
	<head>
		<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
		<title>Wikimedia page not found: https://upload.wikimedia.org/wikipedia/commons/thumb/6/63/Taissa-Farmiga--2014-Primetime-Emmy-Awards--06_%281%29.jpg/720px-Taissa-Farmiga--2014-Primetime-Emmy-Awards--06_%281%29.jpg</title>
...

What does Swift itself hit if not that?

Ignore the above comment, the URL is wrong. What @fgiunchedi pasted is right, you just have to connect to rendering.svc.eqiad.wmnet while using the correct host-header for commons to simulate what Swift would see.
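As a rough illustration of the approach in the comment above (connect to the rendering service but send the Host header of the wiki), the following Python sketch builds such a request without sending it. The helper name `build_backend_request` is hypothetical, the `X-Forwarded-Proto` header simply mirrors the earlier curl invocation, and whether Swift's rewrite.py sends exactly these headers is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request


def build_backend_request(public_url: str, backend_host: str) -> Request:
    """Build a GET request aimed at backend_host that keeps the public Host header."""
    parts = urlsplit(public_url)
    # Same path and query, but the connection goes to the backend service.
    backend_url = urlunsplit(("http", backend_host, parts.path, parts.query, ""))
    req = Request(backend_url, method="GET")
    # The original hostname travels in the Host header, as a proxy would send it.
    req.add_header("Host", parts.netloc)
    # Mirrors the curl example earlier in the thread; an assumption for Swift itself.
    req.add_header("X-Forwarded-Proto", "https")
    return req


req = build_backend_request(
    "http://commons.wikimedia.org/w/thumb_handler.php/9/91/"
    "1235552102_KBS_World_-_Logotype_%28JPG%29.jpg/"
    "260px-1235552102_KBS_World_-_Logotype_%28JPG%29.jpg",
    "rendering.svc.eqiad.wmnet",
)
```

Passing `req` to `urllib.request.urlopen` would only work from a host that can reach the internal service name; a 404 is raised as `urllib.error.HTTPError`, whose `headers` attribute exposes the Cache-Control value.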

BBlack renamed this task from Swift should not set CC:no-cache on renderer 404 responses? to thumb_handler.php should not set CC:no-cache on renderer 404 responses? (Nov 4 2016, 5:10 PM)
BBlack updated the task description.

Title/desc fixed up to not implicate Swift :)

Change 423881 had a related patch set uploaded (by Ema; owner: Ema):
[mediawiki/core@master] thumb.php: make 404s cacheable

https://gerrit.wikimedia.org/r/423881

ema renamed this task from thumb_handler.php should not set CC:no-cache on renderer 404 responses? to thumb.php should not set CC:no-cache on renderer 404 responses? (Apr 4 2018, 10:57 AM)

Change 423881 abandoned by Ema:
thumb.php: make 404s cacheable

https://gerrit.wikimedia.org/r/423881

The swap of Traffic for Traffic-Icebox in this ticket's set of tags was based on a bulk action for all such tickets that haven't been updated in 6 months or more. This does not imply any human judgement about the validity or importance of the task, and is simply the first step in a larger task cleanup effort. Further manual triage and/or requests for updates will happen this month for all such tickets. For more detail, have a look at the extended explanation on the main page of Traffic-Icebox. Thank you!