We've found that when the renderers emit a 404 (e.g. one with content like: Error generating thumbnail The source file 'foo.jpg' does not exist.), they add a Cache-Control: no-cache header before sending it to Varnish. This prevents Varnish from caching the 404 at all, which means all such traffic cuts through every cache layer. Varnish already has generic code that caps all 4xx TTLs at 10 minutes if they're longer than that. In other such scenarios, we've considered 10-minute 404s to be an acceptable tradeoff (e.g. on creation of new resources) so that we don't spam the backend so hard if a 404 becomes popular. Is there a reason caching them is bad in the upload case?
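The interaction described above can be modelled roughly as follows. This is a Python sketch, not the actual VCL: the function name `effective_ttl`, the constant `MAX_4XX_TTL`, and the choice to treat `no-cache` as plainly uncacheable are assumptions made for illustration.

```python
from typing import Optional

# Assumed constant: the 10-minute cap on 4xx responses described above.
MAX_4XX_TTL = 600


def effective_ttl(status: int, ttl: float, no_cache: bool = False) -> Optional[float]:
    """Return the TTL (seconds) a cache layer would use, or None if uncacheable.

    Hypothetical model: a no-cache response is never stored, and any 4xx
    response has its TTL capped at MAX_4XX_TTL.
    """
    if no_cache:
        return None  # every request falls through to the backend
    if 400 <= status < 500:
        return min(ttl, MAX_4XX_TTL)
    return ttl
```

Under this model, a 404 sent without `no-cache` and with a one-day TTL would be kept for 600 seconds, while the same 404 with `no-cache` is never stored, so a popular missing thumbnail reaches the renderers on every request.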
Description
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
thumb.php: make 404s cacheable | mediawiki/core | master | +4 -1
Event Timeline
AFAICS it is thumb_handler.php from MW generating CC: no-cache on 404s and then proxied back to varnish by swift's rewrite.py, e.g. on mw1293 for http://commons.wikimedia.org/w/thumb_handler.php/9/91/1235552102_KBS_World_-_Logotype_%28JPG%29.jpg/260px-1235552102_KBS_World_-_Logotype_%28JPG%29.jpg
    HTTP/1.1 404 Not Found
    Date: Fri, 04 Nov 2016 15:43:26 GMT
    Server: mw1293.eqiad.wmnet
    X-Powered-By: HHVM/3.12.7
    X-Content-Type-Options: nosniff
    Cache-control: no-cache
    X-MW-Thumbnail-Renderer: mw1293
    P3P: CP="This is not a P3P policy! See https://commons.wikimedia.org/wiki/Special:CentralAutoLogin/P3P for more info."
    Content-Length: 640
    Backend-Timing: D=27932 t=1478274206670447
    Connection: close
    Content-Type: text/html; charset=utf-8
I'm not sure of the reason for that, though. I noticed it is also emitted on 400s for generic thumbnailing errors, so it might be the same code path.
When I hit a renderer directly, I get:
    bblack@cp1099:~$ curl "http://rendering.svc.eqiad.wmnet/wikipedia/commons/thumb/6/63/Taissa-Farmiga--2014-Primetime-Emmy-Awards--06_%281%29.jpg/720px-Taissa-Farmiga--2014-Primetime-Emmy-Awards--06_%281%29.jpg" --header "Host: upload.wikimedia.org" --header 'X-Forwarded-Proto: https' -i
    HTTP/1.1 404 Not Found
    Date: Fri, 04 Nov 2016 16:37:07 GMT
    Server: mw1294.eqiad.wmnet
    X-Powered-By: HHVM/3.12.7
    Cache-Control: s-maxage=2678400, max-age=2678400
    Backend-Timing: D=1393 t=1478277427897345
    Transfer-Encoding: chunked
    Content-Type: text/html; charset=utf-8

    <!DOCTYPE html>
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>Wikimedia page not found: https://upload.wikimedia.org/wikipedia/commons/thumb/6/63/Taissa-Farmiga--2014-Primetime-Emmy-Awards--06_%281%29.jpg/720px-Taissa-Farmiga--2014-Primetime-Emmy-Awards--06_%281%29.jpg</title>
    ...
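To read the Cache-Control value in this response, a small parser is enough. The Python sketch below is illustrative only: `parse_cache_control` and `shared_cache_ttl` are hypothetical helper names, and treating `no-cache` as "do not store" is a simplification of the HTTP semantics (strictly, it requires revalidation before reuse).

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header value into lowercase directives."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_ttl(value: str) -> Optional[int]:
    """Return the TTL in seconds a shared cache would derive, or None.

    Simplified: no-cache, no-store and private all mean "not cached";
    s-maxage takes precedence over max-age.
    """
    d = parse_cache_control(value)
    if "no-cache" in d or "no-store" in d or "private" in d:
        return None
    for key in ("s-maxage", "max-age"):
        arg = d.get(key)
        if arg is not None and arg.isdigit():
            return int(arg)
    return None
```

For the header above, `shared_cache_ttl("s-maxage=2678400, max-age=2678400")` gives 2678400 seconds, which is 31 days (2678400 / 86400).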
What does Swift itself hit if not that?
Ignore the above comment; the URL is wrong. What @fgiunchedi pasted is right: you just have to connect to rendering.svc.eqiad.wmnet while using the correct Host header for commons to simulate what Swift would see.
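A probe along these lines could be scripted. The Python sketch below only builds the request and does not send it. The helper name `build_probe` and the constant `RENDERER` are made up for illustration, and the `X-Forwarded-Proto` header is carried over from the curl attempt above as an assumption about what the renderer expects.

```python
import urllib.request

# Internal service address mentioned in this thread; only reachable from
# inside the production network.
RENDERER = "http://rendering.svc.eqiad.wmnet"


def build_probe(path: str, host: str) -> urllib.request.Request:
    """Build a GET for `path` on the renderer, overriding the Host header.

    `host` is the public site name the renderer should see, for example
    "commons.wikimedia.org".
    """
    req = urllib.request.Request(RENDERER + path, method="GET")
    req.add_header("Host", host)
    # Assumption: mirror the header used in the curl command above.
    req.add_header("X-Forwarded-Proto", "https")
    return req
```

From a host that can reach the service, `urllib.request.urlopen(req)` would send it. Because `urlopen` raises `urllib.error.HTTPError` for a 404, the response headers, including Cache-Control, would be read from the exception's `headers` attribute.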
Change 423881 had a related patch set uploaded (by Ema; owner: Ema):
[mediawiki/core@master] thumb.php: make 404s cacheable
The swap of Traffic for Traffic-Icebox in this ticket's set of tags was based on a bulk action for all such tickets that haven't been updated in 6 months or more. This does not imply any human judgement about the validity or importance of the task, and is simply the first step in a larger task cleanup effort. Further manual triage and/or requests for updates will happen this month for all such tickets. For more detail, have a look at the extended explanation on the main page of Traffic-Icebox. Thank you!