
compressed http responses without content-length not cached by varnish
Closed, Duplicate (Public)

Description

On Friday the 15th, bromine started having load issues. Investigation showed that a big (700k) JS asset was not being cached (every request was a varnish pass); this was the culprit, driving bromine's load up and making 15.wikipedia.org slow to load.

Further investigation revealed a problem when apache served gzip responses without Content-Length to varnish, which made them uncacheable.

The issue was fixed for bromine by @ori with https://gerrit.wikimedia.org/r/#/c/264315/1/modules/annualreport/files/15.wikipedia.org, though more apaches are potentially affected.

Event Timeline

fgiunchedi raised the priority of this task from to Needs Triage.
fgiunchedi updated the task description.
fgiunchedi added a project: SRE.
fgiunchedi added subscribers: fgiunchedi, ori.

Yeah, copying a bit from the IRC discussion at the time: basically, any time apache is configured to gzip stuff, it has a deflate buffer size limit configured. When the object is larger than that setting, apache sends it as a streamed chunked response rather than with a Content-Length, and then Varnish doesn't cache it. Disabling gzip completely is the easiest fix: we don't care that much about DC-internal bandwidth, and varnish can do the compression for the end user.
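For illustration only, here is a minimal Python sketch of why a streamed compressed response ends up without a Content-Length: when the body is compressed piece by piece, the total compressed size is only known once the last piece has been produced, after the headers would already have gone out. This is not Apache's implementation; the function name `stream_gzip` and the 8096-byte buffer size are assumptions made for the example.

```python
import os
import zlib
from typing import Iterator


def stream_gzip(body: bytes, buffer_size: int = 8096) -> Iterator[bytes]:
    """Yield gzip-compressed pieces of body, feeding buffer_size bytes at a time.

    buffer_size is an illustrative value, not a setting read from any server.
    """
    # Adding 16 to the window bits selects the gzip container format.
    compressor = zlib.compressobj(wbits=zlib.MAX_WBITS | 16)
    for start in range(0, len(body), buffer_size):
        piece = compressor.compress(body[start:start + buffer_size])
        if piece:  # zlib may buffer internally and return nothing yet
            yield piece
    tail = compressor.flush()  # remaining data plus the gzip trailer
    if tail:
        yield tail
```

Iterating over `stream_gzip(os.urandom(700_000))` produces many pieces, and their combined length is only available after the final one, which is the situation where a server has to fall back to chunked transfer encoding.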

Probably we should look at (a) refactoring the puppetization of apache in general so it's easy to disable gzip compression and (b) making that the default, since so many of our apaches sit behind varnishes.

You could call it just an Apache config issue rather than a varnish one, but I tagged it "Traffic" anyway; feel free to remove the tag again if you think it shouldn't be there.

Something that might be interesting:

https://httpd.apache.org/docs/2.4/mod/event.html#how-it-works

Disabling mod_deflate could be good if we plan to test, try, or upgrade Apache's MPM from worker to event.

EDIT: the mod_deflate restriction is only for corner cases in which the clients block while the worker thread is generating the response.

Still pending: an audit of which varnish backends might be affected, particularly apache.
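A rough sketch of how such an audit could be started, assuming all we want is to flag backends that answer a gzip-accepting request with a gzip-encoded body and no Content-Length (the pattern described in this task). The function names, default port, and path are hypothetical, and the check says nothing about what any particular VCL does with such a response.

```python
import http.client
from typing import Dict, Iterable, Mapping


def lacks_length_when_gzipped(headers: Mapping[str, str]) -> bool:
    """Return True if the response is gzip-encoded and has no Content-Length."""
    lowered = {name.lower(): value for name, value in headers.items()}
    gzipped = "gzip" in lowered.get("content-encoding", "").lower()
    return gzipped and "content-length" not in lowered


def probe(host: str, path: str = "/", port: int = 80, timeout: float = 5.0) -> bool:
    """Request path from host with Accept-Encoding: gzip and check the headers."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path, headers={"Accept-Encoding": "gzip"})
        response = conn.getresponse()
        return lacks_length_when_gzipped(dict(response.getheaders()))
    finally:
        conn.close()  # the body is not read; only the headers matter here


def audit(hosts: Iterable[str], path: str = "/") -> Dict[str, bool]:
    """Map each host to whether it showed the gzip-without-length pattern."""
    return {host: probe(host, path) for host in hosts}
```

Calling `audit([...])` with a list of backend hostnames would give a first pass; since the buffer-size behaviour depends on object size, probing a large asset on each backend is likely more telling than probing `/`.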

We actually dug further into related issues when investigating WDQS woes on cache_misc, and the problem is different from what we thought we understood in this ticket. It's not that varnish is incapable of caching a TE:chunked (no Content-Length header) response; it's that our misc-web VCL in particular is explicitly configured not to do so, in an obscure way that was difficult to understand. All of this will go away with cache_misc's imminent upgrade to Varnish 4 though...