Notable fixes we care about in the 3.0.6 -> 3.0.7 diffs:
- gzip + streaming + http/1.0 fix: https://www.varnish-cache.org/trac/ticket/1627
- fixed memory leak in bans (which is different than purges - this applies to our manual bans)
Notable fixes we care about in the 3.0.6 -> 3.0.7 diffs:
Subject | Repo | Branch | Lines +/- | |
---|---|---|---|---|
Revert "cache_misc: disable do_gzip" | operations/puppet | production | +1 -1 |
FYI this is blocked up a little: the merge of the 3.0.6 -> 3.0.7 diffs onto our custom "-plus-wm" tree didn't go cleanly. There's a short but hairy conflict between our varnish-plus streaming+chunked stuff and the gzip+streaming+http/1.0 content-length bugfix, and it's going to take some digging to unravel how to merge them sanely without breaking anything.
Note T127294 was fixed by disabling do_gzip on the cache_misc cluster. I suspect that there was a bad interaction there with streamed pass-mode response from the backend using chunked+gzip. The ultimate trigger for making it worse was turning on nginx keepalives, but they're working fine elsewhere. I strongly suspect that https://www.varnish-cache.org/trac/ticket/1627 would've fixed this too, but we still don't have a 3.0.7 package that includes it...
Dropping some breadcrumbs in case I end up dropping this topic for a while again....
The 3.0.7 fix we care about the most, which is also the only one that creates a merge conflict onto our hacky 3.0-plus branches is this one for gzip: https://www.varnish-cache.org/trac/changeset/72981734a141a0a52172b85bae55f8877f69ff42
When merging that into our plus branch, this is the resulting merge conflict:
<<<<<<< HEAD if (!sp->wantbody) sp->wrk->res_mode &= ~RES_CHUNKED; if (!(sp->wrk->res_mode & RES_CHUNKED) && sp->wrk->h_content_length != NULL && sp->wantbody && *phigh == -1) { http_Unset(sp->wrk->resp, H_Content_Length); ======= http_Unset(sp->wrk->resp, H_Content_Length); if (sp->wrk->res_mode & RES_LEN) { AN(sp->wrk->h_content_length); >>>>>>> varnish-3.0.7
Or another way to think of the problem is that, prior to this new gzip patch, the diff from upstream to our plus branch in this little area is this:
+ if (!sp->wantbody) + sp->wrk->res_mode &= ~RES_CHUNKED; + if (!(sp->wrk->res_mode & RES_CHUNKED) && - sp->wrk->h_content_length != NULL) { + sp->wrk->h_content_length != NULL && + sp->wantbody && + *phigh == -1) { http_Unset(sp->wrk->resp, H_Content_Length); http_PrintfHeader(sp->wrk, sp->fd, sp->wrk->resp, "Content-Length: %s", sp->wrk->h_content_length);
Because the gzip fix makes a real structural change to how the code operates, it's difficult to resolve this without fully understanding how everything related works, which is a whole of things to understand.
I spent considerable time on this today, but I still wasn't able to understand the code well enough to be confident in a possible merge solution (although I came up with a few different ones that seem plausible). I think for now, perhaps we'll just leave do_gzip disabled on the misc cluster (not the end of the world) and hope that Varnish 4 moves us past all related issues. It's expected to land on maps and misc first, with one of those happening circa EOQ.
Change 273013 had a related patch set uploaded (by BBlack):
Revert "cache_misc: disable do_gzip"
At this point we're not investing further in varnish3 except for security fixes, so there's no point pursuing a real 3.0.7 package upgrade given the issues with the gzip/content-length part.