Basically, we should adapt the current upload VCL that deals with these issues to more closely mirror the pattern recently established for the misc cluster in T128813. Ideally we should resolve this just before or during the Varnish 4 transition of cache_upload.
**Old Description:**
The basics of the situation look like this:
1. do_stream is not universally good (it can slow down concurrent requests for the same resource when the response is cacheable/shareable, as opposed to "pass"), but for large objects we generally want to turn it on to avoid incurring multiple buffering delays through all of the cache layers. Therefore, in at least some places we'll want a content-length-sensitive block that turns on beresp.do_stream (see the sketch after this list).
2. For objects that are even larger than the above threshold, we sometimes want to just create a hit-for-pass object and not cache them at all. This usually applies only to the frontend layers, since they have smaller total cache dataset sizes and it's not very expensive to fetch such objects from the immediate (local) backend caches.
3. However: some applayer backends, and sometimes Varnish itself for inter-cache requests, use TE:chunked to stream out a response. This means the receiving cache doesn't get a Content-Length header in vcl_fetch on which to conditionally decide whether to do_stream and/or hit-for-pass.
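A minimal sketch (not the actual cluster VCL) of what such a content-length-sensitive block could look like in vcl_fetch, in Varnish 3 syntax; the digit-count regexes and the ~10MB / ~100MB cutoffs are illustrative assumptions:

```
sub vcl_fetch {
    /* Stream large objects so we don't stack up buffering delays across
     * the cache layers.  "Large" here is a hypothetical >= 10,000,000
     * bytes, tested by requiring at least 8 digits in Content-Length. */
    if (beresp.http.Content-Length ~ "^[0-9]{8}") {
        set beresp.do_stream = true;
    }

    /* Frontend-only: objects larger still (hypothetical >= 100,000,000
     * bytes, at least 9 digits) aren't worth caching in the small
     * frontend dataset at all; create a hit-for-pass object instead. */
    if (beresp.http.Content-Length ~ "^[0-9]{9}") {
        set beresp.ttl = 600s;
        return (hit_for_pass);
    }

    /* A TE:chunked fetch has no Content-Length, so it matches neither
     * regex above and falls through as "small" -- that defaulting choice
     * is the crux of this task. */
    return (deliver);
}
```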
Currently, code related to the above exists only on the misc and upload clusters. On misc, by way of how the defaulting works, chunked responses are treated as if they break all size thresholds, i.e. as very large. On upload we do the opposite: objects are only treated as sufficiently large to trigger the checks if they have an explicit Content-Length above the limit (chunked responses are considered small).
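Schematically (vcl_fetch fragments shown side by side for comparison, with the same hypothetical cutoff as above), the two defaulting styles differ only in how a missing Content-Length is treated:

```
    /* misc-style defaulting: no Content-Length (TE:chunked) is treated
     * the same as a huge one, so chunked responses trip every size check. */
    if (!beresp.http.Content-Length || beresp.http.Content-Length ~ "^[0-9]{9}") {
        set beresp.do_stream = true;   /* and/or hit_for_pass, etc. */
    }

    /* upload-style defaulting: only an explicit Content-Length above the
     * cutoff counts, so chunked responses are implicitly "small" and
     * never trip the size checks. */
    if (beresp.http.Content-Length ~ "^[0-9]{9}") {
        set beresp.do_stream = true;
    }
```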
Misc's behavior is a big part of what leads to the issues in T128813; upload's behavior is also not optimal in the general case (though it may be mostly OK in practice at present, because Swift's behavior is well-defined); and the other services could probably use this sort of logic as well but don't have it at all.
There are multiple ways we could approach solving this set of issues for the general case (moving the logic to shared VCL, perhaps parameterizing the size cutoffs if necessary), and this task is about exploring them.
My thoughts currently are:
1. Regardless of how we solve the "default for unknown sizes" problem, if we used the same cutoff for do_stream that we use for hit-for-pass on the frontends, we'd avoid ever causing issues with do_stream and concurrent clients. (The issue being: when a cacheable object is fetched with do_stream, the client that triggered the fetch sets the speed at which the cache fetches the object for all concurrent clients; if that client's connection is very slow, it slows the fetch down for all the faster clients that stack up in the meantime.) It should be a relatively low-incidence issue (it only happens on a cache refresh of the cacheable object), but it's still not ideal. This doesn't solve the issue on the backends, though, which wouldn't hit-for-pass all streamable objects in an ideal world... (see the first sketch after this list).
2. We could simply never do_stream on the 'direct' (backend-most) backend caches (they should fill fast anyway; the primary benefit of do_stream is crossing the higher-latency inter-cache WAN links and the final link from the frontend to the user), and send the length even when the Varnish output is TE:chunked (as in, set resp.http.X-Size = obj.length or something along those lines), so that we can still make stream/pass decisions further up the caching chain regardless of TE:chunked. I don't know if that's possible... (see the second sketch after this list).
3. One of the biggest question marks in my mind right now is: when does Varnish choose to do TE:chunked output in general, and how does that affect inter-cache fetches and these size checks today?
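For idea (1), a frontend-only sketch where a single hypothetical cutoff drives both decisions, so anything we stream is also hit-for-pass and a streamed fetch is never shared by concurrent clients:

```
sub vcl_fetch {
    /* One shared cutoff (hypothetical: Content-Length with 9+ digits,
     * i.e. >= 100MB).  Big enough to stream == big enough to not cache,
     * so a slow triggering client can never throttle faster clients
     * waiting on the same cacheable fetch. */
    if (beresp.http.Content-Length ~ "^[0-9]{9}") {
        set beresp.do_stream = true;
        set beresp.ttl = 600s;
        return (hit_for_pass);
    }
    return (deliver);
}
```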
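For idea (2), the consuming side is easy to sketch: an upper-layer cache could fall back to a hypothetical X-CL header (set by the layer below) whenever the inter-cache response is TE:chunked and carries no Content-Length. Whether the producing side can actually expose the stored object's length (the `set resp.http.X-Size = obj.length` hand-wave above) is exactly the open question:

```
sub vcl_fetch {
    /* Prefer the real Content-Length; fall back to the hypothetical
     * X-CL header passed up from the backend-most cache when the
     * inter-cache transfer is TE:chunked. */
    if (beresp.http.Content-Length ~ "^[0-9]{9}"
        || beresp.http.X-CL ~ "^[0-9]{9}") {
        set beresp.do_stream = true;
        set beresp.ttl = 600s;
        return (hit_for_pass);
    }
    return (deliver);
}
```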