Our vcl_hit subroutines currently return deliver without checking the object's ttl/grace, at both the frontend and backend layers.
The VCL built into varnish, by contrast, returns miss once the object's lifetime is past both its ttl and its ttl+grace:
```
sub vcl_hit {
    if (obj.ttl >= 0s) {
        // A pure unadulterated hit, deliver it
        return (deliver);
    }
    if (obj.ttl + obj.grace > 0s) {
        // Object is in grace, deliver it
        // Automatically triggers a background fetch
        return (deliver);
    }
    // fetch & deliver once we get the result
    return (miss);
}
```
The difference between miss and deliver in the context of vcl_hit is as follows:
- miss: Synchronously refresh the object from the backend despite the cache hit. Control will eventually pass to vcl_miss.
- deliver: Deliver the object. If it is stale, a background fetch to refresh it is triggered.
See https://varnish-cache.org/docs/5.1/users-guide/vcl-built-in-subs.html#vcl-hit
To better understand varnish's behavior given our unconditional deliver in vcl_hit, we have written the following test case:
```
varnishtest "return deliver in vcl_hit (using grace)"

server s1 {
    rxreq
    txresp -hdr "Cache-Control: s-maxage=2" -hdr "Last-Modified: Tue, 03 Apr 2018 21:35:51 +0200" -hdr "X-Resp-ID: 1" -body "Test 1"

    rxreq
    txresp -hdr "Cache-Control: s-maxage=2" -hdr "Last-Modified: Tue, 03 Apr 2018 21:35:51 +0200" -hdr "X-Resp-ID: 2" -body "Test 2"
} -start

varnish v1 -vcl+backend {
    sub vcl_backend_response {
        set beresp.grace = 5s;
        set beresp.keep = 0s;
    }

    sub vcl_hit {
        return(deliver);
    }
} -start

# First request, cache miss
client c1 {
    txreq
    rxresp
    expect resp.http.X-Resp-ID == 1
} -run

varnish v1 -expect cache_hit == 0
varnish v1 -expect cache_miss == 1

# Second request, cache hit
client c2 {
    txreq
    rxresp
    expect resp.http.X-Resp-ID == 1
} -run

varnish v1 -expect cache_hit == 1
varnish v1 -expect cache_miss == 1

# Third request, waiting 3s to make the 2s ttl expire. This will result in a
# bgfetch and the stale object being returned. Still considered a cache hit.
client c3 {
    delay 3
    txreq
    rxresp
    expect resp.http.X-Resp-ID == 1
} -run

varnish v1 -expect cache_hit == 2
varnish v1 -expect cache_miss == 1

# Fourth request. This one returns the second object and is considered
# a cache hit as well.
client c4 {
    txreq
    rxresp
    expect resp.http.X-Resp-ID == 2
} -run

varnish v1 -expect cache_hit == 3
varnish v1 -expect cache_miss == 1
varnish v1 -expect n_expired == 0
```
If we use keep instead of grace in vcl_backend_response, the test behaves exactly the same: not only does it still pass, but varnishtest -v yields the same output, except for timestamps and port numbers:
```
sub vcl_backend_response {
    set beresp.grace = 0s;
    set beresp.keep = 5s;
}
```
It thus seems that the distinction between grace and keep is not enforced internally by varnish; rather, it has to be implemented in VCL.
Because of this, our CDN is presently making keep behave as grace.
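To restore the intended keep semantics, one option would be to mirror the built-in logic in our own vcl_hit subroutines instead of returning deliver unconditionally. A sketch (this is the same decision logic as the built-in VCL quoted above, using the 5.1-era obj.ttl/obj.grace variables):

```
sub vcl_hit {
    if (obj.ttl >= 0s) {
        // Fresh object: a plain cache hit.
        return (deliver);
    }
    if (obj.ttl + obj.grace > 0s) {
        // Stale but within grace: serve the stale copy while a
        // background fetch refreshes it.
        return (deliver);
    }
    // Past ttl+grace: the object is only kept (beresp.keep) so the
    // backend fetch can revalidate it conditionally. Fetch synchronously.
    return (miss);
}
```

With this in place, a keep-only object would trigger a synchronous (conditional) fetch rather than being served stale, which is the distinction the vtc test above shows is missing today.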
- We know that the expiry thread is often unable to cope with load on our varnish backends. When that happens, objects stick around in cache well past their ttl+grace+keep lifetime. Do those also behave like graced objects?
- Only one request in the vtc test above is counted as a cache miss, even though there have been two full fetches from the origin server. Does that mean our stats are currently skewed towards hits?
The test's behavior changes if we delay the origin server's response. In that case, request number 4 arrives while the varnish<->origin bgfetch is in flight, and the first object is returned:
```
--- x.vtc	2018-04-18 15:42:57.704383375 +0200
+++ y.vtc	2018-04-18 15:45:11.499248078 +0200
@@ -5,6 +5,7 @@
     txresp -hdr "Cache-Control: s-maxage=2" -hdr "Last-Modified: Tue, 03 Apr 2018 21:35:51 +0200" -hdr "X-Resp-ID: 1" -body "Test 1"

     rxreq
+    delay 1
     txresp -hdr "Cache-Control: s-maxage=2" -hdr "Last-Modified: Tue, 03 Apr 2018 21:35:51 +0200" -hdr "X-Resp-ID: 2" -body "Test 2"
 } -start

@@ -51,12 +52,12 @@
 varnish v1 -expect cache_hit == 2
 varnish v1 -expect cache_miss == 1

-# Fourth request. This one returns the second object and is considered
-# a cache hit as well.
+# Fourth request. This one returns the first object while the bgfetch from the
+# origin server is in progress. This is considered a cache hit as well.
 client c4 {
     txreq
     rxresp
-    expect resp.http.X-Resp-ID == 2
+    expect resp.http.X-Resp-ID == 1
 } -run

 varnish v1 -expect cache_hit == 3
```
That seems to indicate that, while a bgfetch is in progress, we might be responding to quite a few requests with stale objects (very stale, if the expiry mailbox theory is confirmed).
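If serving very stale objects during a bgfetch turns out to be a problem, one possible mitigation is to bound in vcl_hit how far past its ttl an object may be delivered, independently of its stored grace or of a backlogged expiry thread. A sketch, where the 10s cap is an arbitrary example value, not a recommendation:

```
sub vcl_hit {
    if (obj.ttl >= 0s) {
        return (deliver);
    }
    // Serve stale content only up to 10s past its ttl, even if the
    // object's own grace (or an object lingering because the expiry
    // thread is behind) would otherwise allow much more.
    if (obj.ttl + 10s > 0s) {
        return (deliver);
    }
    return (miss);
}
```

This would also put an upper bound on the staleness question raised above, since objects stuck in cache past ttl+grace+keep would fall through to miss rather than being delivered.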