Page MenuHomePhabricator

Large text objects are randomized to cache backends
Open, MediumPublic

Description

There seem to be an unintentional and very-inefficient interaction between our attempt to defend the frontend storage space from large objects at the top of wm_admission_policies, and the code in vcl_pass which randomizes the backend selection for all pass-traffic. This causes backend chashing to be ineffective for all of the large objects where it probably matters the most.

It seems like this should probably be implemented as hit-for-miss (which is how the exponential strategy works), but using the full TTL (since it's not probability-based), and probably needs to be wrapped in the same "200 miss" conditional as the exp code as well, to avoid affecting true passes? Attaching a proposed patch here, but this is a complicated area and I wouldn't be shocked at it needing further debate and analysis.

Event Timeline

BBlack created this task.Oct 20 2020, 4:05 PM
Restricted Application added a project: SRE. · View Herald TranscriptOct 20 2020, 4:05 PM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript
BBlack triaged this task as Medium priority.Oct 20 2020, 4:05 PM

Change 635318 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] VCL: use hfm for large_objects_cutoff

https://gerrit.wikimedia.org/r/635318

ema added a comment.Oct 21 2020, 10:39 AM

Instead of using hfp vs hfm, I think we might want to distinguish between requests that definitely cannot be cached at the ats-be layer (eg: those with req.http.Authorization) and those that potentially could result in a backend hit, like large_objects_cutoff. The former should honor pass_random in vcl_pass, the latter should always chash no matter what pass_random says?

That is what I was thinking too, but I'm not sure if the VCL state diagram allows us to see that at the right point in time to make the decision or not. We'd have to store some state with the pass object somehow, at least? The flow in the particular case in question that I was observing is like this:

  1. User requests /foo/bar -> frontend cache cp1234 miss -> chash to cp9999
  2. Response from cp9999 indicates CL:500KB, so vcl_backend_response does a return(pass(beresp.ttl))
  3. Next request for /foo/bar to cp1234 -> finds hfp object -> goes through vcl_pass -> random backend cpXXXX

It would have to be during that step 3 request that we would "know" that ats-be can't cache the object either, which is tricky I think.

Another question would be whether the large object cutoff is even useful on cache_text (are there very many large objects, relative to our current huge memory amounts?), or should at least be set to a much higher value there. Right now both clusters are set to 256KB, which is probably some arbitrary past guess that isn't optimal for either one. In practice, we could make better guesses experimentally by observing whether the whole of the frontend cache memory is even being used (it might not be, if the 256KB cutoff is making the frontend storage's job too easy by being too small), or whether there's notable space-driven evictions happening in the frontend (in which case the cutoff might be too large). Probably the ideal would have most of the memory being actively used, and small but manageable spikes in space evictions, occasionally.

ema added a comment.Oct 21 2020, 12:38 PM
  1. User requests /foo/bar -> frontend cache cp1234 miss -> chash to cp9999
  2. Response from cp9999 indicates CL:500KB, so vcl_backend_response does a return(pass(beresp.ttl))
  3. Next request for /foo/bar to cp1234 -> finds hfp object -> goes through vcl_pass -> random backend cpXXXX

It would have to be during that step 3 request that we would "know" that ats-be can't cache the object either, which is tricky I think.

Mmmh, I see. In the interest of keeping it simple, I'd be tempted to only pass_random when a decision can be made at request time, and see how that goes. When it comes to size-based decisions, at the backend layer we skip the cache based on origin server information for objects larger than 1GB, and objects that big should only be seen on upload where we don't have pass_random at all.

Another question would be whether the large object cutoff is even useful on cache_text

Very good question. :)

we could make better guesses experimentally by observing whether the whole of the frontend cache memory is even being used

It is currently fully used, yes. See for example cp3054. Whether or not it contains the most useful objects, is another story.

Change 635318 abandoned by BBlack:
[operations/puppet@production] VCL: use hfm for large_objects_cutoff

Reason:
HFM negates conditional requests, which is a non-starter in this context

https://gerrit.wikimedia.org/r/635318

Change 635852 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] cache_text: large_objects_cutoff == backend limit

https://gerrit.wikimedia.org/r/635852

Change 635852 merged by BBlack:
[operations/puppet@production] cache_text: large_objects_cutoff == backend limit

https://gerrit.wikimedia.org/r/635852

BBlack added a comment.EditedOct 26 2020, 2:03 PM

Notes on the large increase in large_objects_cutoff from late last week:

  • Graph link: https://grafana.wikimedia.org/d/000000500/varnish-caching?viewPanel=1&orgId=1&refresh=15m&from=now-7d&to=now&var-cluster=cache_text&var-site=All&var-status=2
  • Note the effects took effect suddenly approximately 24h after deploy, which confirms the notion that HFP objects are being constantly refreshed for the full TTL as they're used.
  • Over half (nearly 2/3rds) of the backend cache hits were converted to frontend cache hits, by allowing larger objects to be cached in the front. This in turn reduces connection/request load on the ats-be layer for cache_text hit traffic.
  • However, the percentage of requests reaching the applayer stayed roughly the same.
  • This supports our intuition that backend cache space is grossly oversized in the cache_text case. The change stopped the 6-8x replication of cacheable objects >=256KB to all cache backends, yet the effective increase in backend space isn't helping to reduce applayer traffic by any meaningful margin. This in turn bolsters the case that T263291 would be a net improvement for both clusters.
ema moved this task from Triage to Bug Reports on the Traffic board.Oct 30 2020, 9:59 AM

Change 641724 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] cache_text: nuke_limit and large_objects_cutoff

https://gerrit.wikimedia.org/r/641724

Change 641724 merged by BBlack:
[operations/puppet@production] cache_text: nuke_limit and large_objects_cutoff

https://gerrit.wikimedia.org/r/641724