Paste P969

text-varnish stats/musings about 2layer effects

Authored by BBlack on Jul 15 2015, 4:36 PM.
Experiment:
Raised cp1065 FE Cache size from 24G to 48G, then to 96G, for several days each.
Result: no appreciable or consistent increase in hit rate.
Conclusion: the hottest objects are well contained within 24G. The remaining potential hits are very long-tail and thus hard to capture within reasonable RAM-sizing bounds, so there is no point in increasing the FE cache size.
Ganglia data aggregated over all esams text-cache nodes
(used esams because tier1 BE stats are complicated by fetches from tier2 DCs)
ESAMS:
Effective FE Cache Size: 23G (weighted avg)
FE peaks in weekly graphs:
client_req: ~30K
cache_hit: ~22K
backend_req: ~6K
(the ~2K of client_req not covered by cache_hit + backend_req is likely vcl_error (true errors, redirects) + PURGE)
(~79% true hit on non-error/redirect/PURGE traffic: 22K / (22K + 6K))
cache_miss: ~2.8K
cache_hitpass: ~2.6K
s_pass: ~3.0K
(so roughly a 50/50 split between true misses and passes; most passes are hitpasses)
(keep in mind that some "miss" objects may turn out not to be cacheable once fetched, due to Cache-Control...)
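The FE arithmetic above can be rechecked with a short sketch. The figures are the approximate weekly peaks quoted in these notes (in K, as given); the dictionary and helper names are made up for illustration, and treating cache_hitpass as a subset of s_pass follows the notes' reading rather than anything verified here.

```python
# Approximate FE weekly peaks from the notes, in K (rounded values, not exact data).
fe = {
    "client_req": 30.0,
    "cache_hit": 22.0,
    "backend_req": 6.0,
    "cache_miss": 2.8,
    "cache_hitpass": 2.6,
    "s_pass": 3.0,
}


def unaccounted(stats):
    """Requests that neither hit the cache nor went to a backend."""
    return stats["client_req"] - stats["cache_hit"] - stats["backend_req"]


def true_hit_ratio(stats):
    """Hit ratio over 'real' traffic only (hits plus backend fetches)."""
    return stats["cache_hit"] / (stats["cache_hit"] + stats["backend_req"])


fe_unaccounted = unaccounted(fe)
fe_true_hit = true_hit_ratio(fe)
# Share of backend-bound traffic that is a true miss rather than a pass.
fe_miss_share = fe["cache_miss"] / (fe["cache_miss"] + fe["s_pass"])
# Assumption: hitpasses are counted inside s_pass, as the notes imply.
fe_hitpass_share = fe["cache_hitpass"] / fe["s_pass"]

print(f"unaccounted:      {fe_unaccounted:.1f}K")
print(f"true hit ratio:   {fe_true_hit:.1%}")
print(f"miss vs pass:     {fe_miss_share:.1%} miss")
print(f"hitpass / s_pass: {fe_hitpass_share:.1%}")
```

Since the inputs are rounded peaks read off graphs, the outputs are only good to about one significant digit.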
Effective BE Cache Size: 13000G (sum)
(even for a very long tail, we'd expect some new hits here due to the massive size...)
BE peaks in weekly graphs:
client_req: ~10K
(4K more than FE's backend_req. We can assume some of it is PURGE; perhaps some is FE error-restarts that are not counted in FE backend_req?)
cache_hit: ~3.5K
(how can this be higher than FE cache_miss? Is the BE caching a few things that Cache-Control allows but that the FE explicitly passes/hitpasses?)
backend_req: ~3.2K
(52% true hit: 3.5K / (3.5K + 3.2K), which is basically 10% true hit on top of FE's true hit, as a share of total reqs)
(assuming the missing 3.3K, i.e. client_req - cache_hit - backend_req, is not "real" traffic (PURGE + error + ??))
cache_miss: ~1.8K
cache_hitpass: ~1.0K
s_pass: ~1.5K
(would expect hitpass/pass to closely track FE's, which suggests we have some pass/hitpass logic in FE that should also be in BE but isn't)
(BE s_pass + cache_miss =~ 3.3K and FE s_pass is 3.0K, so perhaps if the two were aligned on pass/hitpass, BE cache_miss would be ~0.3K?)
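The BE figures and the two-layer estimate can be rechecked the same way. This is a sketch under the notes' own assumptions: the missing 3.3K on the BE is not real traffic, and the BE's real traffic is what the FE failed to serve from cache. Variable names are hypothetical.

```python
# Approximate weekly peaks from the notes, in K (rounded values, not exact data).
fe = {"cache_hit": 22.0, "backend_req": 6.0, "s_pass": 3.0}
be = {
    "client_req": 10.0,
    "cache_hit": 3.5,
    "backend_req": 3.2,
    "cache_miss": 1.8,
    "cache_hitpass": 1.0,
    "s_pass": 1.5,
}

# BE requests that were neither hits nor backend fetches (PURGE + error + ??).
be_unaccounted = be["client_req"] - be["cache_hit"] - be["backend_req"]

# True hit ratios over hits plus backend fetches only.
fe_true_hit = fe["cache_hit"] / (fe["cache_hit"] + fe["backend_req"])
be_true_hit = be["cache_hit"] / (be["cache_hit"] + be["backend_req"])

# Assumption: the BE only sees the share of real traffic that the FE did not hit,
# so its contribution to the overall hit ratio is scaled by the FE miss share.
be_added = be_true_hit * (1.0 - fe_true_hit)
combined_true_hit = fe_true_hit + be_added

# If BE pass/hitpass handling matched the FE's 3.0K s_pass, the remainder of
# BE (s_pass + cache_miss) would be the implied true-miss volume.
be_pass_plus_miss = be["s_pass"] + be["cache_miss"]
implied_be_miss = be_pass_plus_miss - fe["s_pass"]

print(f"BE unaccounted:        {be_unaccounted:.1f}K")
print(f"BE true hit ratio:     {be_true_hit:.1%}")
print(f"BE share of all reqs:  {be_added:.1%}")
print(f"combined true hit:     {combined_true_hit:.1%}")
print(f"implied BE cache_miss: {implied_be_miss:.1f}K")
```

The BE contribution comes out slightly above the "basically 10%" quoted above, which is within the rounding of the input peaks.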