While Thanos in-memory caching has worked so far, one consequence is that whenever a process is OOM-killed and restarts, it starts with an empty cache.
We have extensive memcache experience and puppetization, so it shouldn't be a whole lot of work to move Thanos caching to memcache:
- Deploy memcache to titan hosts (size TBD)
- Switch thanos-query-frontend to use memcache on localhost
- Also move thanos-store cache to memcache
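For reference, a minimal sketch of what the thanos-query-frontend response cache configuration could look like once memcache runs on localhost. The port (11211, memcache's default), the timeout, the connection limit and the expiration are illustrative assumptions, not decided values; the file is the one passed via `--query-range.response-cache-config-file`.

```yaml
# Sketch only: values below are placeholders, not agreed settings.
type: MEMCACHED
config:
  # Assumes memcache listens on its default port on the same host.
  addresses:
    - "localhost:11211"
  # Per-operation timeout; illustrative value.
  timeout: 500ms
  # Illustrative connection pool size.
  max_idle_connections: 100
  # Should match the memcache server's item size limit (-I, 1MB by default).
  max_item_size: 1MiB
  # How long cached query responses are kept; illustrative value.
  expiration: 24h
```

thanos-store would take a similar `type: MEMCACHED` block for its index cache (via `--index-cache.config-file`), with sizing to be worked out separately.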
As followups/improvements:
- Investigate whether the Thanos memcache client does the right thing with multiple servers (i.e. handles server failure and shards keys across servers)
- If it does, then we can add all titan hosts to the Thanos memcache configuration
- Consider upgrading titan host memory for memcache