We've seen repeatedly that spikes in memcached requests cause higher latencies on the application servers.
Some keys are extremely hot: for instance, WANCache:v:global:CacheAwarePropertyInfoStore:wikidatawiki:P244 (the Wikidata item for the Library of Congress) is read about 4k times per second.
Mcrouter specifically allows us to define a WarmUpRoute that does exactly what we want (at least on paper):
- Read from the local memcached instance
- On a miss, read from the shared pool
- If the data was present in the shared pool, set it back to the local memcached
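As a sketch, the behaviour above maps onto mcrouter's WarmUpRoute, which reads from a "cold" route first, falls back to a "warm" route on a miss, and writes the value back to the cold route with a capped TTL. The pool names and server addresses below are placeholders, not our actual pool definitions:

```json
{
  "pools": {
    "local": { "servers": [ "127.0.0.1:11210" ] },
    "shared": { "servers": [ "10.2.2.1:11211", "10.2.2.2:11211" ] }
  },
  "route": {
    "type": "WarmUpRoute",
    "cold": "PoolRoute|local",
    "warm": "PoolRoute|shared",
    "exptime": 10
  }
}
```

Here `exptime` is the TTL (in seconds) applied to the warm-up sets into the local instance, matching the short-TTL requirement discussed below.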
Of course, we'll have to keep a short TTL (e.g. 10 seconds) on the local instance, but this should greatly reduce network traffic: some of the hottest keys would go from being requested 7k times per second from the remote servers down to roughly A * N_servers / TTL times per second, where A is a factor accounting for cache expunges/misses.
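For a rough sense of the numbers, the estimate above can be computed directly; the values for N_servers and A below are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope estimate of the remote (shared pool) request rate
# for one hot key once on-host caching is in place.

def remote_rate(n_servers: int, ttl_seconds: float, a: float = 1.0) -> float:
    """Requests/second hitting the shared pool: A * N_servers / TTL,
    where A is a factor accounting for local cache expunges/misses."""
    return a * n_servers / ttl_seconds

# Hypothetical fleet: 300 app servers, 10 s local TTL, A = 1.5
print(remote_rate(300, 10, 1.5))  # 45.0 req/s, versus ~7k req/s today
```

Even with pessimistic assumptions for A, the shared pool sees orders of magnitude fewer requests for the hottest keys.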
- Create a configuration that supports on-host memcached and puppetise it
- Provide metrics/dashboard for on-host memcached
- Test on-host memcached functionality and performance
- Deploy in 10% of each mw* cluster (app, api, jobrunners, parsoid)
- Deploy to 100%