We've seen repeatedly that spikes in memcached requests cause higher latencies on the application servers.
Some keys are extremely hot. Take for instance `WANCache:v:global:CacheAwarePropertyInfoStore:wikidatawiki:P244`, which gets read about 4k times per second (!!!): this is the Wikidata property for the Library of Congress authority ID.
Mcrouter specifically allows defining a [[ https://github.com/facebook/mcrouter/wiki/Two-level-caching | Warmup Route ]] that does exactly what we want (at least on paper):
* Read from the local memcached instance
* On a miss, read from the shared pool
* If the data was present in the shared pool, write it back to the local memcached instance
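The three steps above map onto mcrouter's `WarmUpRoute`. A minimal sketch of what the route config could look like, assuming a local pool named `onhost` and a shared pool named `main` (pool names and server addresses here are illustrative, not our actual config):

```json
{
  "pools": {
    "onhost": { "servers": [ "127.0.0.1:11211" ] },
    "main":   { "servers": [ "10.2.2.1:11211", "10.2.2.2:11211" ] }
  },
  "route": {
    "type": "WarmUpRoute",
    "cold": "PoolRoute|onhost",
    "warm": "PoolRoute|main",
    "exptime": 10
  }
}
```

As documented on the wiki page linked above, gets hit the "cold" (local) route first; on a miss the value is fetched from "warm" and set on "cold" with the short `exptime`, while writes and deletes go to the "warm" pool.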
Of course, we'll have to keep a short TTL (e.g. 10 seconds) on the local instance, but this should greatly reduce network traffic: the hottest keys would go from being requested ~7k times per second from the remote pool down to roughly `A * N_servers / TTL` times per second (where A is a factor accounting for local cache expunges/misses).
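As a sanity check on that arithmetic (the fleet size below is a made-up example, not our actual server count):

```python
def remote_reads_per_sec(n_servers, local_ttl_s, miss_factor=1.0):
    # Once every server holds the key locally, each one refreshes it
    # from the shared pool at most ~once per TTL; miss_factor (the "A"
    # above) inflates this to account for local evictions and misses.
    return miss_factor * n_servers / local_ttl_s

# Hypothetical fleet of 300 servers with a 10 s local TTL:
print(remote_reads_per_sec(300, 10))  # 30.0 remote reads/s instead of ~7000
```

Even with a generous miss factor, that is orders of magnitude less remote traffic for the hottest keys.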
[x] Create a configuration that supports on-host memcached and puppetise it
[x] Provide metrics/dashboard for on-host memcached
[x] Test on-host memcached functionality and performance
[ ] Deploy in 10% of each mw* cluster (app, api, jobrunners, parsoid)
[ ] Deploy to 100%