We've seen repeatedly that spikes in memcached requests cause higher latencies on the application servers.
Some keys are extremely hot. Take for instance `WANCache:v:global:CacheAwarePropertyInfoStore:wikidatawiki:P244`, which gets read about 4k times per second (!!!): this is the Wikidata property for the Library of Congress authority ID.
Mcrouter specifically allows defining a [[ https://github.com/facebook/mcrouter/wiki/Two-level-caching | Warmup Route ]] that does exactly what we want (at least on paper):
* Read from the local memcached instance
* On a miss, read from the shared pool
* If the data was present in the shared pool, write it back to the local memcached instance
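The three steps above map onto mcrouter's `WarmUpRoute`. A minimal sketch of what the route config could look like, assuming a local pool named `onhost` and a shared pool named `main` (pool names and server addresses here are illustrative, not our actual config):

```json
{
  "pools": {
    "onhost": { "servers": [ "127.0.0.1:11211" ] },
    "main":   { "servers": [ "10.2.2.1:11211", "10.2.2.2:11211" ] }
  },
  "route": {
    "type": "WarmUpRoute",
    "cold": "PoolRoute|onhost",
    "warm": "PoolRoute|main",
    "exptime": 10
  }
}
```

As documented on the wiki page linked above, gets hit the "cold" (local) route first; on a miss the value is fetched from "warm" and set on "cold" with the short `exptime`, while writes and deletes go to the "warm" pool.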
Of course, we'll have to keep a short TTL (e.g. 10 seconds) on the local instance, but this should greatly reduce network traffic: the hottest keys would go from being requested ~7k times per second from the remote pool down to roughly `A * N_servers / TTL` times per second (where A is a factor accounting for local cache expunges/misses).
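As a sanity check on that arithmetic (the fleet size below is a made-up example, not our actual server count):

```python
def remote_reads_per_sec(n_servers, local_ttl_s, miss_factor=1.0):
    # Once every server holds the key locally, each one refreshes it
    # from the shared pool at most ~once per TTL; miss_factor (the "A"
    # above) inflates this to account for local evictions and misses.
    return miss_factor * n_servers / local_ttl_s

# Hypothetical fleet of 300 servers with a 10 s local TTL:
print(remote_reads_per_sec(300, 10))  # 30.0 remote reads/s instead of ~7000
```

Even with a generous miss factor, that is orders of magnitude less remote traffic for the hottest keys.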
[x] Create a configuration that supports on-host memcached and puppetise it
[x] Provide metrics/dashboard for on-host memcached
[x] Test on-host memcached functionality and performance
[ ] Deploy in 10% of each mw* cluster (app, api, jobrunners, parsoid)
[ ] Deploy to 100%