**Overview**
Our object caching service is based on mcrouter and memcached. Mcrouter is a memcached protocol router used to scale memcached deployments. Currently, each MediaWiki server runs its own mcrouter instance, and every instance is configured with the same pool of memcached servers.
**Current Issues**
* When a shard becomes unavailable, mcrouter marks it as TKO ("technical knockout"), and these TKOs cause latency problems: T203786, T208934, T239983
* All memcached servers run Debian Jessie, whose LTS support ends in June 2020
* Redis is co-located on the same set of servers
To address the above issues, we will first introduce a secondary pool of memcached servers, called gutter servers, which can temporarily replace any unavailable server. Mcrouter provides this failover functionality. Once the gutter servers are in place and failover works, we can proceed with a rolling upgrade of all memcached servers to Debian Buster. Since there have not been any major changes in the memcached protocol, we do not expect any major issues.
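The gutter setup described above could be expressed in mcrouter's JSON configuration roughly as follows. This is a minimal sketch, not our production config: it assumes a plain `FailoverRoute` whose children are tried in order, and the pool names (`primary`, `gutter`) and hostnames are hypothetical placeholders.

```python
import json


def build_gutter_config(primary_servers, gutter_servers):
    """Return an mcrouter-style config dict with a primary and a gutter pool.

    Pool names are illustrative assumptions, not the production names.
    """
    return {
        "pools": {
            "primary": {"servers": list(primary_servers)},
            "gutter": {"servers": list(gutter_servers)},
        },
        # FailoverRoute sends a request to the first child and moves on
        # to the next child only if the previous one returns an error.
        "route": {
            "type": "FailoverRoute",
            "children": ["PoolRoute|primary", "PoolRoute|gutter"],
        },
    }


# Hypothetical hostnames, for illustration only.
config = build_gutter_config(
    ["mc-primary-1.example:11211", "mc-primary-2.example:11211"],
    ["mc-gutter-1.example:11211"],
)
config_json = json.dumps(config, indent=2)

if __name__ == "__main__":
    print(config_json)
```

A real deployment would likely also want a short TTL on keys written to the gutter pool, so that stale values do not outlive the outage; that tuning is left out of this sketch.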
We also need to take the mcrouter proxies into account. We have 4 MediaWiki servers in each DC which are used to replicate specific keys (dictated by MediaWiki) from one datacentre to the other. We want to test the gutter pool functionality at the proxy level, i.e. define a secondary set of mcrouter proxy servers in each DC that mcrouter will fail over to when a primary proxy is unavailable.
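The intended failover behaviour at the proxy level can be illustrated with a toy model. This is a sketch under simplifying assumptions: it uses plain modulo hashing rather than mcrouter's consistent hashing, it takes the set of unavailable servers as given instead of detecting it, and all hostnames are hypothetical.

```python
import hashlib


def pick(key, servers):
    """Deterministically map a key to one server (simple modulo hashing)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def route(key, primaries, secondaries, unavailable):
    """Return the primary for this key, or a secondary if the primary is down."""
    primary = pick(key, primaries)
    if primary not in unavailable:
        return primary
    # Only traffic for the unavailable primary moves to the secondary set.
    return pick(key, secondaries)


# Hypothetical proxy names, for illustration only.
primaries = ["proxy-a.example", "proxy-b.example", "proxy-c.example", "proxy-d.example"]
secondaries = ["proxy-gutter-a.example", "proxy-gutter-b.example"]

home = route("some-replicated-key", primaries, secondaries, set())
fallback = route("some-replicated-key", primaries, secondaries, {home})
```

Once the primary proxy is back and removed from the unavailable set, the same key routes to its original proxy again, which is the "temporary replacement" property we want to verify in testing.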
Lastly, developers are already working on completely retiring the use of Redis in MediaWiki, so there will be no need to worry about its upgrade. (TBA links to related tasks)
**Action Plan**
[x] Test gutter pool servers in beta
[x] Test new memcached settings in beta
[x] Image 6 new gutter servers (3 in eqiad, 3 in codfw)
[x] Make relevant puppet changes to get gutter pool metrics
[x] Make relevant puppet changes to support memcached on Debian Buster
[x] Test gutter pool in production (mwdebug*)
[x] Test proxy gutter pool in eqiad and/or codfw
[x] Make relevant puppet changes to support the gutter pool configuration
[x] Enable and test the gutter pool in canaries
[x] Test memcached 1.5.x (buster) in canaries
[] Enable and test the gutter pool in production
[] Rolling upgrade to buster in the secondary datacentre (codfw)
[] Rolling upgrade to buster in the primary datacentre (eqiad)
Related tasks: T203786
Reads:
* https://github.com/facebook/mcrouter/wiki/Shadowing-setup