**Goals**
[] Test whether failover works and compare failover strategies.
First, check that mcrouter fails over to the gutter servers when a shard becomes unavailable. Mcrouter has 4 failover strategies: "tko", "connect_timeout", "timeout" and "connect_error". We would like to try them all and see which works best for us.
[] Check key integrity during and after a failover
Investigate what happens to the existing keys in a shard that was unavailable and is now back online. For instance, we would like to know whether it will serve stale keys.
[] Test how LRU behaves in buster
Memcached 1.5.x (buster) has a few changes, including how keys are evicted from memory. We would like to keep one or more shards down for a long period of time, have servers fail over to the gutter pool, gather metrics and compare them with our memcached 1.4.x servers.
[] Test 'gutter proxies'
In production, there are some keys that we replicate from the main datacentre to the secondary one via a set of 4 mcrouter proxies located there. We would like to have an extra set of "gutter proxies", i.e. another 4 mcrouter instances that a mcrouter in the primary DC can fail over to if one of the destination proxies is down. Note that each mediawiki server runs one mcrouter instance.
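For the first goal, trying each failover strategy in turn means producing one mcrouter config per strategy. The sketch below is only an illustration, not the deployed config (that one is P10383). It assumes a `FailoverRoute` with a `failover_errors` list and two pools named `main` and `gutter`; the pool names, the server addresses and the `build_config` helper are all hypothetical and should be checked against the mcrouter documentation before use.

```python
import json

# The four failover strategies named in this task.
FAILOVER_ERRORS = ["tko", "connect_timeout", "timeout", "connect_error"]


def build_config(main_servers, gutter_servers, failover_errors):
    """Return a mcrouter-style config dict that fails over from main to gutter.

    Assumption: the route is a FailoverRoute whose "failover_errors" list
    selects which errors trigger failover. Verify against the real config.
    """
    unknown = set(failover_errors) - set(FAILOVER_ERRORS)
    if unknown:
        raise ValueError(f"unknown failover errors: {sorted(unknown)}")
    return {
        "pools": {
            "main": {"servers": list(main_servers)},
            "gutter": {"servers": list(gutter_servers)},
        },
        "route": {
            "type": "FailoverRoute",
            "children": ["PoolRoute|main", "PoolRoute|gutter"],
            "failover_errors": list(failover_errors),
        },
    }


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    main = ["10.0.0.1:11211", "10.0.0.2:11211"]
    gutter = ["10.0.1.1:11211"]
    for err in FAILOVER_ERRORS:
        print(json.dumps(build_config(main, gutter, [err]), indent=2))
```

Generating one config per strategy keeps each test run isolated, so any difference in behaviour can be attributed to a single failover trigger.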
**Testing Environment**
* mwdebug1001: we have deployed a configuration where we instruct mcrouter to use the gutter pool when a shard fails, config.json: P10383
* We push iptables rules that block traffic to one or all of the memcached servers in the main pool, so as to cause connection errors
* mc-gp100[1-3]: gutter pool servers aka gutter pool cluster, running memcached 1.5.x version on buster
* mediawiki-07 (beta): We generate traffic towards mwdebug1001 by going through a list of 90 URLs, 1 req/s
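The traffic generation described above could look roughly like the following sketch: it walks a list of URLs and pauses one second between requests. The `replay` and `fetch_status` names are hypothetical, and how requests are actually directed to mwdebug1001 is not covered here.

```python
import time
from urllib.request import urlopen


def fetch_status(url, timeout=10):
    """Fetch url and return the HTTP status code."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.status


def replay(urls, rounds=1, interval=1.0, fetch=fetch_status, sleep=time.sleep):
    """Request each URL in turn, pausing `interval` seconds after each request.

    Returns a list of (url, result) pairs, where result is the value returned
    by `fetch` or the exception it raised.
    """
    results = []
    for _ in range(rounds):
        for url in urls:
            try:
                results.append((url, fetch(url)))
            except Exception as exc:  # keep going so one failure does not stop the run
                results.append((url, exc))
            sleep(interval)
    return results
```

Passing `fetch` and `sleep` as parameters makes it possible to dry-run the loop without touching the network.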
We will be blocking traffic from mwdebug -> mc* and get metrics/data in the following cases:
* block random shards in random intervals
* block a shard for a long period of time (e.g. 1 hour, 2 hours, 2 days)
* block several shards for a long period of time (e.g. 1 hour, 2 hours, 2 days)
* block shards for a very long amount of time (1 week)
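The "random shards in random intervals" case above can be planned ahead of time. The sketch below only generates iptables command strings and a random schedule; it does not run anything. The helper names, the port and the choice of the `OUTPUT` chain are assumptions. The rule target is also an assumption and matters for the test: `DROP` silently discards packets, so connections time out, while `REJECT` makes the connection attempt fail immediately.

```python
import random


def block_cmd(ip, port=11211, target="DROP"):
    """iptables command that blocks outgoing TCP traffic to one memcached shard."""
    return f"iptables -A OUTPUT -p tcp -d {ip} --dport {port} -j {target}"


def unblock_cmd(ip, port=11211, target="DROP"):
    """Matching command that deletes the rule added by block_cmd."""
    return f"iptables -D OUTPUT -p tcp -d {ip} --dport {port} -j {target}"


def random_schedule(shards, events, min_s, max_s, seed=None):
    """Return a list of (shard_ip, block_seconds, pause_seconds) tuples.

    Each event blocks one randomly chosen shard for block_seconds and then
    waits pause_seconds before the next event. A fixed seed makes the
    schedule reproducible across test runs.
    """
    rng = random.Random(seed)
    return [
        (rng.choice(shards), rng.randint(min_s, max_s), rng.randint(min_s, max_s))
        for _ in range(events)
    ]
```

Using a seed lets the same sequence of outages be replayed once per failover strategy, which makes the collected metrics easier to compare.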
Additionally, we will run a similar test to observe how mcrouter behaves when failing over to a secondary set of proxies when replicating keys (aka gutter proxies).