We need a cookbook to help us reboot/restart the [[ https://wikitech.wikimedia.org/wiki/Memcached_for_MediaWiki | memcached ]] cluster.
====Infrastructure Overview
* Each DC contains 18 memcached hosts and 3 gutter pool hosts
* Memcached cluster capacity: 2.4TB of available RAM
* Gutter pool capacity: ~768GB
====Performance Impact
* Both a daemon restart and a server reboot result in the loss of all cached data on that host
* When a server goes offline, the gutter pool cluster replaces it until the host becomes available again
* The gutter pool is always **cold**
* MediaWiki works significantly harder to warm up a cold host or gutter pool
* This translates to increased latency, additional database queries, etc
====Requirements
Create a cookbook that implements the following:
* Accepts a range of servers or a single server
* Batch size: restart/reboot no more than 2 hosts at a time
  * The batch size should be configurable, with a warning when more than 2 hosts are restarted/rebooted at once
* Warm-up period: after hosts are back online, wait for a specified duration to allow the cache to warm up before proceeding to the next batch
  * Open question: could we monitor the cache hit ratio instead of (or in addition to) waiting for a fixed duration?
* Ensure that a single run restarts hosts from either the main cluster or the gutter pool cluster, but never both
* Operation modes: include separate flags for:
  * Restarting the daemon
  * Rebooting the server
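
As a starting point for discussion, the requirements above could be sketched roughly as follows. This is plain Python and deliberately does not use any Spicerack or cookbook APIs. All names here (`parse_args`, `check_single_cluster`, `batches`, `hit_ratio`, the flag names, and the 600-second warm-up default) are assumptions made for illustration, not an agreed interface. The hit ratio helper assumes the `get_hits` and `get_misses` counters from the memcached `stats` output. How those counters are collected is out of scope here.

```python
import argparse
import logging
from typing import AbstractSet, Iterator, List, Sequence

logger = logging.getLogger(__name__)

# From the requirements: no more than 2 hosts at a time unless overridden.
DEFAULT_BATCH_SIZE = 2


def parse_args(argv: Sequence[str]) -> argparse.Namespace:
    """Parse hypothetical cookbook arguments (flag names are assumptions)."""
    parser = argparse.ArgumentParser(description="Roll-restart memcached hosts (sketch)")
    parser.add_argument("hosts", nargs="+", help="one or more hostnames")
    parser.add_argument("--batch-size", type=int, default=DEFAULT_BATCH_SIZE)
    # Placeholder default; the real warm-up duration is still to be decided.
    parser.add_argument("--warmup-seconds", type=int, default=600)
    # Exactly one operation mode must be chosen.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--restart-daemon", action="store_true")
    mode.add_argument("--reboot", action="store_true")
    return parser.parse_args(argv)


def check_single_cluster(
    hosts: Sequence[str],
    main_hosts: AbstractSet[str],
    gutter_hosts: AbstractSet[str],
) -> str:
    """Return 'main' or 'gutter'; refuse unknown hosts or a mix of both clusters."""
    selected = set(hosts)
    unknown = selected - main_hosts - gutter_hosts
    if unknown:
        raise ValueError(f"Unknown hosts: {sorted(unknown)}")
    in_main = selected & main_hosts
    in_gutter = selected & gutter_hosts
    if in_main and in_gutter:
        raise ValueError("Refusing to mix main cluster and gutter pool hosts")
    return "main" if in_main else "gutter"


def batches(hosts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield hosts in groups of at most batch_size, warning above the default."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if batch_size > DEFAULT_BATCH_SIZE:
        logger.warning(
            "Batch size %d exceeds the recommended maximum of %d",
            batch_size,
            DEFAULT_BATCH_SIZE,
        )
    for start in range(0, len(hosts), batch_size):
        yield list(hosts[start:start + batch_size])


def hit_ratio(get_hits: int, get_misses: int) -> float:
    """Cache hit ratio from memcached counters; 0.0 when there were no gets."""
    total = get_hits + get_misses
    return get_hits / total if total else 0.0
```

A run would then validate the host list with `check_single_cluster`, iterate over `batches(args.hosts, args.batch_size)`, perform the chosen operation on each batch, and wait for `args.warmup_seconds` (or until `hit_ratio` crosses some threshold, if we go that way) before moving on.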