We need a cookbook to help us reboot/restart the memcached cluster.
Infrastructure Overview
- Each DC contains 18 memcached hosts and 3 gutter pool hosts
- Memcached cluster capacity: 2.4TB of available RAM
- Gutter pool capacity: ~768GB
Performance Impact
- Both a daemon restart and a server reboot result in the loss of all cached data on that host
- When a server goes offline, the gutter pool cluster replaces it until the host becomes available again
- The gutter pool is always cold
- MediaWiki works significantly harder to warm up a cold host or gutter pool
- This translates to increased latency, additional database queries, etc.
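One way to quantify how cold a host is would be its hit ratio, computed from the `get_hits` and `get_misses` counters that memcached reports in its `stats` output. The sketch below is illustrative only: the function names are hypothetical, and it assumes the stats have already been fetched and parsed into a mapping of integers. Because the counters reset when the daemon restarts, comparing two snapshots gives the ratio for a recent interval rather than for the whole uptime.

```python
from typing import Mapping


def hit_ratio(stats: Mapping[str, int]) -> float:
    """Return get_hits / (get_hits + get_misses), or 0.0 if there were no gets."""
    hits = stats.get("get_hits", 0)
    misses = stats.get("get_misses", 0)
    total = hits + misses
    return hits / total if total else 0.0


def interval_hit_ratio(before: Mapping[str, int], after: Mapping[str, int]) -> float:
    """Hit ratio for the requests served between two stats snapshots."""
    delta = {
        key: after.get(key, 0) - before.get(key, 0)
        for key in ("get_hits", "get_misses")
    }
    return hit_ratio(delta)
```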
Requirements
Create a cookbook that implements the following:
- Accepts a range of servers or a single server
- Batch size: restart/reboot no more than 2 hosts at a time
  - This should be configurable, with a warning whenever more than 2 hosts are restarted/rebooted at once
- Warm-up period: after hosts are back online, wait for a specified duration to allow the cache to warm up before proceeding to the next pair
  - Potentially, we could monitor the cache hit ratio to decide when a host is warm enough?
- Ensure that a single run restarts hosts from either the main cluster or the gutter pool cluster, but never both
- Operation modes: include separate flags for:
  - Restarting the daemon
  - Rebooting the server
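
The requirements above could be sketched roughly as follows. This is not a real cookbook and does not use any cookbook framework API; it is a standalone illustration of the argument handling and batching logic. The flag names (`--batch-size`, `--warmup-seconds`, `--restart-daemon`, `--reboot`), the 300-second default, the choice to make the two mode flags mutually exclusive, and the host names in the usage example are all assumptions made for illustration.

```python
import argparse
import warnings

DEFAULT_BATCH_SIZE = 2  # requirement: no more than 2 hosts at a time by default


def parse_args(argv):
    """Parse the (hypothetical) command-line options of the cookbook."""
    parser = argparse.ArgumentParser(description="Roll-restart memcached hosts (sketch)")
    parser.add_argument("hosts", nargs="+", help="one or more host names")
    parser.add_argument("--batch-size", type=int, default=DEFAULT_BATCH_SIZE)
    # Assumed default; the right warm-up duration is not specified in the task.
    parser.add_argument("--warmup-seconds", type=int, default=300)
    # Assumption: exactly one operation mode must be chosen per run.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--restart-daemon", action="store_true")
    mode.add_argument("--reboot", action="store_true")
    return parser.parse_args(argv)


def plan_batches(hosts, gutter_hosts, batch_size=DEFAULT_BATCH_SIZE):
    """Split hosts into batches, refusing to mix main-cluster and gutter pool hosts."""
    if batch_size < 1:
        raise ValueError("batch size must be at least 1")
    selected = list(dict.fromkeys(hosts))  # drop duplicates, keep order
    in_gutter = [host for host in selected if host in gutter_hosts]
    if in_gutter and len(in_gutter) != len(selected):
        raise ValueError("refusing to mix main-cluster and gutter pool hosts")
    if batch_size > DEFAULT_BATCH_SIZE:
        warnings.warn(
            f"batch size {batch_size} exceeds the recommended {DEFAULT_BATCH_SIZE}"
        )
    return [selected[i:i + batch_size] for i in range(0, len(selected), batch_size)]
```

For example, `plan_batches(["mc1", "mc2", "mc3"], {"gp1"})` yields two batches, `["mc1", "mc2"]` and `["mc3"]`. The cookbook would act on one batch, wait for the hosts to come back, sleep for the warm-up period (or poll the hit ratio), and only then move on to the next batch.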