Today mc1024 went down unexpectedly. Icinga alerted about HOST down.
- Connecting to mgmt console was not possible
- Power cycling failed
- DCops was unable to physically reboot it
- The server is an HP, it is out of warranty, and this has happened before.
- There was no immediate user-facing issue because traffic failed over to the gutter pool (very good! :).
- Decom task (T272074)
- Replacement request (T272085).
This task is about the config part: the host is shard06 in redis::shards, and it also appears in mcrouter_wancache.yaml.
18:58 <+icinga-wm> PROBLEM - Host mc1024 is DOWN: PING CRITICAL - Packet loss = 100%
19:03 < mutante> !log mc1024 - attempting to power on via mgmt, went down and power down
19:06 < elukey> cmjohnson1: sorry to ping you, mc1024 in B6 went down a couple of mins ago, if you have a min can you check if the host is dead/fried?
grep -r 10.64.16.107
hieradata/common/profile/mediawiki/mcrouter_wancache.yaml: host: 10.64.16.107
hieradata/common/redis.yaml: host: 10.64.16.107
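
To re-check that no reference to the old host is left after the config change, here is a minimal sketch of the same search as the grep above. It assumes a local checkout of the repo that contains hieradata/. find_host_references is a hypothetical helper written for this task, not existing tooling. Unlike the plain grep, it matches the IP as a whole token, so 10.64.16.107 does not also match a longer address such as 10.64.16.1070.

```python
import os
import re


def find_host_references(root, ip):
    """Return sorted (relative_path, line_number, stripped_line) tuples for
    every line under `root` that mentions `ip`.

    Hypothetical helper for illustration only; not part of any existing tool.
    """
    # Reject matches that are directly preceded or followed by another digit,
    # so a longer address that merely contains `ip` is not reported.
    pattern = re.compile(r"(?<!\d)" + re.escape(ip) + r"(?!\d)")
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for lineno, line in enumerate(handle, start=1):
                        if pattern.search(line):
                            rel = os.path.relpath(path, root)
                            hits.append((rel, lineno, line.strip()))
            except OSError:
                # Skip files that cannot be read instead of aborting the scan.
                continue
    return sorted(hits)
```

Run from the top of the checkout, find_host_references("hieradata", "10.64.16.107") should report the same two files as the grep while the old entries are still there, and an empty list once both have been updated.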