msw-c6-codfw appears to be offline, causing the mgmt interfaces of everything in that rack to also go offline. The groups for each affected host have been tagged, since this can affect their maintenance and use of the systems. (They can remove their project tags if they don't wish them to remain.)
This should be a fairly quick fix. Hopefully the netgear isn't bad. If it is, set it aside (it has a lifetime warranty) and use a spare EX4200 in its place for now. Detailed directions are below.
Please note that icinga alerted about all the hosts in the rack losing mgmt interface connectivity:
14:23 < icinga-wm> : PROBLEM - Host ps1-c6-codfw is DOWN: PING CRITICAL - Packet loss = 100%
14:24 < icinga-wm> : PROBLEM - Host ms-be2015.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
14:24 < icinga-wm> : PROBLEM - Host db2043.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
14:26 < icinga-wm> : PROBLEM - Host db2039.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
14:27 < icinga-wm> : PROBLEM - Host db2033.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
14:27 < icinga-wm> : PROBLEM - Host db2035.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
14:27 < icinga-wm> : PROBLEM - Host db2037.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
14:27 < icinga-wm> : PROBLEM - Host db2036.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
14:27 < icinga-wm> : PROBLEM - Host db2044.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
14:27 < icinga-wm> : PROBLEM - Host db2038.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
14:27 < icinga-wm> : PROBLEM - Host db2041.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
14:27 < icinga-wm> : PROBLEM - Host db2042.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
14:27 < icinga-wm> : PROBLEM - Host db2040.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
14:27 < icinga-wm> : PROBLEM - Host db2047.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
14:27 < icinga-wm> : PROBLEM - Host db2048.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
14:27 < icinga-wm> : PROBLEM - Host db2046.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
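For anyone who wants the affected host list in a reusable form, here is a minimal Python sketch that pulls the host names out of pasted icinga-wm output. The function name `down_hosts` and the regex are illustrative assumptions based only on the alert format shown above; they are not part of any icinga tooling.

```python
import re

# Matches the host name in icinga-wm "Host ... is DOWN" notifications,
# based only on the message format pasted in this task (an assumption).
DOWN_RE = re.compile(r"PROBLEM - Host (\S+) is DOWN")


def down_hosts(log_text):
    """Return hosts reported DOWN, in first-seen order, without duplicates."""
    seen = []
    for host in DOWN_RE.findall(log_text):
        if host not in seen:
            seen.append(host)
    return seen


# A short excerpt of the alerts from this task.
sample = (
    "14:23 < icinga-wm> : PROBLEM - Host ps1-c6-codfw is DOWN: PING CRITICAL - Packet loss = 100% "
    "14:24 < icinga-wm> : PROBLEM - Host ms-be2015.mgmt is DOWN: PING CRITICAL - Packet loss = 100% "
    "14:24 < icinga-wm> : PROBLEM - Host db2043.mgmt is DOWN: PING CRITICAL - Packet loss = 100%"
)
print(down_hosts(sample))
# prints ['ps1-c6-codfw', 'ms-be2015.mgmt', 'db2043.mgmt']
```

Because the regex scans the whole text rather than splitting on newlines, it should work whether the alerts are pasted one per line or run together.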
Checklist for repair:
- mgmt netgears are simplistic; it could simply be that the single power supply has become unplugged. Please check this first.
- attempt to power cycle the netgear (remove power cable and plug it back in)
- rule out bad power cable + bad power port (try another power cable, try another power plug/port)
- if the netgear is bad, the spares tracking sheet shows there are two spare EX4200s there, serials: BP0212064074 & BP0212234923. We can wipe the config on one of these and use it as an unmanaged switch in place of msw-c6-codfw.
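Once the switch is powered back up or swapped, recovery of the mgmt interfaces could be confirmed with something like the sketch below, rather than waiting for each icinga recovery. This is only an illustration under stated assumptions: the helper names `ping_once` and `still_down` are made up here, the `ping -c 1 -W` flags are those of Linux iputils ping, and the host names passed on the command line must be whatever actually resolves from the machine running it (the short `.mgmt` names in the alerts may need their full domain appended).

```python
import subprocess
import sys


def ping_once(host, timeout_s=2):
    """Return True if one ICMP echo to host succeeds (Linux iputils ping flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def still_down(hosts, check=ping_once):
    """Return the subset of hosts for which check(host) is False."""
    return [h for h in hosts if not check(h)]


if __name__ == "__main__":
    # Hosts are taken from the command line; nothing is pinged if none are given.
    remaining = still_down(sys.argv[1:])
    for host in remaining:
        print(f"still unreachable: {host}")
```

The `check` parameter is there so the reachability test can be replaced, for example with a TCP connect to the mgmt SSH port if ICMP is filtered from where the script runs.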