Hi Papaul!
I see multiple nodes from rack C2 reported down by Icinga. Did anything happen to it? Maybe PSU-related?
If you are a service owner of any of the servers listed below, please check the box if you are able to depool the server on April 27th before 10:30am CT, the time set to replace the faulty switch. The replacement will take approximately 1 hour or less. Thanks
The servers below are just the ones in rack C2; see https://netbox.wikimedia.org/dcim/devices/?rack_id=60
- ms-be -- ok to lose connectivity for a while, no depool @fgiunchedi
- moss -- not in service yet @fgiunchedi
- kafka logging -- not in service yet (T279342) @fgiunchedi
- elastic -- Search team will monitor during switch replacement; no need to depool/ban from es cluster before replacement
- dns - @BBlack - can do, Traffic needs to manual-depool before outage
- cp - @BBlack - can do, Traffic needs to manual-depool before outage
- lvs - @BBlack - can do, Traffic needs to manual-depool before outage
https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-04-06_partial_rack_codfw