For about 11 hours now, there has been an Icinga alert for the no-longer-redundant power supply of wdqs2006:
Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical]
Strange... according to T188432 the new wdqs servers are wdqs100[7-9]. The current wdqs cluster in eqiad is [[ https://github.com/wikimedia/puppet/blob/production/manifests/site.pp#L2115-L2117 | wdqs100[3-5] ]]. So what is this wdqs1006?
After another check in Icinga, the issue is actually on wdqs2006, which is one of the new wdqs servers in codfw (T187800). I'm assigning this to @Papaul instead of @Cmjohnson.