Hey,
Filing a task so this doesn’t get missed going into the weekend.
01:46:28 <icinga-wm> PROBLEM - Host an-worker1132 is DOWN: PING CRITICAL - Packet loss = 100%
It doesn’t seem to have recovered, so it has been down for nearly 8 hours now.
Removing SRE, as this has so far been correctly routed to Data-Engineering. Please restore the SRE tag, or preferably add a more specific team tag (e.g. ops-eqiad) for hardware maintenance, after they (@BTullis?) have had a first look, or if there is any other assistance we can provide on our side.
Mentioned in SAL (#wikimedia-analytics) [2023-01-10T17:33:57Z] <btullis> chassis power reset on an-worker1032 (T326459)
I checked the console and there is no output, but ipmitool reports that the chassis is still powered.
I issued a chassis power reset from ipmitool.
btullis@cumin1001:~$ ipmitool -I lanplus -H "an-worker1132.mgmt.eqiad.wmnet" -U root -E shell
Unable to read password from environment
Password:
ipmitool> chassis power status
Chassis Power is on
ipmitool> chassis power reset
Chassis Power Control: Reset
ipmitool>
Host is booting now.