
an-worker1132 down
Closed, Resolved · Public

Description

Hey,

Filing a task so it doesn’t get missed over the weekend.

01:46:28 <icinga-wm> PROBLEM - Host an-worker1132 is DOWN: PING CRITICAL - Packet loss = 100%

It doesn’t seem to have recovered, so it has been down for nearly 8 hours now.

Event Timeline

jcrespo subscribed.

Removing SRE, as this has so far been correctly routed to Data-Engineering. However, please re-add the SRE tag, or preferably a more specific team tag (e.g. ops-eqiad) for hardware maintenance, after they (@BTullis?) have had a first look, or if you need any other assistance from our side.

Mentioned in SAL (#wikimedia-analytics) [2023-01-10T17:33:57Z] <btullis> chassis power reset on an-worker1032 (T326459)

I checked the console and there was no output, but ipmitool reports that the chassis is still powered on.

I issued a chassis power reset from ipmitool.

btullis@cumin1001:~$ ipmitool -I lanplus -H "an-worker1132.mgmt.eqiad.wmnet" -U root -E shell
Unable to read password from environment
Password: 
ipmitool> chassis power status
Chassis Power is on
ipmitool> chassis power reset
Chassis Power Control: Reset
ipmitool>
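The interactive session above prompted for a password because `-E` tells ipmitool to read it from the environment, and it was not set. The following is a minimal, non-interactive sketch of the same status-then-reset steps, assuming ipmitool is installed and the management password is exported as `IPMI_PASSWORD` (the variable `-E` reads). The function name `ipmi_reset_if_on` is hypothetical, and the "only reset if on" guard is an illustrative choice, not part of the procedure above.

```shell
#!/usr/bin/env bash
# Sketch only: non-interactive version of the ipmitool session above.
# Assumes ipmitool is installed and IPMI_PASSWORD is exported, because
# "-E" reads the BMC password from the environment instead of prompting.
# ipmi_reset_if_on is a hypothetical helper name.

# Print the chassis power status, then issue a reset only if it reports on.
ipmi_reset_if_on() {
  local mgmt_host="$1"
  local status
  # Query power state over IPMI-over-LAN (lanplus) as the root user
  status="$(ipmitool -I lanplus -H "$mgmt_host" -U root -E chassis power status)" || return 1
  echo "$status"
  if [[ "$status" == "Chassis Power is on" ]]; then
    # Same command as "chassis power reset" in the interactive shell
    ipmitool -I lanplus -H "$mgmt_host" -U root -E chassis power reset
  else
    echo "Chassis is not reporting power on; skipping reset" >&2
    return 1
  fi
}
```

Example usage, reading the password without echoing it so it stays out of shell history: `read -rs IPMI_PASSWORD && export IPMI_PASSWORD`, then `ipmi_reset_if_on an-worker1132.mgmt.eqiad.wmnet`.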

Host is booting now.

BTullis claimed this task.