FQDN: an-worker1148.eqiad.wmnet
Netbox: Marked as failed in netbox https://netbox.wikimedia.org/dcim/devices/3661/
Priority: Medium (the Hadoop cluster can tolerate some node loss; however, we need to perform rolling reboots on the fleet, and those reboots are less safe while one host is already unreliable)
Machine can be worked on at will.
This host failed to start back up after a reboot to apply a new Linux kernel. On checking IPMI, we found an error message stating that the PERC1 battery has failed.
The host has subsequently come back up, but we'd like to have you guys take a look.
There are also intermittent messages about a failure of drive 1 in disk drive bay 1; perhaps some re-seating is needed?
Record: 32
Date/Time: 11/27/2025 08:17:11
Source: system
Severity: Critical
Description: Fault detected on drive 1 in disk drive bay 1.
-------------------------------------------------------------------------------
Record: 33
Date/Time: 11/27/2025 08:25:41
Source: system
Severity: Ok
Description: Drive 1 in disk drive bay 1 is operating normally.
-------------------------------------------------------------------------------
Record: 34
Date/Time: 11/27/2025 08:26:26
Source: system
Severity: Critical
Description: Fault detected on drive 1 in disk drive bay 1.
-------------------------------------------------------------------------------
Record: 35
Date/Time: 12/03/2025 07:08:54
Source: system
Severity: Ok
Description: Drive 1 in disk drive bay 1 is operating normally.
-------------------------------------------------------------------------------
Record: 36
Date/Time: 12/03/2025 08:39:30
Source: system
Severity: Critical
Description: The PERC1 battery has failed.
-------------------------------------------------------------------------------
Record: 37
Date/Time: 12/04/2025 15:09:36
Source: system
Severity: Critical
Description: Fault detected on drive 1 in disk drive bay 1.
-------------------------------------------------------------------------------



