Page MenuHomePhabricator

db2068 is misbehaving (but is depooled)
Closed, ResolvedPublic

Description

We got alerts for db2068 from nagios being unable to contact it, but according to dbctl though reports it as depooled. I tried to login to the host, but I got a key verification alert.

db2068 is a depooled s7 replica

Event Timeline

jijiki created this task.Oct 13 2019, 6:57 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptOct 13 2019, 6:57 AM

Thanks for filling this out.
This host had a storage crash sometime ago T180927 and it looks like it had another one:

Logs from the 13th.

description=An Unrecoverable System Error (NMI) has occurred (iLO application watchdog timeout NMI, Service Information: 0x0000002B, 0x00000000)
description=POST Error: 1789-Slot X Drive Array Disk Drive(s) Not Responding. Check cables or replace the problematic drive(s).
description=ASR Detected by System ROM

The reason why the key verification alerts started is because apparently it has started on PXE and it is trying to get itself re-imaged (which is failing as this host isn't on the allowed hosts to get re-imaged).
This host will be decommissioned today, it is clearly broken

Marostegui closed this task as Resolved.Oct 14 2019, 7:17 AM
Marostegui claimed this task.

Resolving this as the host has been labelled as broken and sent to DC-Ops for decommissioning T235399