Page MenuHomePhabricator

db1164 fails to POST/boot/etc
Closed, ResolvedPublic

Description

I tried to reimage db1164 as part of T303171, but it never came back up. Connecting to the mgmt interface, i see nothing on the console, even after doing racadm serveraction hardreset, and racadm racreset.

Event Timeline

On the web console, i can see it get as far as this before resetting:

image.png (1×1 px, 52 KB)

This is the sign of a failed DIMM, during post it's failing during the checking memory phase. I attempted to reboot the system to "self-heal" but that failed, The SEL shows. I will request a DIMM replacement from Dell.


Record: 2
Date/Time: 01/22/2022 11:18:41
Source: system
Severity: Non-Critical

Description: The memory health monitor feature has detected a degradation in the DIMM installed in DIMM_A8. Reboot system to initiate self-heal process.

racadm>>

fgiunchedi triaged this task as Medium priority.Apr 29 2022, 2:02 PM

Dell ticket number
You have successfully submitted request SR1092770830.

Cmjohnson claimed this task.

DIMM replaced and booted into the OS, I was able to update the firmware while it was offline.