db2173 HW errors
Closed, Resolved, Public

Description

db2173 crashed last Saturday with the following errors on its ILO:

-------------------------------------------------------------------------------
Record:      6
Date/Time:   11/12/2022 17:40:28
Source:      system
Severity:    Critical
Description: CPU 1 MEM345 VTT PG voltage is outside of range.
-------------------------------------------------------------------------------
Record:      7
Date/Time:   11/12/2022 17:40:28
Source:      system
Severity:    Critical
Description: CPU 1 MEM345 VPP PG voltage is outside of range.
-------------------------------------------------------------------------------
Record:      8
Date/Time:   11/12/2022 17:41:47
Source:      system
Severity:    Critical
Description: The system board Pfault fail-safe voltage is outside of range.
-------------------------------------------------------------------------------
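For anyone who wants to triage a log like this programmatically, here is a minimal sketch in Python. It assumes the event log has been saved as plain text in exactly the format shown above (dashed separator lines, `Key: value` fields). The names `parse_records` and `SAMPLE_LOG` are hypothetical and are not part of any Dell or WMF tooling.

```python
import re

# Separator lines are runs of dashes; fields look like "Key:   value".
SEPARATOR = re.compile(r"^-{5,}\s*$")
FIELD = re.compile(r"^([^:]+):\s*(.*)$")

# Sample copied from the task description (assumed input format).
SAMPLE_LOG = """\
-------------------------------------------------------------------------------
Record:      6
Date/Time:   11/12/2022 17:40:28
Source:      system
Severity:    Critical
Description: CPU 1 MEM345 VTT PG voltage is outside of range.
-------------------------------------------------------------------------------
Record:      7
Date/Time:   11/12/2022 17:40:28
Source:      system
Severity:    Critical
Description: CPU 1 MEM345 VPP PG voltage is outside of range.
-------------------------------------------------------------------------------
Record:      8
Date/Time:   11/12/2022 17:41:47
Source:      system
Severity:    Critical
Description: The system board Pfault fail-safe voltage is outside of range.
-------------------------------------------------------------------------------
"""


def parse_records(text):
    """Split a dashed-separator event log into a list of field dicts."""
    records = []
    current = {}
    for line in text.splitlines():
        if SEPARATOR.match(line):
            # A separator closes the record collected so far, if any.
            if current:
                records.append(current)
                current = {}
            continue
        match = FIELD.match(line)
        if match:
            # Split on the first colon only, so timestamps stay intact.
            current[match.group(1).strip()] = match.group(2).strip()
    if current:
        records.append(current)
    return records


records = parse_records(SAMPLE_LOG)
critical = [r for r in records if r.get("Severity") == "Critical"]
for r in critical:
    print(r["Record"], r["Date/Time"], r["Description"])
```

Splitting each field on the first colon keeps the `Date/Time` value whole even though the timestamp itself contains colons.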

Event Timeline

I am trying to power on the host, but it is not working:

racadm>>serveraction powerup
Server power operation initiated successfully
racadm>>serveraction powerstatus
Server power status: OFF

I have a case open with Dell:

Service Request: 1115331653

@Marostegui main board replaced. The server is back up and running. Sorry it took this long to get this fixed.

Thanks

Change 864328 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2173: Enable notifications

https://gerrit.wikimedia.org/r/864328

Change 864328 merged by Marostegui:

[operations/puppet@production] db2173: Enable notifications

https://gerrit.wikimedia.org/r/864328

Host being repooled automatically.