an-presto1004 down
Open, Needs Triage · Public

Description

an-presto1004 seems completely unresponsive. I tried a power cycle, but it timed out :(

-------------------------------------------------------------------------------
Record:      14
Date/Time:   05/23/2020 03:32:18
Source:      system
Severity:    Critical
Description: CPU 1 has a thermal trip (over-temperature) event.
-------------------------------------------------------------------------------

Event Timeline

elukey created this task. Sat, May 23, 8:17 AM
Restricted Application added a project: Operations. Sat, May 23, 8:17 AM

I submitted a ticket with Dell for a replacement CPU. SR1025619583

elukey moved this task from Incoming to Radar on the Analytics board. Tue, May 26, 7:35 AM

There is a larger issue with this server. I replaced the CPU but noticed that both power supplies have failed. I could also smell burning in the server. I swapped the power supplies with decom spares, and those PSUs started smoking; something definitely burned inside the power supply. This server will be down for a while.