wtp2005 went down. After logging in to the mgmt interface, the console shows:
Alert! System fatal error during previous boot Cache and Core Box, Last Level Cache Error
And in the racadm SEL (racadm getsel):
-------------------------------------------------------------------------------
Record: 12
Date/Time: 07/14/2020 10:22:01
Source: system
Severity: Critical
Description: CPU 2 machine check error detected.
-------------------------------------------------------------------------------
[...SNIP...]
-------------------------------------------------------------------------------
Record: 26
Date/Time: 07/14/2020 10:23:06
Source: system
Severity: Critical
Description: CPU 2 has an internal error (IERR).
-------------------------------------------------------------------------------
And from racadm lclog view:
--------------------------------------------------------------------------------
SeqNumber = 212
Message ID = CPU0000
Category = System
AgentID = SEL
Severity = Critical
Timestamp = 2020-07-14 10:32:14
Message = CPU 2 has an internal error (IERR).
Message Arg 1 = 2
RawEventData = 0x1A,0x00,0x02,0x8A,0x87,0x0D,0x5F,0xB1,0x00,0x04,0x07,0x09,0x6F,0xA0,0x02,0x37
FQDD =
--------------------------------------------------------------------------------
SeqNumber = 211
Message ID = CPU9000
Category = System
AgentID = SEL
Severity = Information
Timestamp = 2020-07-14 10:32:13
Message = An OEM diagnostic event occurred.
RawEventData = 0x19,0x00,0x02,0x8A,0x87,0x0D,0x5F,0xB1,0x00,0x04,0xC1,0x28,0x7E,0x00,0x20,0xBE
FQDD = System.Embedded.1
--------------------------------------------------------------------------------
[...SNIP...]
--------------------------------------------------------------------------------
SeqNumber = 207
Message ID = CPU0704
Category = System
AgentID = SEL
Severity = Critical
Timestamp = 2020-07-14 10:23:06
Message = CPU 2 machine check error detected.
Message Arg 1 = 2
RawEventData = 0x15,0x00,0x02,0x8A,0x87,0x0D,0x5F,0xB1,0x00,0x04,0x07,0x0D,0x07,0xA6,0x02,0x37
FQDD = CPU.Socket.1
--------------------------------------------------------------------------------
[...SNIP...]
--------------------------------------------------------------------------------
SeqNumber = 197
Message ID = CPU0704
Category = System
AgentID = SEL
Severity = Critical
Timestamp = 2020-07-14 10:22:01
Message = CPU 2 machine check error detected.
Message Arg 1 = 2
RawEventData = 0x0C,0x00,0x02,0x49,0x87,0x0D,0x5F,0xB1,0x00,0x04,0x07,0x0D,0x07,0xA6,0x02,0x36
FQDD = CPU.Socket.1
--------------------------------------------------------------------------------
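The RawEventData fields above can be cross-checked against the getsel records. The following is a minimal sketch, assuming the 16 bytes follow the standard IPMI SEL system event record layout (record ID, record type, timestamp, generator ID, EvM revision, sensor type, sensor number, event dir/type, three event data bytes, all little-endian) and that the timestamp is seconds since the Unix epoch, printed here as UTC. The function name decode_sel_record is hypothetical and not part of racadm.

```python
import struct
from datetime import datetime, timezone

# Assumed layout: IPMI SEL system event record, 16 bytes, little-endian.
# H=record id, B=record type, I=timestamp, H=generator id,
# then 7 single bytes: EvM rev, sensor type, sensor number,
# event dir/type, event data 1..3.
_SEL_LAYOUT = struct.Struct("<HBIHBBBBBBB")


def decode_sel_record(raw_event_data: str) -> dict:
    """Decode a comma-separated RawEventData string from `racadm lclog view`."""
    raw = bytes(int(tok, 16) for tok in raw_event_data.split(","))
    if len(raw) != _SEL_LAYOUT.size:
        raise ValueError(f"expected {_SEL_LAYOUT.size} bytes, got {len(raw)}")
    (record_id, record_type, ts, generator_id, evm_rev,
     sensor_type, sensor_number, event_dir_type,
     data1, data2, data3) = _SEL_LAYOUT.unpack(raw)
    return {
        "record_id": record_id,
        "record_type": record_type,
        # Assumption: the raw value is epoch seconds; shown as UTC.
        "timestamp": datetime.fromtimestamp(ts, tz=timezone.utc),
        "generator_id": generator_id,
        "evm_rev": evm_rev,
        "sensor_type": sensor_type,
        "sensor_number": sensor_number,
        "event_dir_type": event_dir_type,
        "event_data": (data1, data2, data3),
    }


if __name__ == "__main__":
    # RawEventData of SeqNumber 212 (the IERR entry) from the lclog above.
    rec = decode_sel_record(
        "0x1A,0x00,0x02,0x8A,0x87,0x0D,0x5F,0xB1,"
        "0x00,0x04,0x07,0x09,0x6F,0xA0,0x02,0x37"
    )
    print(rec["record_id"], rec["timestamp"].strftime("%Y-%m-%d %H:%M:%S"))
    # prints: 26 2020-07-14 10:23:06
```

Under these assumptions, SeqNumber 212 decodes to SEL record 26 at 10:23:06 and SeqNumber 197 to SEL record 12 at 10:22:01, which lines up with the two getsel entries. The sensor type byte is 0x07 (Processor in the IPMI sensor type table) for the CPU entries and 0xC1 (OEM range) for the OEM diagnostic event.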