Hello!
I had to depool and powercycle cp1087: Icinga reported it down, and indeed neither ssh nor a tty on the mgmt serial console was available. This is the output of racadm getsel:
-------------------------------------------------------------------------------
Record:      146
Date/Time:   03/30/2021 03:00:44
Source:      system
Severity:    Critical
Description: CPU 1 machine check error detected.
-------------------------------------------------------------------------------
Record:      147
Date/Time:   03/30/2021 03:00:44
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
[..]
-------------------------------------------------------------------------------
Record:      155
Date/Time:   03/30/2021 02:04:04
Source:      system
Severity:    Ok
Description: A problem was detected related to the previous server boot.
-------------------------------------------------------------------------------
Record:      156
Date/Time:   03/30/2021 02:04:04
Source:      system
Severity:    Critical
Description: Multi-bit memory errors detected on a memory device at location(s) DIMM_A6.
-------------------------------------------------------------------------------
Record:      157
Date/Time:   03/30/2021 02:04:04
Source:      system
Severity:    Critical
Description: CPU 1 machine check error detected.
-------------------------------------------------------------------------------
Record:      158
Date/Time:   03/30/2021 02:04:04
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
[..]
-------------------------------------------------------------------------------
Record:      165
Date/Time:   03/30/2021 02:04:05
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
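In case it helps with triage, here is a minimal sketch for pulling only the Critical records out of a saved copy of the log above. It is not an existing tool: the names `parse_sel` and `critical_events` are hypothetical, and it assumes every record carries the five fields shown (Record, Date/Time, Source, Severity, Description) separated by dashed lines.

```python
import re

# One SEL record. Whitespace between fields may be spaces or newlines, so the
# same pattern handles one-field-per-line output and a flattened paste.
# Assumption: the field names and their order match the log shown above.
RECORD_RE = re.compile(
    r"Record:\s*(?P<record>\d+)\s+"
    r"Date/Time:\s*(?P<timestamp>\d{2}/\d{2}/\d{4}\s+\d{2}:\d{2}:\d{2})\s+"
    r"Source:\s*(?P<source>\S+)\s+"
    r"Severity:\s*(?P<severity>\S+)\s+"
    # The description runs until the next dashed separator or the end of input.
    r"Description:\s*(?P<description>.*?)\s*(?=-{5,}|\Z)",
    re.DOTALL,
)


def parse_sel(text):
    """Return every record found in the text as a dict of strings."""
    return [match.groupdict() for match in RECORD_RE.finditer(text)]


def critical_events(text):
    """Return only the records whose severity is Critical."""
    return [r for r in parse_sel(text) if r["severity"].lower() == "critical"]


if __name__ == "__main__":
    # Two records copied from the log above, kept short for illustration.
    sample = (
        "Record: 155 Date/Time: 03/30/2021 02:04:04 Source: system "
        "Severity: Ok Description: A problem was detected related to the "
        "previous server boot. "
        "---------------------------------------- "
        "Record: 156 Date/Time: 03/30/2021 02:04:04 Source: system "
        "Severity: Critical Description: Multi-bit memory errors detected on "
        "a memory device at location(s) DIMM_A6."
    )
    for event in critical_events(sample):
        print(event["record"], event["timestamp"], event["description"])
```

Elided parts of the log ([..]) are simply skipped, since they do not match the record pattern.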
I'll leave the next steps to the Traffic team :)