This task tracks the approval and eventual decommission of mw1163. It crashed on 2017-09-05 due to a memory error. A check of the system event log (SEL) via the ILOM shows a large number of memory errors, repeatedly in the same DIMM slots.
This system has been out of warranty since 2016-01-30. The last repair on this system seems to have been T84399, which replaced memory and the system board. The SEL below does not include the failures that led to that repair, as the log was cleared once the new hardware was installed.
The SEL shows:
Record:      1
Date/Time:   09/11/2014 18:55:40
Source:      system
Severity:    Ok
Description: Log cleared.
-------------------------------------------------------------------------------
Record:      2
Date/Time:   10/19/2014 21:56:06
Source:      system
Severity:    Non-Critical
Description: Correctable memory error rate exceeded for DIMM_B2.
-------------------------------------------------------------------------------
Record:      3
Date/Time:   10/19/2014 21:58:43
Source:      system
Severity:    Critical
Description: Correctable memory error rate exceeded for DIMM_B2.
-------------------------------------------------------------------------------
Record:      4
Date/Time:   12/13/2014 07:15:50
Source:      system
Severity:    Non-Critical
Description: Correctable memory error rate exceeded for DIMM_B2.
-------------------------------------------------------------------------------
Record:      5
Date/Time:   12/13/2014 07:47:15
Source:      system
Severity:    Critical
Description: Correctable memory error rate exceeded for DIMM_B2.
-------------------------------------------------------------------------------
Record:      6
Date/Time:   02/07/2016 05:09:54
Source:      system
Severity:    Non-Critical
Description: Correctable memory error rate exceeded for DIMM_B2.
-------------------------------------------------------------------------------
Record:      7
Date/Time:   02/07/2016 05:09:55
Source:      system
Severity:    Critical
Description: Correctable memory error rate exceeded for DIMM_B2.
-------------------------------------------------------------------------------
Record:      8
Date/Time:   03/19/2016 02:22:56
Source:      system
Severity:    Non-Critical
Description: Correctable memory error rate exceeded for DIMM_A2.
-------------------------------------------------------------------------------
Record:      9
Date/Time:   03/19/2016 02:22:56
Source:      system
Severity:    Critical
Description: Correctable memory error rate exceeded for DIMM_A2.
-------------------------------------------------------------------------------
Record:      10
Date/Time:   04/12/2016 22:29:29
Source:      system
Severity:    Critical
Description: CPU 1 machine check error detected.
-------------------------------------------------------------------------------
Record:      11
Date/Time:   04/12/2016 22:29:29
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
Record:      12
Date/Time:   04/12/2016 22:29:29
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
Record:      13
Date/Time:   04/12/2016 22:29:29
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
Record:      14
Date/Time:   04/12/2016 22:29:29
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
Record:      15
Date/Time:   04/12/2016 22:29:29
Source:      system
Severity:    Critical
Description: Multi-bit memory errors detected on a memory device at location(s) DIMM_A2.
-------------------------------------------------------------------------------
Record:      16
Date/Time:   04/14/2016 01:30:37
Source:      system
Severity:    Non-Critical
Description: Correctable memory error rate exceeded for DIMM_B2.
-------------------------------------------------------------------------------
Record:      17
Date/Time:   04/14/2016 01:31:13
Source:      system
Severity:    Critical
Description: Correctable memory error rate exceeded for DIMM_B2.
-------------------------------------------------------------------------------
Record:      18
Date/Time:   06/30/2016 20:32:41
Source:      system
Severity:    Non-Critical
Description: Correctable memory error rate exceeded for DIMM_B2.
-------------------------------------------------------------------------------
Record:      19
Date/Time:   06/30/2016 20:33:13
Source:      system
Severity:    Critical
Description: Correctable memory error rate exceeded for DIMM_B2.
-------------------------------------------------------------------------------
Record:      20
Date/Time:   11/02/2016 19:18:09
Source:      system
Severity:    Non-Critical
Description: Correctable memory error rate exceeded for DIMM_B2.
-------------------------------------------------------------------------------
Record:      21
Date/Time:   11/03/2016 02:17:16
Source:      system
Severity:    Critical
Description: Correctable memory error rate exceeded for DIMM_B2.
-------------------------------------------------------------------------------
Record:      22
Date/Time:   11/18/2016 03:01:08
Source:      system
Severity:    Non-Critical
Description: Correctable memory error rate exceeded for DIMM_B2.
-------------------------------------------------------------------------------
Record:      23
Date/Time:   11/18/2016 04:47:45
Source:      system
Severity:    Critical
Description: Correctable memory error rate exceeded for DIMM_B2.
-------------------------------------------------------------------------------
Record:      24
Date/Time:   07/06/2017 01:27:45
Source:      system
Severity:    Non-Critical
Description: Correctable memory error rate exceeded for DIMM_A2.
-------------------------------------------------------------------------------
Record:      25
Date/Time:   07/06/2017 01:27:45
Source:      system
Severity:    Critical
Description: Correctable memory error rate exceeded for DIMM_A2.
-------------------------------------------------------------------------------
Record:      26
Date/Time:   09/05/2017 21:03:28
Source:      system
Severity:    Critical
Description: CPU 1 machine check error detected.
-------------------------------------------------------------------------------
Record:      27
Date/Time:   09/05/2017 21:03:28
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
Record:      28
Date/Time:   09/05/2017 21:03:28
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
Record:      29
Date/Time:   09/05/2017 21:03:28
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
Record:      30
Date/Time:   09/05/2017 21:03:28
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
Record:      31
Date/Time:   09/05/2017 21:03:28
Source:      system
Severity:    Critical
Description: Multi-bit memory errors detected on a memory device at location(s) DIMM_A2.
-------------------------------------------------------------------------------
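So this system has memory failures in both DIMM slots A2 and B2, including multi-bit errors on DIMM_A2 logged at the same time as the CPU 1 machine check errors on 2016-04-12 and 2017-09-05. Requesting permission to remove this system from service and decommission it.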
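
For anyone who wants to double-check the per-slot picture, here is a minimal sketch of how the SEL text could be tallied by DIMM and severity. It is not a tool used on this task; the names `RECORD_RE`, `DIMM_RE`, `tally_dimm_events` and `SAMPLE_SEL` are made up for illustration, and it assumes the log keeps the `Record: / Date/Time: / Source: / Severity: / Description:` field layout shown above. The sample holds only three records copied from the log, not the full dump.

```python
import re
from collections import Counter

# One SEL record. \s+ between fields lets this match whether the fields sit
# on a single line or on separate lines (assumed layout, see the log above).
RECORD_RE = re.compile(
    r"Record:\s*(\d+)\s+Date/Time:\s*(\S+ \S+)\s+Source:\s*(\S+)\s+"
    r"Severity:\s*(\S+)\s+Description:\s*(.*?)\s*(?=-{10,}|\Z)",
    re.DOTALL,
)

# DIMM slot names as they appear in the descriptions, e.g. DIMM_A2.
DIMM_RE = re.compile(r"DIMM_[A-Z]\d+")


def tally_dimm_events(sel_text):
    """Count SEL records per (DIMM slot, severity) pair."""
    counts = Counter()
    for _num, _when, _source, severity, description in RECORD_RE.findall(sel_text):
        for dimm in DIMM_RE.findall(description):
            counts[(dimm, severity)] += 1
    return counts


# Three records copied from the log above (records 2, 3 and 15).
SAMPLE_SEL = """\
Record: 2 Date/Time: 10/19/2014 21:56:06 Source: system Severity: Non-Critical Description: Correctable memory error rate exceeded for DIMM_B2.
-------------------------------------------------------------------------------
Record: 3 Date/Time: 10/19/2014 21:58:43 Source: system Severity: Critical Description: Correctable memory error rate exceeded for DIMM_B2.
-------------------------------------------------------------------------------
Record: 15 Date/Time: 04/12/2016 22:29:29 Source: system Severity: Critical Description: Multi-bit memory errors detected on a memory device at location(s) DIMM_A2.
-------------------------------------------------------------------------------
"""

if __name__ == "__main__":
    for (dimm, severity), n in sorted(tally_dimm_events(SAMPLE_SEL).items()):
        print(dimm, severity, n)
```

Pasting the full SEL text in place of `SAMPLE_SEL` should give the counts for every record; entries that do not name a DIMM (the log clear, machine check and OEM diagnostic events) are simply skipped.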