elastic2038 went down at 09:23 today. The system event log shows CPU and memory errors (Elasticsearch is fine with one node down):
-------------------------------------------------------------------------------
Record:      2
Date/Time:   03/01/2019 03:21:28
Source:      system
Severity:    Critical
Description: CPU 1 machine check error detected.
-------------------------------------------------------------------------------
Record:      3
Date/Time:   03/01/2019 03:21:28
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
Record:      4
Date/Time:   03/01/2019 03:21:29
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
Record:      5
Date/Time:   03/01/2019 03:21:29
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
Record:      6
Date/Time:   03/01/2019 03:21:29
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
Record:      7
Date/Time:   03/01/2019 09:23:41
Source:      system
Severity:    Ok
Description: A problem was detected related to the previous server boot.
-------------------------------------------------------------------------------
Record:      8
Date/Time:   03/01/2019 09:23:41
Source:      system
Severity:    Critical
Description: Multi-bit memory errors detected on a memory device at location(s) DIMM_A2.
-------------------------------------------------------------------------------
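In case it helps with triage on other hosts, here is a minimal sketch for pulling only the Critical entries out of an event log dump in this format. It assumes every record carries the five fields shown above and is followed by a dashed separator. The names `parse_sel`, `critical_records` and `SEL_TEXT` are made up for illustration, and the sample text is three of the records from the log above.

```python
import re

# Sample input: records 2, 7 and 8 copied from the event log above.
SEL_TEXT = """
-------------------------------------------------------------------------------
Record: 2
Date/Time: 03/01/2019 03:21:28
Source: system
Severity: Critical
Description: CPU 1 machine check error detected.
-------------------------------------------------------------------------------
Record: 7
Date/Time: 03/01/2019 09:23:41
Source: system
Severity: Ok
Description: A problem was detected related to the previous server boot.
-------------------------------------------------------------------------------
Record: 8
Date/Time: 03/01/2019 09:23:41
Source: system
Severity: Critical
Description: Multi-bit memory errors detected on a memory device at location(s) DIMM_A2.
-------------------------------------------------------------------------------
"""

# Assumption: fields always appear in this order and each record ends with a
# run of dashes. Whitespace between fields may be spaces or newlines, so the
# pattern also works when the log was pasted as a single line.
RECORD_RE = re.compile(
    r"Record:\s*(?P<record>\d+)\s+"
    r"Date/Time:\s*(?P<when>\S+\s+\S+)\s+"
    r"Source:\s*(?P<source>\S+)\s+"
    r"Severity:\s*(?P<severity>\S+)\s+"
    r"Description:\s*(?P<description>.*?)\s*-{5,}",
    re.DOTALL,
)


def parse_sel(text):
    """Return one dict per record found in the log text."""
    return [m.groupdict() for m in RECORD_RE.finditer(text)]


def critical_records(text):
    """Return only the records whose severity is Critical."""
    return [r for r in parse_sel(text) if r["severity"] == "Critical"]


if __name__ == "__main__":
    for rec in critical_records(SEL_TEXT):
        print(rec["record"], rec["when"], rec["description"])
```

The timestamp is kept as a plain string on purpose, since the log does not state which date order it uses.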