According to the IPMI system event log, a processor appears to have failed at 2018-06-24T16:24:58 (UTC), causing a system crash that required a forced restart:
$ ipmi-sel
...
10 | Jun-24-2018 | 16:24:58 | CPU Machine Chk | Processor | transition to Non-recoverable ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 00h
11 | Jun-24-2018 | 16:24:58 | MSR Info Log | OEM Reserved | OEM Event Offset = 09h ; OEM Event Data2 code = 04h ; OEM Event Data3 code = 00h
12 | Jun-24-2018 | 16:24:58 | MSR Info Log | OEM Reserved | OEM Event Offset = 00h
13 | Jun-24-2018 | 16:24:58 | MSR Info Log | OEM Reserved | OEM Event Offset = 0Ch
14 | Jun-24-2018 | 16:24:58 | MSR Info Log | OEM Reserved | OEM Event Offset = 00h
15 | Jun-24-2018 | 16:24:58 | MSR Info Log | OEM Reserved | OEM Event Offset = 0Ah ; OEM Event Data2 code = 04h ; OEM Event Data3 code = 00h
16 | Jun-24-2018 | 16:24:58 | MSR Info Log | OEM Reserved | OEM Event Offset = 00h
17 | Jun-24-2018 | 16:24:58 | MSR Info Log | OEM Reserved | OEM Event Offset = 08h ; OEM Event Data2 code = E7h ; OEM Event Data3 code = 0Eh
18 | Jun-24-2018 | 16:24:58 | MSR Info Log | OEM Reserved | OEM Event Offset = 00h
19 | Jun-24-2018 | 16:26:03 | CPU Machine Chk | Processor | transition to Non-recoverable ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 00h
20 | Jun-24-2018 | 16:26:03 | MSR Info Log | OEM Reserved | OEM Event Offset = 09h ; OEM Event Data2 code = 04h ; OEM Event Data3 code = 00h
21 | Jun-24-2018 | 16:26:03 | MSR Info Log | OEM Reserved | OEM Event Offset = 00h
22 | Jun-24-2018 | 16:26:03 | MSR Info Log | OEM Reserved | OEM Event Offset = 0Ch
23 | Jun-24-2018 | 16:26:03 | MSR Info Log | OEM Reserved | OEM Event Offset = 00h
24 | Jun-24-2018 | 16:26:03 | Sensor #9 | Processor | IERR ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 00h
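To pick the processor events out of the surrounding MSR noise, something like the following Python sketch might help. It assumes one event per line in the pipe-delimited `ID | Date | Time | Name | Type | Event` layout seen in the output above, and an English locale for the month abbreviation; the function names `parse_sel` and `processor_faults` are hypothetical helpers, not part of any IPMI tool.

```python
from datetime import datetime


def parse_sel(text):
    """Parse pipe-delimited SEL lines into dicts; skip anything else."""
    events = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        # Assumed layout: exactly six fields, starting with a numeric record ID.
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        rec_id, date, time, name, sensor_type, event = fields
        events.append({
            "id": int(rec_id),
            # e.g. "Jun-24-2018 16:24:58" (month name parsing is locale-dependent)
            "when": datetime.strptime(f"{date} {time}", "%b-%d-%Y %H:%M:%S"),
            "name": name,
            "type": sensor_type,
            "event": event,
        })
    return events


def processor_faults(events):
    """Keep only events whose sensor type column is 'Processor'."""
    return [e for e in events if e["type"] == "Processor"]
```

Applied to the log above, this would leave only the "CPU Machine Chk" and "Sensor #9" entries, which are the ones that matter for deciding on a hardware replacement.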
Adding monitoring, not because this is related to monitoring, but because I don't know who would be a good owner; this way it can be decided what to do next.