Apparently db2085 has crashed.
Nothing in the HW logs, though:
racadm>>getsel
racadm getsel
Record:      1
Date/Time:   02/14/2019 15:13:00
Source:      system
Severity:    Ok
Description: Log cleared.
-------------------------------------------------------------------------------
At the time of the crash, the host was running a heavy ALTER TABLE on enwiki.revision (T239453).
The OS logs suggest the DIMM in slot A3 is having issues:
Jan 19 07:11:51 db2085 kernel: [3258047.293317] EDAC MC0: 0 CE memory read error on CPU_SrcID#0_Ha#0_Chan#2_DIMM#0 (channel:2 slot:0 page:0xec2237 offset:0x980 grain:32 syndrome:0x0 - area:DRAM err_code:0000:009f socket:0 ha:0 channel_mask:4 rank:1)
Jan 19 07:12:38 db2085 kernel: [3258094.261835] {12}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
Jan 19 07:12:38 db2085 kernel: [3258094.261840] {12}[Hardware Error]: It has been corrected by h/w and requires no further action
Jan 19 07:12:38 db2085 kernel: [3258094.261843] {12}[Hardware Error]: event severity: corrected
Jan 19 07:12:38 db2085 kernel: [3258094.261846] {12}[Hardware Error]: Error 0, type: corrected
Jan 19 07:12:38 db2085 kernel: [3258094.261848] {12}[Hardware Error]: fru_text: A3
Jan 19 07:12:38 db2085 kernel: [3258094.261851] {12}[Hardware Error]: section_type: memory error
Jan 19 07:12:38 db2085 kernel: [3258094.261854] {12}[Hardware Error]: error_status: 0x0000000000000400
Jan 19 07:12:38 db2085 kernel: [3258094.261856] {12}[Hardware Error]: physical_address: 0x0000002ed2272100
Jan 19 07:12:38 db2085 kernel: [3258094.261863] {12}[Hardware Error]: node: 0 card: 2 module: 0 rank: 1 bank: 0 row: 29985 column: 640
Jan 19 07:12:38 db2085 kernel: [3258094.261866] {12}[Hardware Error]: error_type: 2, single-bit ECC
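To check whether A3 is the only module reporting errors, the APEI records could be tallied per DIMM label. This is a minimal sketch, not something that was run on the host: the function name `count_errors_by_fru` and the sample list are hypothetical, and it assumes the kernel log keeps the `fru_text:` field in the format shown above.

```python
import re
from collections import Counter

# Matches the "fru_text: <label>" field of an APEI hardware error record.
# Assumption: the label is a single whitespace-free token such as "A3".
FRU_RE = re.compile(r"\[Hardware Error\]: fru_text: (\S+)")


def count_errors_by_fru(lines):
    """Count APEI hardware error records per FRU label (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = FRU_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Two lines copied from the log above; only the second carries a fru_text field.
sample = [
    "Jan 19 07:12:38 db2085 kernel: [3258094.261846] {12}[Hardware Error]: Error 0, type: corrected",
    "Jan 19 07:12:38 db2085 kernel: [3258094.261848] {12}[Hardware Error]: fru_text: A3",
]

print(count_errors_by_fru(sample))
```

In practice the lines would come from wherever the host stores its kernel log. A single label dominating the counts would support replacing that one module.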
Can you upgrade the BIOS and firmware?