EDAC flagged a memory error on elastic1029 this morning: one corrected (CE) read error on socket 0, channel 1, slot 1. The host is out of warranty, but maybe we have a spare DIMM from a decommissioned server?
Aug 15 03:20:03 elastic1029 kernel: [6193865.567508] mce: [Hardware Error]: Machine check events logged
Aug 15 03:20:03 elastic1029 kernel: [6193865.567587] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
Aug 15 03:20:03 elastic1029 kernel: [6193865.567589] EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010091
Aug 15 03:20:03 elastic1029 kernel: [6193865.567589] EDAC sbridge MC0: TSC 0
Aug 15 03:20:03 elastic1029 kernel: [6193865.567590] EDAC sbridge MC0: ADDR 7bb011440
Aug 15 03:20:03 elastic1029 kernel: [6193865.567591] EDAC sbridge MC0: MISC 140408400
Aug 15 03:20:03 elastic1029 kernel: [6193865.567592] EDAC sbridge MC0: PROCESSOR 0:306e4 TIME 1534303203 SOCKET 0 APIC 0
Aug 15 03:20:03 elastic1029 kernel: [6193865.567607] EDAC MC0: 1 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#1 (channel:1 slot:1 page:0x7bb011 offset:0x440 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0091 socket:0 ha:0 channel_mask:2 rank:4)
Aug 15 03:20:03 elastic1029 kernel: [6193865.567608] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
Aug 15 03:20:03 elastic1029 kernel: [6193865.567609] EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 10: 8800004500800091
Aug 15 03:20:03 elastic1029 kernel: [6193865.567610] EDAC sbridge MC0: TSC 0
Aug 15 03:20:03 elastic1029 kernel: [6193865.567610] EDAC sbridge MC0: ADDR 0
Aug 15 03:20:03 elastic1029 kernel: [6193865.567611] EDAC sbridge MC0: MISC 5221004000400a8c
Aug 15 03:20:03 elastic1029 kernel: [6193865.567612] EDAC sbridge MC0: PROCESSOR 0:306e4 TIME 1534303203 SOCKET 0 APIC 0
Aug 15 06:25:04 elastic1029 kernel: [6204966.963167] Process accounting resumed
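In case it helps whoever goes looking for the DIMM, here is a rough Python sketch that pulls the DIMM label and physical address out of the CE line in the log. The regex and the `parse_edac_line` helper are my own and hypothetical, written only against the line format seen in this log, and the address calculation assumes 4 KiB pages. Under that assumption, `page` and `offset` combine to the same value as the `ADDR 7bb011440` line reported for Bank 7.

```python
import re

# Corrected-error (CE) line copied from the log, syslog prefix dropped.
CE_LINE = (
    "EDAC MC0: 1 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#1 "
    "(channel:1 slot:1 page:0x7bb011 offset:0x440 grain:32 syndrome:0x0 - "
    "area:DRAM err_code:0001:0091 socket:0 ha:0 channel_mask:2 rank:4)"
)

# Hypothetical pattern: matches only the "EDAC MC<n>: <count> CE|UE ..." summary
# line, not the "EDAC sbridge MC<n>: ..." detail lines.
EDAC_PATTERN = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) (?P<kind>CE|UE) .*? on (?P<label>\S+) "
    r"\(channel:(?P<channel>\d+) slot:(?P<slot>\d+) "
    r"page:(?P<page>0x[0-9a-f]+) offset:(?P<offset>0x[0-9a-f]+)"
)


def parse_edac_line(line):
    """Return a dict describing the error, or None if the line does not match."""
    match = EDAC_PATTERN.search(line)
    if match is None:
        return None
    page = int(match.group("page"), 16)
    offset = int(match.group("offset"), 16)
    return {
        "mc": int(match.group("mc")),
        "count": int(match.group("count")),
        "kind": match.group("kind"),  # CE = corrected, UE = uncorrected
        "label": match.group("label"),
        "channel": int(match.group("channel")),
        "slot": int(match.group("slot")),
        # Assumption: 4 KiB pages, so the page number is shifted by 12 bits.
        "address": (page << 12) | offset,
    }


if __name__ == "__main__":
    info = parse_edac_line(CE_LINE)
    print(info["kind"], info["label"], hex(info["address"]))
```

Running the same function over the full kernel log would show whether the errors keep landing on the same label, which is worth knowing before swapping the DIMM.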