mw2251 hung a few minutes ago: unresponsive on the mgmt console and not answering ping, so I had to reboot it. Found these in the kernel log:
Nov 22 18:30:38 mw2251 kernel: [53617.891710] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
Nov 22 18:30:38 mw2251 kernel: [53617.891714] {1}[Hardware Error]: It has been corrected by h/w and requires no further action
Nov 22 18:30:38 mw2251 kernel: [53617.891716] {1}[Hardware Error]: event severity: corrected
Nov 22 18:30:38 mw2251 kernel: [53617.891719] {1}[Hardware Error]: Error 0, type: corrected
Nov 22 18:30:38 mw2251 kernel: [53617.891720] {1}[Hardware Error]: fru_text: A1
Nov 22 18:30:38 mw2251 kernel: [53617.891722] {1}[Hardware Error]: section_type: memory error
Nov 22 18:30:38 mw2251 kernel: [53617.891725] {1}[Hardware Error]: error_status: 0x0000000000000400
Nov 22 18:30:38 mw2251 kernel: [53617.891727] {1}[Hardware Error]: physical_address: 0x00000007a3602400
Nov 22 18:30:38 mw2251 kernel: [53617.891732] {1}[Hardware Error]: node: 0 card: 0 module: 0 rank: 1 bank: 2 row: 49592 column: 64
Nov 22 18:30:38 mw2251 kernel: [53617.891734] {1}[Hardware Error]: error_type: 2, single-bit ECC
Nov 22 18:30:38 mw2251 kernel: [53617.891756] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
Nov 22 18:30:38 mw2251 kernel: [53617.891760] EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 255: 940000000000009f
Nov 22 18:30:38 mw2251 kernel: [53617.891762] EDAC sbridge MC0: TSC 6b545750aad2
Nov 22 18:30:38 mw2251 kernel: [53617.891764] EDAC sbridge MC0: ADDR 7a3602400
Nov 22 18:30:38 mw2251 kernel: [53617.891766] EDAC sbridge MC0: MISC 0
Nov 22 18:30:38 mw2251 kernel: [53617.891769] EDAC sbridge MC0: PROCESSOR 0:406f1 TIME 1511375438 SOCKET 0 APIC 0
Nov 22 18:30:38 mw2251 kernel: [53617.891798] EDAC MC0: 0 CE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x7a3602 offset:0x400 grain:32 syndrome:0x0 - area:DRAM err_code:0000:009f socket:0 ha:0 channel_mask:1 rank:1)
Nov 22 18:30:40 mw2251 kernel: [53618.219309] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 135.996 msecs
Nov 22 18:30:41 mw2251 kernel: [53618.219314] perf: interrupt took too long (356655 > 6203), lowering kernel.perf_event_max_sample_rate to 500
Nov 22 18:30:41 mw2251 kernel: [53618.264509] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
Nov 22 18:30:41 mw2251 kernel: [53618.264511] {2}[Hardware Error]: It has been corrected by h/w and requires no further action
Nov 22 18:30:41 mw2251 kernel: [53618.264512] {2}[Hardware Error]: event severity: corrected
Nov 22 18:30:41 mw2251 kernel: [53618.264513] {2}[Hardware Error]: Error 0, type: corrected
Nov 22 18:30:41 mw2251 kernel: [53618.264514] {2}[Hardware Error]: fru_text: A1
Nov 22 18:30:41 mw2251 kernel: [53618.264515] {2}[Hardware Error]: section_type: memory error
Nov 22 18:30:41 mw2251 kernel: [53618.264516] {2}[Hardware Error]: error_status: 0x0000000000000400
Nov 22 18:30:41 mw2251 kernel: [53618.264518] {2}[Hardware Error]: physical_address: 0x00000007a3603200
Nov 22 18:30:41 mw2251 kernel: [53618.264521] {2}[Hardware Error]: node: 0 card: 0 module: 0 rank: 1 bank: 2 row: 49592 column: 288
Nov 22 18:30:41 mw2251 kernel: [53618.264522] {2}[Hardware Error]: error_type: 2, single-bit ECC
Nov 22 18:30:41 mw2251 kernel: [53618.264529] mce: [Hardware Error]: Machine check events logged
Nov 22 18:30:41 mw2251 kernel: [53618.780860] INFO: NMI handler (ghes_notify_nmi) took too long to run: 45.050 msecs
Nov 22 18:30:41 mw2251 kernel: [53619.286816] INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 45.035 msecs
Nov 22 18:30:41 mw2251 kernel: [53620.289217] INFO: NMI handler (ghes_notify_nmi) took too long to run: 45.069 msecs
Nov 22 18:30:41 mw2251 kernel: [53620.334641] INFO: NMI handler (ghes_notify_nmi) took too long to run: 45.401 msecs
Nov 22 18:30:41 mw2251 kernel: [53620.834701] INFO: NMI handler (ghes_notify_nmi) took too long to run: 45.407 msecs
Nov 22 18:30:43 mw2251 kernel: [53622.101085] sched: RT throttling activated
Nov 22 18:30:43 mw2251 kernel: [53622.146134] INFO: NMI handler (ghes_notify_nmi) took too long to run: 1130.085 msecs
Nov 22 18:30:43 mw2251 kernel: [53622.787521] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
and so on.
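For reference, the bank status `940000000000009f` in the EDAC sbridge lines can be decoded by hand. Below is a rough Python sketch. It assumes the IA32_MCi_STATUS bit layout and the memory-controller compound error code format (`000F 0000 1MMM CCCC`) from the Intel SDM. `decode_status` and `split_address` are hypothetical helper names for illustration only; this is not a full MCE decoder like `mcelog` or `rasdaemon`.

```python
# Rough decoder for the MCi_STATUS value and address printed by EDAC sbridge.
# Assumption: bit layout per Intel SDM Vol. 3B (IA32_MCi_STATUS); only a few
# fields are handled here, this is not a complete machine-check decoder.

STATUS_BITS = {
    63: "VAL",    # register contents valid
    62: "OVER",   # error overflow
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting enabled
    59: "MISCV",  # MCi_MISC valid
    58: "ADDRV",  # MCi_ADDR valid
    57: "PCC",    # processor context corrupt
}

# MMM field of the memory-controller compound error code.
MC_TRANSACTIONS = {0: "generic", 1: "read", 2: "write", 3: "address/command", 4: "scrub"}


def decode_status(status: int) -> dict:
    """Return the set flag names and, for memory-controller errors, the transaction and channel."""
    flags = [name for bit, name in STATUS_BITS.items() if (status >> bit) & 1]
    mcacod = status & 0xFFFF  # MCA error code lives in bits 15:0
    result = {"flags": flags, "mcacod": mcacod}
    # Memory-controller format is 000F 0000 1MMM CCCC; mask out the F (filter) bit.
    if mcacod & 0xEF80 == 0x0080:
        result["transaction"] = MC_TRANSACTIONS.get((mcacod >> 4) & 0x7, "reserved")
        channel = mcacod & 0xF
        result["channel"] = "unspecified" if channel == 0xF else channel
    return result


def split_address(addr: int, page_shift: int = 12) -> tuple[int, int]:
    """Split a physical address into (page frame number, offset), assuming 4 KiB pages."""
    return addr >> page_shift, addr & ((1 << page_shift) - 1)


if __name__ == "__main__":
    print(decode_status(0x940000000000009F))
    print([hex(x) for x in split_address(0x7A3602400)])
```

Run against the values above, `decode_status` should report VAL, EN and ADDRV set with UC clear, and a memory read transaction with the channel left unspecified in the code itself. That matches the "CE memory read error" and "corrected by h/w" lines. `split_address(0x7a3602400)` gives page `0x7a3602` and offset `0x400`, the same split EDAC prints. The channel/DIMM attribution (Chan#0_DIMM#0, fru_text A1) comes from the sbridge driver's address decoding, not from the status word.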