mw1041 has hardware issues and has shut down twice today. The kernel log from just before the last crash shows:
```
Nov 19 13:38:13 mw1041 kernel: [597479.852774] CMCI storm detected: switching to poll mode
Nov 19 13:38:43 mw1041 kernel: [597509.855137] CMCI storm subsided: switching to interrupt mode
Nov 19 13:39:15 mw1041 kernel: [597541.540792] mce_notify_irq: 20 callbacks suppressed
Nov 19 13:39:15 mw1041 kernel: [597541.540797] mce: [Hardware Error]: Machine check events logged
Nov 19 13:39:19 mw1041 kernel: [597546.235803] mce: [Hardware Error]: Machine check events logged
```
These messages had been appearing for days.
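One way to check how long this has been going on is to count the machine-check messages per day. The following is a minimal sketch. It assumes the kernel log uses the traditional syslog format shown above (journald or RFC 5424 timestamps would need a different pattern), and the function name is hypothetical:

```python
import re
from collections import Counter
from typing import Iterable

# Traditional syslog prefix as in the excerpt above, e.g.
# "Nov 19 13:38:13 mw1041 kernel: ..." (assumption: this is the log format in use).
# Single-digit days are space-padded ("Nov  9"), hence \s+.
SYSLOG_PREFIX = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2}) \d{2}:\d{2}:\d{2} \S+ kernel:"
)

# Substrings identifying the machine-check messages seen in the excerpt.
MCE_MARKERS = ("Machine check events logged", "CMCI storm detected")


def count_mce_per_day(lines: Iterable[str]) -> Counter:
    """Count machine-check related kernel messages per day, keyed like "Nov 19"."""
    counts: Counter = Counter()
    for line in lines:
        match = SYSLOG_PREFIX.match(line)
        if match is None or not any(marker in line for marker in MCE_MARKERS):
            continue
        # Normalise "Nov  9" and "Nov 9" to the same key.
        counts[f"{match['month']} {int(match['day'])}"] += 1
    return counts
```

For example, `count_mce_per_day(open("/var/log/kern.log"))` would return per-day counts, in the order the days first appear in the log. The path is an assumption that depends on the syslog configuration. Treat the result as a lower bound, because `mce_notify_irq: 20 callbacks suppressed` means some notifications were rate-limited and never written to the log.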
The mcelog output shows:
```
TIME 1448029050 Fri Nov 20 14:17:30 2015
MCG status:
MCi status:
Error overflow
Corrected error
Error enabled
MCi_ADDR register valid
MCA: Instruction CACHE Level-0 Instruction-Fetch Error
STATUS d400010000040150 MCGSTATUS 0
MCGCAP 1c09 APICID 10 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44
Hardware event. This is not a software error.
MCE 0
CPU 7 BANK 2
ADDR 304c5b0
```
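The events are always reported for CPU 7, bank 2. My guess is that we have damaged RAM, although the decoded MCA line names an instruction-fetch error in the level-0 instruction cache, which points at the CPU itself.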
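The `STATUS` value in the mcelog record can also be decoded by hand. The following sketch is written from the architectural MCi_STATUS bit layout and the cache-hierarchy compound error code format (`000F 0001 RRRR TTLL`) in the Intel SDM. It is not mcelog's own implementation, the helper names are hypothetical, and it ignores the model-specific bits. For `d400010000040150`, it reports VAL, OVER, EN and ADDRV set with UC clear. This matches the "Error overflow", "Corrected error", "Error enabled" and "MCi_ADDR register valid" lines. The MCA code `0x0150` decodes to an instruction fetch from the level-0 instruction cache.

```python
# Architectural MCi_STATUS flag bits (Intel SDM, machine-check architecture chapter).
STATUS_FLAGS = {
    63: "VAL (register valid)",
    62: "OVER (error overflow)",
    61: "UC (uncorrected error)",
    60: "EN (error enabled)",
    59: "MISCV (MCi_MISC valid)",
    58: "ADDRV (MCi_ADDR valid)",
    57: "PCC (processor context corrupt)",
}

# Sub-fields of a cache-hierarchy compound error code: 000F 0001 RRRR TTLL.
REQUEST_TYPES = {
    0b0000: "Generic", 0b0001: "Generic-Read", 0b0010: "Generic-Write",
    0b0011: "Data-Read", 0b0100: "Data-Write", 0b0101: "Instruction-Fetch",
    0b0110: "Prefetch", 0b0111: "Eviction", 0b1000: "Snoop",
}
TRANSACTION_TYPES = {0b00: "Instruction", 0b01: "Data", 0b10: "Generic"}
LEVELS = {0b00: "Level-0", 0b01: "Level-1", 0b10: "Level-2", 0b11: "Generic"}


def decode_status(status: int) -> dict:
    """Decode the architectural parts of a 64-bit MCi_STATUS value (hypothetical helper)."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    mca_code = status & 0xFFFF  # bits 15:0 hold the MCA error code
    result = {
        "flags": flags,
        "corrected": not ((status >> 61) & 1),  # UC clear => corrected error
        "mca_code": mca_code,
    }
    # Cache-hierarchy errors: bits 15:13 = 000, bit 12 = F (filter, ignored),
    # bits 11:8 = 0001. Masking with 0xEF00 drops the F bit before comparing.
    if mca_code & 0xEF00 == 0x0100:
        rrrr = (mca_code >> 4) & 0xF
        tt = (mca_code >> 2) & 0x3
        ll = mca_code & 0x3
        result["cache_error"] = (
            f"{TRANSACTION_TYPES.get(tt, 'Unknown')} CACHE {LEVELS[ll]} "
            f"{REQUEST_TYPES.get(rrrr, 'Unknown')} Error"
        )
    return result
```

Calling `decode_status(0xd400010000040150)` gives `corrected` as `True` and `cache_error` as `"Instruction CACHE Level-0 Instruction-Fetch Error"`, which is the same wording as the `MCA:` line in the record.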
As suggested, this server is well out of warranty, so we might consider decommissioning it.