During the last 9 days, three caching nodes went down with the same symptoms (other hosts, not only caching nodes, have since shown the same behaviour; see the list below):
- Nothing in the SEL (System Event Log)
- KVM unresponsive
- Network down
- Nothing in the logs
A power cycle fixed them.
So far, all affected systems are Dell PowerEdge R440s:
- cp3053 - T239041
- cp1077 - T238289
- cp3057 - T237348 T239502 T244127
- cp3065 - T238032, and again on 2020-01-05
- db2125 - T239042 (kernel at the time of the crash: Linux db2125 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u1 (2019-09-20) x86_64 GNU/Linux)
- cp3063 - T239310
- cp1087 - T239449
- cp3055 - T240425 (crashed twice, both tracked in the same task; I think the firmware has not been updated yet)
- backup2001 - T240177 T237730 T240177#5773711 (crashed 3 times; the second crash happened while the host was already running the latest firmware version)
- cp3051 - T241306
- cp3061 - crashed 2019-12-28T23:36
- cp1087 - crashed again on 2020-04-16T12:30
Could a kernel upgrade or a CPU microcode update be causing this?