cp2029 went offline at ~3:30 UTC on Dec. 24th. About 40 minutes later I powercycled it via mgmt, and once it came back up I depooled it.
Looking in the kernel log, there's:
Dec 24 02:57:30 cp2029 kernel: [27361439.958255] Disabling lock debugging due to kernel taint
Dec 24 02:57:30 cp2029 kernel: [27361439.958450] mce: Uncorrected hardware memory error in user-access at 25960fb0c0
Dec 24 02:57:30 cp2029 kernel: [27361439.958472] mce: [Hardware Error]: Machine check events logged
Dec 24 02:57:31 cp2029 kernel: [27361440.005909] Memory failure: 0x25960fb: Killing purged:19885 due to hardware memory corruption
Dec 24 02:57:31 cp2029 kernel: [27361440.014704] Memory failure: 0x25960fb: recovery action for dirty LRU page: Recovered
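For anyone cross-checking the log: the address in the mce line and the 0x25960fb value in the "Memory failure" lines appear to refer to the same page. A quick sketch of the arithmetic, assuming 4 KiB pages (a page shift of 12, which is my assumption for this host; the function names are only illustrative):

```python
# Assumption: 4 KiB pages, i.e. a page shift of 12 bits.
PAGE_SHIFT = 12


def pfn_from_phys_addr(addr: int, page_shift: int = PAGE_SHIFT) -> int:
    """Return the page frame number containing a physical address."""
    return addr >> page_shift


def page_offset(addr: int, page_shift: int = PAGE_SHIFT) -> int:
    """Return the byte offset of the address within its page."""
    return addr & ((1 << page_shift) - 1)


# Address reported by the mce line in the kernel log.
mce_addr = 0x25960FB0C0

print(hex(pfn_from_phys_addr(mce_addr)))  # 0x25960fb
print(hex(page_offset(mce_addr)))         # 0xc0
```

Under that assumption, the failing address falls in page frame 0x25960fb, which matches the frame number in the "Memory failure" lines.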
Tagging DC-ops since it seems to be a hardware error. I've left the host depooled.