db1097 crashed due to memory errors and rebooted itself:
properties
CreationTimestamp = 20200630051510.000000-300
ElementName = System Event Log Entry
RecordData = Multi-bit memory errors detected on a memory device at location(s) DIMM_A1.
RecordFormat = string Description
RecordID = 15

CreationTimestamp = 20200630051510.000000-300
ElementName = System Event Log Entry
RecordData = Multi-bit memory errors detected on a memory device at location(s) DIMM_A3.
RecordFormat = string Description
RecordID = 13

CreationTimestamp = 20200630051510.000000-300
ElementName = System Event Log Entry
RecordData = Multi-bit memory errors detected on a memory device at location(s) DIMM_B1.
RecordFormat = string Description
RecordID = 12
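To pull the affected slots out of a log dump like the one above, a small script along these lines could help. This is only a sketch: the `affected_dimms` helper and the `sel_dump` variable are hypothetical names, and the pattern assumes slot labels always look like `DIMM_` followed by a letter and a number, as they do in this excerpt.

```python
import re

# RecordData fields copied from the system event log excerpt above.
sel_dump = (
    "RecordData = Multi-bit memory errors detected on a memory device "
    "at location(s) DIMM_A1. RecordID = 15 "
    "RecordData = Multi-bit memory errors detected on a memory device "
    "at location(s) DIMM_A3. RecordID = 13 "
    "RecordData = Multi-bit memory errors detected on a memory device "
    "at location(s) DIMM_B1. RecordID = 12"
)


def affected_dimms(text):
    """Return the distinct DIMM slot labels found in text, sorted.

    Assumption: labels match DIMM_<letter><number>, as in this log.
    """
    return sorted(set(re.findall(r"DIMM_[A-Z]\d+", text)))


print(affected_dimms(sel_dump))  # ['DIMM_A1', 'DIMM_A3', 'DIMM_B1']
```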
Times in UTC
[06:16:29] <+icinga-wm> PROBLEM - Host db1097 is DOWN: PING CRITICAL - Packet loss = 100%
[06:23:53] <+icinga-wm> RECOVERY - Host db1097 is UP: PING WARNING - Packet loss = 50%, RTA = 0.25 ms
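The gap between the two alerts gives a rough upper bound on how long the host was unreachable. A minimal sketch of that arithmetic follows; the `outage_seconds` helper is a hypothetical name, and it assumes both timestamps fall on the same day. The result measures the interval between Icinga notifications, not the exact downtime.

```python
from datetime import datetime


def outage_seconds(down, up, fmt="%H:%M:%S"):
    """Seconds between two same-day HH:MM:SS timestamps."""
    delta = datetime.strptime(up, fmt) - datetime.strptime(down, fmt)
    return int(delta.total_seconds())


# Timestamps taken from the PROBLEM and RECOVERY lines above.
seconds = outage_seconds("06:16:29", "06:23:53")
print(f"{seconds // 60}m {seconds % 60}s")  # 7m 24s
```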
The host logged multi-bit memory errors on several DIMMs (DIMM_A1, DIMM_A3 and DIMM_B1). It is due to be replaced next fiscal year, so buying replacement memory for it is probably not worthwhile. We can just replace it with db1080.
This required an Etherpad reload.