Description

db1097 crashed due to memory errors and rebooted itself:

	properties
		CreationTimestamp = 20200630051510.000000-300
		ElementName = System Event Log Entry
		RecordData = Multi-bit memory errors detected on a memory device at location(s) DIMM_A1.
		RecordFormat = string Description
		RecordID = 15
		CreationTimestamp = 20200630051510.000000-300
		ElementName = System Event Log Entry
		RecordData = Multi-bit memory errors detected on a memory device at location(s) DIMM_A3.
		RecordFormat = string Description
		RecordID = 13
		CreationTimestamp = 20200630051510.000000-300
		ElementName = System Event Log Entry
		RecordData = Multi-bit memory errors detected on a memory device at location(s) DIMM_B1.
		RecordFormat = string Description
		RecordID = 12
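
For triage, the affected DIMM slots can be pulled out of a log excerpt like the one above with a short script. This is only a minimal sketch: the function name `affected_dimms` and the `SEL_TEXT` sample are hypothetical, and the pattern assumes every record keeps the `RecordData = ... location(s) DIMM_xx.` wording seen here.

```python
import re

# Sample lines copied from the log excerpt above (leading tabs trimmed).
SEL_TEXT = """\
RecordData = Multi-bit memory errors detected on a memory device at location(s) DIMM_A1.
RecordID = 15
RecordData = Multi-bit memory errors detected on a memory device at location(s) DIMM_A3.
RecordID = 13
RecordData = Multi-bit memory errors detected on a memory device at location(s) DIMM_B1.
RecordID = 12
"""

def affected_dimms(sel_text):
    """Return the sorted, de-duplicated DIMM slots named in RecordData lines."""
    # Assumption: the slot name directly follows "location(s)" on the same line.
    pattern = re.compile(r"RecordData\s*=.*location\(s\)\s+(DIMM_\w+)")
    return sorted({m.group(1) for m in pattern.finditer(sel_text)})

print(affected_dimms(SEL_TEXT))
```

Three distinct slots across both memory channels show up in this excerpt.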

Times in UTC

[06:16:29]  <+icinga-wm>	PROBLEM - Host db1097 is DOWN: PING CRITICAL - Packet loss = 100%
[06:23:53]  <+icinga-wm>	RECOVERY - Host db1097 is UP: PING WARNING - Packet loss = 50%, RTA = 0.25 ms
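
The gap between the two alerts above gives a rough figure for how long the host was unreachable. A small sketch of that arithmetic follows. The helper name `outage_seconds` is hypothetical, and it assumes both timestamps fall on the same day. It measures only the time between Icinga notifications, not the exact crash and boot times.

```python
from datetime import datetime

def outage_seconds(down, up, fmt="%H:%M:%S"):
    """Seconds between the DOWN and UP alert timestamps (same day assumed)."""
    delta = datetime.strptime(up, fmt) - datetime.strptime(down, fmt)
    return int(delta.total_seconds())

secs = outage_seconds("06:16:29", "06:23:53")
print(f"{secs // 60} min {secs % 60} s")  # prints "7 min 24 s"
```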

The host logged multi-bit memory errors on several DIMMs. It is due to be replaced next fiscal year, so it is probably not worth buying parts for it. We can just replace it with db1080.
This required an Etherpad reload.

Subject	Repo	Branch	Lines +/-
db1097: Disable notifications	operations/puppet	production	+1 -0
mariadb: Promote db1080 to m1 master	operations/puppet	production	+7 -6
db1080: Enable notifications	operations/puppet	production	+0 -1
site.pp: Move db1080 to m1 instead of m2	operations/puppet	production	+9 -9

db1097 (m1 master) crashed due to memory issues.
Closed, Resolved, Public

Event Timeline

	• Marostegui
	Jun 30 2020, 6:31 AM
