codfw: db2079 memory issue on DIMM B8
Closed, Resolved, Public


db2079 has a bad DIMM (B8)

Thu Apr 22 2021 10:45:58	Correctable memory error rate exceeded for DIMM_B8.
Thu Apr 22 2021 10:22:56	Correctable memory error rate exceeded for DIMM_B8.

I would like to swap DIMM B8 with DIMM A8 for testing before replacing it.


Event Timeline

Papaul triaged this task as Medium priority. May 26 2021, 4:48 PM

This is s8 master, so it needs some coordination. Let me know a day/time when you'd like to tackle this and I can have the host ready for you!

@Marostegui Hello, you can go ahead and depool the server; I will be on site in about an hour.


Mentioned in SAL (#wikimedia-operations) [2021-06-01T13:56:32Z] <marostegui> Stop mysql on db2079 (codfw master) - T283743

Swapped DIMM B8 with DIMM A8. We will see whether the issue shows up on DIMM A8. If it does, I will use a DIMM from one of the decom servers.

Resolving this task for now

On boot, we are hitting T216240. @Papaul, let's get the firmware and BIOS upgraded, please.

Firmware upgrade complete