db2079 has a bad DIMM (B8)
Thu Apr 22 2021 10:45:58 Correctable memory error rate exceeded for DIMM_B8.
Thu Apr 22 2021 10:22:56 Correctable memory error rate exceeded for DIMM_B8.
I would like to swap B8 with A8 for testing before replacing it.
This is s8 master, so it needs some coordination. Let me know a day/time when you'd like to tackle this and I can have the host ready for you!
@Marostegui Hello, you can go ahead and depool the server; I will be on site in about an hour.
Excellent, thanks @Papaul
Mentioned in SAL (#wikimedia-operations) [2021-06-01T13:56:32Z] <marostegui> Stop mysql on db2079 (codfw master) - T283743
db2079 is off and ready for you @Papaul
Swapped DIMM B8 with DIMM A8. We will see whether the issue shows up on DIMM A8. If it does, I will use a DIMM from one of the decom servers.
Resolving this task for now
On boot, we are hitting T216240. @Papaul, let's get the firmware and BIOS upgraded, please.
Firmware upgrade complete
MySQL started - thanks Papaul!