mw2264 went down today and is also unreachable via the serial console, can you please have a look?
It has been set as inactive in conftool so that it doesn't interfere with deployments.
Record: 5
Date/Time: 08/30/2021 10:22:49
Source: system
Severity: Non-Critical
Description: Correctable memory error rate exceeded for DIMM_B1.
-------------------------------------------------------------------------------
Record: 6
Date/Time: 08/30/2021 10:43:31
Source: system
Severity: Critical
Description: Correctable memory error rate exceeded for DIMM_B1.
-------------------------------------------------------------------------------
Record: 7
Date/Time: 09/02/2021 09:25:49
Source: system
Severity: Ok
Description: A problem was detected in Memory Reference Code (MRC).
-------------------------------------------------------------------------------
Record: 8
Date/Time: 09/02/2021 09:25:49
Source: system
Severity: Critical
Description: Multi-bit memory errors detected on a memory device at location(s) DIMM_B1.
-------------------------------------------------------------------------------
Record: 9
Date/Time: 09/02/2021 09:25:49
Source: system
Severity: Ok
Description: A problem was detected in Memory Reference Code (MRC).
-------------------------------------------------------------------------------
Record: 10
Date/Time: 09/02/2021 09:25:49
Source: system
Severity: Critical
Description: Multi-bit memory errors detected on a memory device at location(s) DIMM_B1.
-------------------------------------------------------------------------------
@Dzahn I checked the server today and no errors are showing on A1, so I am closing this task. If the error happens again, please reopen the task.
Thanks
Mentioned in SAL (#wikimedia-operations) [2021-09-07T13:49:41Z] <mutante> mw2264 - scap pulled and repooled after T290242