Memory initialization error on scb1003
Closed, Resolved · Public

Description

scb1003 no longer boots without manual intervention: during startup a memory initialization error is shown, which has to be acknowledged by pressing F1. Can you please run a memory check to identify the broken DIMM? The server has been out of warranty (OOW) for 11 months, but maybe we have a spare module somewhere?

The host is depooled and can be taken down for hardware checks.

Event Timeline

Restricted Application added a subscriber: Aklapper. · Feb 27 2018, 10:57 AM

Mentioned in SAL (#wikimedia-operations) [2018-02-27T10:57:50Z] <moritzm> keeping scb1003 depooled for T188385

Cmjohnson moved this task from Backlog to Up next on the ops-eqiad board. · Mar 1 2018, 7:06 PM

@MoritzMuehlenhoff The DIMM at A2 was bad. I tested it by swapping it to B2, and the failure moved to B2. Fortunately, we already have a few decommissioned R420s with similar DIMMs, and I was able to replace it. Please verify that all is well and resolve this task.

Cmjohnson moved this task from Up next to Blocked on the ops-eqiad board. · Mar 1 2018, 7:21 PM
mobrovac added a subscriber: mobrovac.

Please add the SCB tag in the future to tasks pertaining to SCB hosts so that we are aware of any upcoming changes.

Mentioned in SAL (#wikimedia-operations) [2018-03-02T08:51:14Z] <moritzm> repooling scb1003 after memory module was replaced (T188385)

MoritzMuehlenhoff closed this task as Resolved. · Mar 2 2018, 8:51 AM

@MoritzMuehlenhoff The DIMM at A2 was bad. I tested it by swapping it to B2, and the failure moved to B2. Fortunately, we already have a few decommissioned R420s with similar DIMMs, and I was able to replace it. Please verify that all is well and resolve this task.

All looks fine; I've repooled the server.