Replace RAID controller battery on an-worker1095
Closed, Resolved, Public

Description

We have experienced a RAID controller battery failure in an-worker1095.

This is one of a batch of servers that have been particularly prone to this issue.

I believe we have some spares in stock, so it would be great if the battery could be replaced, please. I can shut down the server whenever it's convenient.

Event Timeline

BTullis triaged this task as Medium priority.
BTullis added a project: ops-eqiad.
BTullis moved this task from Incoming to Blocked / Waiting on the Data-Platform-SRE board.
BTullis added a subscriber: Jclark-ctr.

Hi @Jclark-ctr - We've had another RAID controller battery failure in the same batch of servers.
Would you be able to replace it, please, when convenient? I can downtime the server and shut it down ahead of time for you. Thanks.

@BTullis Would you be able to shut down the server for tomorrow morning at 8:30 AM EST?

Icinga downtime and Alertmanager silence (ID=6f84de2d-a493-4b54-92d4-cefed7da6f97) set by btullis@cumin1001 for 7 days, 0:00:00 on 1 host(s) and their services with reason: Replacing RAID controller battery

an-worker1095.eqiad.wmnet

@Jclark-ctr - I've shut down the machine and downtimed it. Feel free to boot it again normally after changing the battery. Many thanks.

@BTullis I've replaced the failed battery. The server is booting up now.