Replace RAID controller battery in an-worker1092
Closed, Resolved, Public

Description

We have experienced a RAID controller battery failure in an-worker1092.

This is one of a batch of servers that have been particularly prone to this issue.

I believe we should have some spares in stock, so it would be great if the battery could be replaced, please. I can shut down the server whenever it's convenient.

Event Timeline

BTullis added a project: Data-Platform-SRE.
BTullis moved this task from Incoming to Blocked / Waiting on the Data-Platform-SRE board.
BTullis updated the task description.

@BTullis I would like to take care of this tomorrow. When would be a good time for you to do this?

Icinga downtime and Alertmanager silence (ID=36858c2c-bae0-4a63-9ac9-19916c27613e) set by btullis@cumin1001 for 1 day, 0:00:00 on 1 host(s) and their services with reason: Replacing RAID controller battery

an-worker1092.eqiad.wmnet

Hi @Jclark-ctr - Many thanks. I've shut down the machine ready for you, so you can replace it whenever is convenient. Feel free to boot the host again when finished, as it should rejoin the Hadoop cluster without any further input from us.

@BTullis Battery replaced; the host is booting now. Thanks for your assistance.