Degraded BBU on db1094 (was: Degraded RAID on db1094)
Closed, Resolved · Public

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (hpssacli) was detected on host db1094. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

$ sudo hpssacli controller slot=1 show detail
...
   Battery/Capacitor Count: 0
...

Details

Related Gerrit Patches:
operations/mediawiki-config : master : db-eqiad.php: Depool db1094

Event Timeline

Restricted Application added a subscriber: Aklapper. · May 29 2017, 5:27 PM
Volans updated the task description. · May 29 2017, 5:45 PM
Volans added a project: DBA.
Marostegui renamed this task from Degraded RAID on db1094 to Degraded BBU on db1094 (was: Degraded RAID on db1094). · May 30 2017, 5:55 AM
Marostegui added subscribers: Cmjohnson, Marostegui.

@Cmjohnson once you get the replacement BBU from HP, let us know as we need to depool this host before shutting it down.
Thanks!

Marostegui moved this task from Triage to In progress on the DBA board. · May 30 2017, 5:57 AM

A support case has been opened with HPE:

Your case was successfully submitted. Please note your Case ID: 5320105305 for future reference.

Did the BBU arrive in the end? If so, I would replace it on Monday rather than on a Friday, in case it causes any other trouble.

@Marostegui The battery is here... let me know when you want to replace it.

@Cmjohnson I will depool the server now and ping you once it is down.

Change 357375 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1094

https://gerrit.wikimedia.org/r/357375

Change 357375 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1094

https://gerrit.wikimedia.org/r/357375

Mentioned in SAL (#wikimedia-operations) [2017-06-06T12:58:05Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1094 for maintenance - T166518 (duration: 00m 39s)

Mentioned in SAL (#wikimedia-operations) [2017-06-06T12:58:23Z] <marostegui> Shutdown db1094 for maintenance - T166518

Marostegui closed this task as Resolved. · Jun 6 2017, 1:21 PM
Marostegui assigned this task to Cmjohnson.

All good now - thanks Chris!

Cache Backup Power Source: Batteries
Battery/Capacitor Count: 1
Battery/Capacitor Status: OK
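
For anyone who wants to script the same before/after check, here is a minimal sketch. It assumes the controller detail output consists of "Key: Value" lines, as in the excerpts quoted in this task. The function names parse_controller_detail and bbu_is_healthy are hypothetical, and the sketch only parses text that has already been captured; it does not run hpssacli itself.

```python
def parse_controller_detail(text):
    """Parse 'Key: Value' lines from captured controller detail output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Lines without a ': ' separator (for example '...') are skipped.
        key, sep, value = line.strip().partition(": ")
        if sep:
            fields[key] = value.strip()
    return fields


def bbu_is_healthy(fields):
    """Return True only if a battery/capacitor is present and its status is OK."""
    try:
        count = int(fields.get("Battery/Capacitor Count", "0"))
    except ValueError:
        # An unparsable count is treated as unhealthy (an assumption of this sketch).
        return False
    return count > 0 and fields.get("Battery/Capacitor Status") == "OK"


# Excerpts as quoted in this task: before and after the BBU replacement.
before = "...\n   Battery/Capacitor Count: 0\n...\n"
after = (
    "Cache Backup Power Source: Batteries\n"
    "Battery/Capacitor Count: 1\n"
    "Battery/Capacitor Status: OK\n"
)

print(bbu_is_healthy(parse_controller_detail(before)))
print(bbu_is_healthy(parse_controller_detail(after)))
```

Requiring both a non-zero count and an OK status is a deliberately strict choice for this sketch: a missing status line is treated the same as a failed battery.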