
cloudvirt1012 - Icinga/HP RAID - 2021-07-16
Closed, Duplicate, Public

Description


alertname: Icinga/HP RAID
summary: WARNING: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:1:5, 2I:1:6 - Controller: OK - Battery/Capacitor: Recharging
1 day ago
instance: cloudvirt1012
severity: warning
runbook: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations/Hardware_Troubleshooting_Runbook#Hardware_Raid_Information_Gathering

Event Timeline

dcaro triaged this task as High priority. Jul 16 2021, 9:55 AM
dcaro created this task.

Mentioned in SAL (#wikimedia-cloud) [2021-07-16T09:55:35Z] <dcaro> checking HP raid issues on cloudvirt1012 (T286766)

The warning comes from the battery/capacitor: the check expects its status to be "OK", but the controller reports "Recharging".

The HP tooling also reports it as recharging:

root@cloudvirt1012:~#  hpssacli controller all show detail | grep -i battery
   No-Battery Write Cache: Disabled
   Battery/Capacitor Count: 1
   Battery/Capacitor Status: Recharging
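A minimal sketch of the classification described above, assuming the check simply treats any "Battery/Capacitor Status" other than "OK" as a warning. That behaviour is inferred from the alert text, not from the actual Icinga plugin's source, and the function name `battery_level` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the real Icinga/HP RAID plugin: classify the
# "Battery/Capacitor Status" line(s) of `hpssacli controller all show detail`
# output. Assumes anything other than exactly "OK" should be a WARNING.
battery_level() {
  local statuses
  # Reads controller detail text on stdin and extracts the status value(s),
  # stripping trailing whitespace.
  statuses=$(awk -F': ' '/Battery\/Capacitor Status:/ { sub(/[ \t\r]+$/, "", $2); print $2 }')
  if [ -z "$statuses" ]; then
    echo "UNKNOWN"   # no battery/capacitor status line found
  elif printf '%s\n' "$statuses" | grep -qv '^OK$'; then
    echo "WARNING"   # e.g. "Recharging", as on cloudvirt1012
  else
    echo "OK"
  fi
}
```

Usage: `hpssacli controller all show detail | battery_level`. With the output above, this prints WARNING.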

The server has been up for a long time, so this is unlikely to be a normal post-boot recharge; the battery itself may be misbehaving:

root@cloudvirt1012:~# uptime
 12:40:00 up 77 days,  2:57,  1 user,  load average: 11.52, 9.24, 8.40
dcaro added a subscriber: RobH.

@RobH, can someone take a look? If the server is still under warranty, we might want to get the battery replaced.
Thanks!

dcaro removed dcaro as the assignee of this task. Jul 16 2021, 12:44 PM
dcaro added a project: DC-Ops.

Silenced the alert for 10 days in Alertmanager.