
labvirt1005 - HP RAID controller issue (battery?)
Closed, Resolved, Public

Description

On labvirt1005, the HP RAID check is reported as CRITICAL in Icinga:

CRITICAL: Slot 0: Failed: 1I:1:1 - OK: 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12, 1I:1:13, 2I:1:14, 2I:1:15, 2I:1:16, 2I:1:17, 2I:1:18, Controller, Battery/Capacitor

https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=labvirt1005&service=HP+RAID
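The check output above groups components by status, which makes it possible to tell at a glance whether the failure is a drive or the battery. A minimal sketch, assuming the output always follows the `STATE: Slot N: Status: items - Status: items` shape shown above (`parse_hp_raid_status` is a hypothetical helper, not part of the Icinga plugin):

```python
import re


def parse_hp_raid_status(line: str) -> dict:
    """Split an HP RAID check line into overall state, controller slot and status groups.

    Assumes the format seen in this ticket, e.g.
    "CRITICAL: Slot 0: Failed: 1I:1:1 - OK: 1I:1:2, ..., Controller, Battery/Capacitor".
    """
    # The overall check state comes first, e.g. "CRITICAL" or "OK".
    overall, rest = line.split(": ", 1)
    m = re.match(r"Slot (\d+): (.*)", rest)
    if m is None:
        raise ValueError(f"unexpected format: {line!r}")
    slot, body = int(m.group(1)), m.group(2)
    groups = {}
    # Status groups are separated by " - "; drive IDs contain ":" but never ": ".
    for group in body.split(" - "):
        status, items = group.split(": ", 1)
        groups.setdefault(status, []).extend(i.strip() for i in items.split(","))
    return {"overall": overall, "slot": slot, "groups": groups}


status = parse_hp_raid_status(
    "CRITICAL: Slot 0: Failed: 1I:1:1 - OK: 1I:1:2, Controller, Battery/Capacitor"
)
print(status["groups"]["Failed"])  # ['1I:1:1']
```

Applied to the alert above, the only entry in the `Failed` group is drive 1I:1:1, while `Battery/Capacitor` sits in the `OK` group.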

Event Timeline

Restricted Application added subscribers: Southparkfan, Aklapper.

BTW, when searching Phabricator I saw a couple of older resolved tickets on this same box, like "doesn't boot up" (T100030) and "memory errors" (T97521). Seems like this one is kind of a lemon...

Looks like a broken disk to me, on the / partition (/dev/sda). From /usr/local/lib/nagios/plugins/get-raid-status-hpssacli:

array A

   Logical Drive: 1
      Size: 136.7 GB
      Fault Tolerance: 1
         physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 146 GB, Failed)
      Mirror Group 2:
         physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 146 GB, OK)
      Drive Type: Data
      LD Acceleration Method: Controller Cache
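Each `physicaldrive` line encodes the drive address as port:box:bay followed by interface, size and status. A sketch, assuming lines in exactly the format shown above (`PhysicalDrive`, `parse_physicaldrive` and `failed_drives` are hypothetical helpers, not part of the get-raid-status-hpssacli plugin), that extracts these fields and lists any drive not reporting OK:

```python
import re
from typing import List, NamedTuple, Optional


class PhysicalDrive(NamedTuple):
    drive_id: str   # e.g. "1I:1:1"
    port: str       # e.g. "1I"
    box: int
    bay: int
    interface: str  # e.g. "SAS"
    size: str       # e.g. "146 GB"
    status: str     # e.g. "OK" or "Failed"


# Matches e.g. "physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 146 GB, Failed)"
_PD_RE = re.compile(
    r"physicaldrive (\S+) \(port (\w+):box (\d+):bay (\d+), ([^,]+), ([^,]+), ([^)]+)\)"
)


def parse_physicaldrive(line: str) -> Optional[PhysicalDrive]:
    """Parse one physicaldrive line; return None for any other line."""
    m = _PD_RE.search(line)
    if m is None:
        return None
    drive_id, port, box, bay, iface, size, status = m.groups()
    return PhysicalDrive(drive_id, port, int(box), int(bay), iface, size, status)


def failed_drives(output: str) -> List[str]:
    """Return the IDs of drives whose status is anything other than OK."""
    drives = (parse_physicaldrive(line) for line in output.splitlines())
    return [d.drive_id for d in drives if d is not None and d.status != "OK"]
```

Run over the output above, `failed_drives` would return only `1I:1:1`, i.e. port 1I, box 1, bay 1, the first half of the mirror.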

Appears to be the disk, not the BBU. Opened a case with HP:
Case ID: 5314232580
Case title: Failed Hard Drive
Severity: 3-Normal
Product serial number: 2M251303BT
Product number: 665554-B21
Submitted: 10/17/2016 12:42:58 PM
Last updated: 10/17/2016 12:42:58 PM
Source: Web
Case status: Received by HP

Weirdly, this just recovered (by itself?):

09:53 < icinga-wm> RECOVERY - HP RAID on labvirt1005 is OK: OK: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12, 1I:1:13, 2I:1:14, 2I:1:15, 2I:1:16, 2I:1:17, 2I:1:18, Controller, Battery/Capacitor

That doesn't make me feel any better :(

Chris reseated the disk, which would explain that. But he also ordered a new one.

Oh! Well, that's fine then :)

Swapped the failed disk (1I:1:1 on the Slot 0 controller) with a new disk.

The old disk is being sent back via UPS:
1ZA7327E90828184 10

Cmjohnson claimed this task.