
Degraded RAID on labstore1007
Closed, Resolved (Public)

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (hpssacli) was detected on host labstore1007. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

CRITICAL: Slot 1: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 2I:4:1, 2I:4:2 - Failed: 1I:1:5 - Controller: OK - Battery/Capacitor: OK --- Slot 3: OK: 1E:1:1, 1E:1:10, 1E:1:11, 1E:1:12, 1E:1:2, 1E:1:3, 1E:1:4, 1E:1:5, 1E:1:6, 1E:1:7, 1E:1:8, 1E:1:9, 1E:2:1, 1E:2:10, 1E:2:11, 1E:2:12, 1E:2:2, 1E:2:3, 1E:2:4, 1E:2:5, 1E:2:6, 1E:2:7, 1E:2:8, 1E:2:9 - Controller: OK - Battery/Capacitor: OK
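The status line above packs several fields per controller slot into a single string. As a rough illustration of how it could be split apart, here is a minimal Python sketch. The function name `parse_raid_summary` is hypothetical, and the separators (" --- " between slots, " - " between fields) are assumed only from the one line shown in this task; the actual check plugin may format other states differently.

```python
import re

# Drive IDs in the summary look like "1I:1:5" or "1E:2:10" (assumption
# based on the line in this task, not on the plugin's source).
DRIVE_ID = re.compile(r"\d+[IE]:\d+:\d+")


def parse_raid_summary(line):
    """Return {slot_number: {field_name: value}} for one summary line.

    Fields whose values are all drive IDs become lists; any other
    field (e.g. "Controller: OK") is kept as a plain string.
    """
    # Drop the leading check state, e.g. "CRITICAL: ".
    body = re.sub(r"^(OK|WARNING|CRITICAL|UNKNOWN): (?=Slot )", "", line.strip())
    slots = {}
    for section in body.split(" --- "):
        match = re.match(r"Slot (\d+): (.*)", section)
        if not match:
            continue  # unrecognised section, skip rather than guess
        fields = {}
        for field in match.group(2).split(" - "):
            key, _, value = field.partition(": ")
            items = value.split(", ")
            if all(DRIVE_ID.fullmatch(item) for item in items):
                fields[key] = items
            else:
                fields[key] = value
        slots[int(match.group(1))] = fields
    return slots


summary = (
    "CRITICAL: Slot 1: OK: 1I:1:1, 1I:1:2 - Failed: 1I:1:5 - "
    "Controller: OK - Battery/Capacitor: OK --- "
    "Slot 3: OK: 1E:1:1, 1E:1:10 - Controller: OK - Battery/Capacitor: OK"
)
parsed = parse_raid_summary(summary)
failed = {slot: f.get("Failed", []) for slot, f in parsed.items()}
```

With the shortened sample above, `failed` maps slot 1 to the single drive `1I:1:5` and slot 3 to an empty list, which matches what the full status line reports.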

$ sudo /usr/local/lib/nagios/plugins/get-raid-status-hpssacli

Smart Array P441 in Slot 3

   array A

      Logical Drive: 1
         Size: 32.7 TB
         Fault Tolerance: 1+0
         Strip Size: 256 KB
         Full Stripe Size: 1536 KB
         Status: OK
         MultiDomain Status: OK
         Caching:  Enabled
         Disk Name: /dev/sdc 
         Mount Points: None
         Mirror Group 1:
            physicaldrive 1E:1:1 (port 1E:box 1:bay 1, SATA, 6001.1 GB, OK)
            physicaldrive 1E:1:2 (port 1E:box 1:bay 2, SATA, 6001.1 GB, OK)
            physicaldrive 1E:1:3 (port 1E:box 1:bay 3, SATA, 6001.1 GB, OK)
            physicaldrive 1E:1:4 (port 1E:box 1:bay 4, SATA, 6001.1 GB, OK)
            physicaldrive 1E:1:5 (port 1E:box 1:bay 5, SATA, 6001.1 GB, OK)
            physicaldrive 1E:1:6 (port 1E:box 1:bay 6, SATA, 6001.1 GB, OK)
         Mirror Group 2:
            physicaldrive 1E:1:7 (port 1E:box 1:bay 7, SATA, 6001.1 GB, OK)
            physicaldrive 1E:1:8 (port 1E:box 1:bay 8, SATA, 6001.1 GB, OK)
            physicaldrive 1E:1:9 (port 1E:box 1:bay 9, SATA, 6001.1 GB, OK)
            physicaldrive 1E:1:10 (port 1E:box 1:bay 10, SATA, 6001.1 GB, OK)
            physicaldrive 1E:1:11 (port 1E:box 1:bay 11, SATA, 6001.1 GB, OK)
            physicaldrive 1E:1:12 (port 1E:box 1:bay 12, SATA, 6001.1 GB, OK)
         Drive Type: Data
         LD Acceleration Method: Controller Cache

   array B

      Logical Drive: 2
         Size: 32.7 TB
         Fault Tolerance: 1+0
         Strip Size: 256 KB
         Full Stripe Size: 1536 KB
         Status: OK
         MultiDomain Status: OK
         Caching:  Enabled
         Disk Name: /dev/sdd 
         Mount Points: None
         Mirror Group 1:
            physicaldrive 1E:2:1 (port 1E:box 2:bay 1, SAS, 6001.1 GB, OK)
            physicaldrive 1E:2:2 (port 1E:box 2:bay 2, SAS, 6001.1 GB, OK)
            physicaldrive 1E:2:3 (port 1E:box 2:bay 3, SAS, 6001.1 GB, OK)
            physicaldrive 1E:2:4 (port 1E:box 2:bay 4, SAS, 6001.1 GB, OK)
            physicaldrive 1E:2:5 (port 1E:box 2:bay 5, SAS, 6001.1 GB, OK)
            physicaldrive 1E:2:6 (port 1E:box 2:bay 6, SAS, 6001.1 GB, OK)
         Mirror Group 2:
            physicaldrive 1E:2:7 (port 1E:box 2:bay 7, SAS, 6001.1 GB, OK)
            physicaldrive 1E:2:8 (port 1E:box 2:bay 8, SAS, 6001.1 GB, OK)
            physicaldrive 1E:2:9 (port 1E:box 2:bay 9, SAS, 6001.1 GB, OK)
            physicaldrive 1E:2:10 (port 1E:box 2:bay 10, SAS, 6001.1 GB, OK)
            physicaldrive 1E:2:11 (port 1E:box 2:bay 11, SAS, 6001.1 GB, OK)
            physicaldrive 1E:2:12 (port 1E:box 2:bay 12, SAS, 6001.1 GB, OK)
         Drive Type: Data
         LD Acceleration Method: Controller Cache

Smart Array P840 in Slot 1

   array A

      Logical Drive: 1
         Size: 931.5 GB
         Fault Tolerance: 1
         Strip Size: 256 KB
         Full Stripe Size: 256 KB
         Status: OK
         MultiDomain Status: OK
         Caching:  Enabled
         Disk Name: /dev/sda 
         Mount Points: /boot 953 MB Partition Number 2
         OS Status: LOCKED
         Mirror Group 1:
            physicaldrive 2I:4:1 (port 2I:box 4:bay 1, SATA, 1 TB, OK)
         Mirror Group 2:
            physicaldrive 2I:4:2 (port 2I:box 4:bay 2, SATA, 1 TB, OK)
         Drive Type: Data
         LD Acceleration Method: Controller Cache

   array B

      Logical Drive: 2
         Size: 32.7 TB
         Fault Tolerance: 1+0
         Strip Size: 256 KB
         Full Stripe Size: 1536 KB
         Status: Interim Recovery Mode
         MultiDomain Status: OK
         Caching:  Enabled
         Disk Name: /dev/sdb 
         Mount Points: None
         Mirror Group 1:
            physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SATA, 6001.1 GB, Failed)
            physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SATA, 6001.1 GB, OK)
            physicaldrive 1I:1:7 (port 1I:box 1:bay 7, SATA, 6001.1 GB, OK)
            physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SATA, 6001.1 GB, OK)
            physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 6001.1 GB, OK)
            physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA, 6001.1 GB, OK)
         Mirror Group 2:
            physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SATA, 6001.1 GB, OK)
            physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SATA, 6001.1 GB, OK)
            physicaldrive 2I:2:1 (port 2I:box 2:bay 1, SATA, 6001.1 GB, OK)
            physicaldrive 2I:2:2 (port 2I:box 2:bay 2, SATA, 6001.1 GB, OK)
            physicaldrive 2I:2:3 (port 2I:box 2:bay 3, SATA, 6001.1 GB, OK)
            physicaldrive 2I:2:4 (port 2I:box 2:bay 4, SATA, 6001.1 GB, OK)
         Drive Type: Data
         LD Acceleration Method: Controller Cache
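To locate the failed drive without reading the whole snapshot by eye, the `physicaldrive` lines can be filtered by their status field. The following is a minimal Python sketch, not the plugin's own logic. The function name `find_unhealthy_drives` is hypothetical, and the line layout is assumed from the snapshot above; other hpssacli versions may print these lines differently.

```python
import re

# "Smart Array P840 in Slot 1" starts a new controller section.
CONTROLLER = re.compile(r"Smart Array \S+ in Slot (?P<slot>\d+)")

# e.g. "physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SATA, 6001.1 GB, Failed)"
PHYSICAL_DRIVE = re.compile(
    r"physicaldrive (?P<id>\S+) "
    r"\(port (?P<port>[^:]+):box (?P<box>\d+):bay (?P<bay>\d+), "
    r"(?P<iface>[^,]+), (?P<size>[^,]+), (?P<status>[^)]+)\)"
)


def find_unhealthy_drives(report):
    """Return a list of dicts for every physical drive whose status is not OK."""
    unhealthy = []
    slot = None  # slot of the controller section currently being read
    for raw_line in report.splitlines():
        line = raw_line.strip()
        controller = CONTROLLER.match(line)
        if controller:
            slot = int(controller.group("slot"))
            continue
        drive = PHYSICAL_DRIVE.match(line)
        if drive and drive.group("status") != "OK":
            unhealthy.append(
                {
                    "slot": slot,
                    "drive": drive.group("id"),
                    "bay": int(drive.group("bay")),
                    "size": drive.group("size"),
                    "status": drive.group("status"),
                }
            )
    return unhealthy


sample = """
Smart Array P441 in Slot 3
            physicaldrive 1E:1:1 (port 1E:box 1:bay 1, SATA, 6001.1 GB, OK)
Smart Array P840 in Slot 1
            physicaldrive 2I:4:1 (port 2I:box 4:bay 1, SATA, 1 TB, OK)
            physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SATA, 6001.1 GB, Failed)
            physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SATA, 6001.1 GB, OK)
"""
bad = find_unhealthy_drives(sample)
```

On the excerpt above, `bad` contains one entry: drive `1I:1:5` in bay 5 on the controller in slot 1, with status `Failed`. Tracking the controller slot alongside the drive ID is a deliberate choice, since the same box and bay numbers can appear under more than one controller.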

Event Timeline

@wiki_willy @Andrew @Bstorm This server is out of warranty. Do you want to purchase a new disk?

Please see T281045 (labstore1007 crashed after storage controller errors--replace disk?). I believe that's the disk we ordered there. It must have finally failed out :)

Cmjohnson claimed this task.

Okay, thanks. I'll close this ticket. I'll be at the data center tomorrow and will look for the new disk.