Page MenuHomePhabricator

Degraded RAID on db2050
Closed, ResolvedPublic

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (hpssacli) was detected on host db2050. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

CRITICAL: Slot 0: Predictive Failure: 1I:1:4, 1I:1:7 - Failed: 1I:1:1 - OK: 1I:1:2, 1I:1:3, 1I:1:5, 1I:1:6, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Controller: OK - Battery/Capacitor: OK

$ sudo /usr/local/lib/nagios/plugins/get-raid-status-hpssacli

Smart Array P420i in Slot 0 (Embedded)

   array A

      Logical Drive: 1
         Size: 3.3 TB
         Fault Tolerance: 1+0
         Strip Size: 256 KB
         Full Stripe Size: 1536 KB
         Status: Interim Recovery Mode
         Caching:  Enabled
         Disk Name: /dev/sda 
         Mount Points: / 37.3 GB Partition Number 2
         OS Status: LOCKED
         Mirror Group 1:
            physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, Failed)
            physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 600 GB, OK)
            physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 600 GB, OK)
            physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 600 GB, Predictive Failure)
            physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SAS, 600 GB, OK)
            physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SAS, 600 GB, OK)
         Mirror Group 2:
            physicaldrive 1I:1:7 (port 1I:box 1:bay 7, SAS, 600 GB, Predictive Failure)
            physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SAS, 600 GB, OK)
            physicaldrive 1I:1:9 (port 1I:box 1:bay 9, SAS, 600 GB, OK)
            physicaldrive 1I:1:10 (port 1I:box 1:bay 10, SAS, 600 GB, OK)
            physicaldrive 1I:1:11 (port 1I:box 1:bay 11, SAS, 600 GB, OK)
            physicaldrive 1I:1:12 (port 1I:box 1:bay 12, SAS, 600 GB, OK)
         Drive Type: Data
         LD Acceleration Method: Controller Cache

Event Timeline

Marostegui triaged this task as Medium priority.
Marostegui added a project: DBA.
Marostegui added a subscriber: Papaul.

Let's get the disk changed @Papaul - thanks!
Let's replace only the one that has FAILED, not the ones with predictive failure, those are being tracked at T208323: Predictive failures on disk S.M.A.R.T. status

Thanks!

logicaldrive 1 (3.3 TB, RAID 1+0, Recovering, 2% complete)

physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, Rebuilding)

All good now, thank you!

logicaldrive 1 (3.3 TB, RAID 1+0, OK)

physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, OK)