
Degraded RAID on db2050
Closed, Resolved · Public

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (hpssacli) was detected on host db2050. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

CRITICAL: Slot 0: Predictive Failure: 1I:1:4, 1I:1:7 - Failed: 1I:1:1 - OK: 1I:1:2, 1I:1:3, 1I:1:5, 1I:1:6, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Controller: OK - Battery/Capacitor: OK

$ sudo /usr/local/lib/nagios/plugins/get-raid-status-hpssacli

Smart Array P420i in Slot 0 (Embedded)

   array A

      Logical Drive: 1
         Size: 3.3 TB
         Fault Tolerance: 1+0
         Strip Size: 256 KB
         Full Stripe Size: 1536 KB
         Status: Interim Recovery Mode
         Caching:  Enabled
         Disk Name: /dev/sda 
         Mount Points: / 37.3 GB Partition Number 2
         OS Status: LOCKED
         Mirror Group 1:
            physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, Failed)
            physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 600 GB, OK)
            physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 600 GB, OK)
            physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 600 GB, Predictive Failure)
            physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SAS, 600 GB, OK)
            physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SAS, 600 GB, OK)
         Mirror Group 2:
            physicaldrive 1I:1:7 (port 1I:box 1:bay 7, SAS, 600 GB, Predictive Failure)
            physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SAS, 600 GB, OK)
            physicaldrive 1I:1:9 (port 1I:box 1:bay 9, SAS, 600 GB, OK)
            physicaldrive 1I:1:10 (port 1I:box 1:bay 10, SAS, 600 GB, OK)
            physicaldrive 1I:1:11 (port 1I:box 1:bay 11, SAS, 600 GB, OK)
            physicaldrive 1I:1:12 (port 1I:box 1:bay 12, SAS, 600 GB, OK)
         Drive Type: Data
         LD Acceleration Method: Controller Cache
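
The one-line CRITICAL summary near the top of this description groups the drives by the status shown on each `physicaldrive` line of the snapshot. A minimal sketch of that grouping follows. It is not the actual Icinga/Nagios plugin code; the names `DRIVE_RE`, `summarize_drives`, and `SAMPLE` are hypothetical and exist only for illustration, and the sketch assumes the status is always the last comma-separated field inside the parentheses, as it is in the snapshot above.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 600 GB, Predictive Failure)
# (hypothetical pattern, written against the snapshot format shown in this task)
DRIVE_RE = re.compile(r"physicaldrive\s+(\S+)\s+\((.*)\)")


def summarize_drives(report: str) -> dict[str, list[str]]:
    """Group physical drive IDs by their reported status."""
    by_status = defaultdict(list)
    for line in report.splitlines():
        match = DRIVE_RE.search(line)
        if match is None:
            continue  # not a physicaldrive line
        drive_id, details = match.groups()
        # Assumption: the status is the last comma-separated field.
        status = details.rsplit(",", 1)[-1].strip()
        by_status[status].append(drive_id)
    return dict(by_status)


# A few lines copied from the snapshot above.
SAMPLE = """
         Mirror Group 1:
            physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, Failed)
            physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 600 GB, OK)
            physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 600 GB, Predictive Failure)
         Mirror Group 2:
            physicaldrive 1I:1:7 (port 1I:box 1:bay 7, SAS, 600 GB, Predictive Failure)
"""

summary = summarize_drives(SAMPLE)
for status, drives in summary.items():
    print(f"{status}: {', '.join(drives)}")
```

Grouping by status rather than by mirror group matches how the alert line is worded, which is what matters when deciding which bay to replace.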

Event Timeline

Restricted Application added subscribers: Banyek, Marostegui, Aklapper. · Feb 21 2019, 3:18 AM
Marostegui assigned this task to Papaul. · Feb 21 2019, 6:02 AM
Marostegui triaged this task as Normal priority.
Marostegui added a project: DBA.
Marostegui added a subscriber: Papaul.

Let's get the disk changed @Papaul - thanks!
Let's replace only the one that has FAILED, not the ones with predictive failure; those are being tracked at T208323: Predictive failures on disk S.M.A.R.T. status

Marostegui moved this task from Triage to In progress on the DBA board. · Feb 21 2019, 6:02 AM
Papaul reassigned this task from Papaul to Marostegui. · Feb 21 2019, 3:27 PM

Disk replaced.

Thanks!

logicaldrive 1 (3.3 TB, RAID 1+0, Recovering, 2% complete)

physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, Rebuilding)
Marostegui closed this task as Resolved. · Feb 22 2019, 6:03 AM

All good now, thank you!

logicaldrive 1 (3.3 TB, RAID 1+0, OK)

physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, OK)
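
The `logicaldrive` lines quoted in the comments above (first `Recovering, 2% complete`, then `OK`) are what confirm that the rebuild has finished. A minimal sketch of reading such a line follows, for anyone who wants to track the rebuild without eyeballing the output. The names `LD_RE` and `logical_drive_state` are hypothetical, and the field order (size, RAID level, status, optional progress) is an assumption based only on the two lines shown in this task.

```python
import re

# Matches lines such as:
#   logicaldrive 1 (3.3 TB, RAID 1+0, Recovering, 2% complete)
#   logicaldrive 1 (3.3 TB, RAID 1+0, OK)
LD_RE = re.compile(r"logicaldrive\s+(\d+)\s+\(([^)]*)\)")


def logical_drive_state(line: str):
    """Return a dict describing the logical drive, or None if the line does not match."""
    match = LD_RE.search(line)
    if match is None:
        return None
    fields = [field.strip() for field in match.group(2).split(",")]
    if len(fields) < 3:
        return None  # unexpected layout; do not guess
    state = {
        "id": int(match.group(1)),
        "size": fields[0],
        "raid": fields[1],
        "status": fields[2],
        "percent_complete": None,  # only present while rebuilding
    }
    if len(fields) > 3:
        progress = re.match(r"(\d+)% complete", fields[3])
        if progress:
            state["percent_complete"] = int(progress.group(1))
    return state


recovering = logical_drive_state(
    "logicaldrive 1 (3.3 TB, RAID 1+0, Recovering, 2% complete)"
)
healthy = logical_drive_state("logicaldrive 1 (3.3 TB, RAID 1+0, OK)")
print(recovering["status"], recovering["percent_complete"])
print(healthy["status"], healthy["percent_complete"])
```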