Degraded RAID on db2044
Closed, Resolved · Public

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (hpssacli) was detected on host db2044. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

CRITICAL: Slot 0: OK: 1I:1:1, 1I:1:10, 1I:1:11, 1I:1:12, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9 - Failed: 1I:1:2 - Controller: OK - Battery/Capacitor: OK
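
The one-line alert above packs the severity, controller slot, and per-category drive lists into a single string. As a rough illustration of how that line could be split apart, here is a minimal Python sketch. It assumes the exact layout seen in this task (sections separated by " - ", keys and values separated by ": "); the function name `parse_raid_summary` is hypothetical and is not part of the Icinga handler or the Nagios plugin.

```python
def parse_raid_summary(line):
    """Split an alert line like the one in this task into its parts.

    Assumes the layout 'SEVERITY: Slot N: Key: v1, v2 - Key: v1 - ...'.
    Returns (severity, slot, sections), where sections maps each key
    to a list of values.
    """
    severity, _, rest = line.partition(": ")
    slot, _, body = rest.partition(": ")
    sections = {}
    for part in body.split(" - "):
        key, _, value = part.partition(": ")
        # Drive ids such as 1I:1:2 contain ':' but never ': ', so they stay intact.
        sections[key] = [v.strip() for v in value.split(",")]
    return severity, slot, sections


# The alert line copied from this task.
ALERT = (
    "CRITICAL: Slot 0: OK: 1I:1:1, 1I:1:10, 1I:1:11, 1I:1:12, 1I:1:3, "
    "1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9 - Failed: 1I:1:2 - "
    "Controller: OK - Battery/Capacitor: OK"
)

if __name__ == "__main__":
    severity, slot, sections = parse_raid_summary(ALERT)
    print(severity, slot, sections["Failed"])
```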

$ sudo /usr/local/lib/nagios/plugins/get-raid-status-hpssacli

Smart Array P420i in Slot 0 (Embedded)

   array A

      Logical Drive: 1
         Size: 3.3 TB
         Fault Tolerance: 1+0
         Strip Size: 256 KB
         Full Stripe Size: 1536 KB
         Status: Interim Recovery Mode
         Caching:  Enabled
         Disk Name: /dev/sda 
         Mount Points: / 37.3 GB Partition Number 2
         OS Status: LOCKED
         Mirror Group 1:
            physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, OK)
            physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 600 GB, Failed)
            physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 600 GB, OK)
            physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 600 GB, OK)
            physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SAS, 600 GB, OK)
            physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SAS, 600 GB, OK)
         Mirror Group 2:
            physicaldrive 1I:1:7 (port 1I:box 1:bay 7, SAS, 600 GB, OK)
            physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SAS, 600 GB, OK)
            physicaldrive 1I:1:9 (port 1I:box 1:bay 9, SAS, 600 GB, OK)
            physicaldrive 1I:1:10 (port 1I:box 1:bay 10, SAS, 600 GB, OK)
            physicaldrive 1I:1:11 (port 1I:box 1:bay 11, SAS, 600 GB, OK)
            physicaldrive 1I:1:12 (port 1I:box 1:bay 12, SAS, 600 GB, OK)
         Drive Type: Data
         LD Acceleration Method: Controller Cache
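
The status snapshot above can also be scanned mechanically for drives that are not OK. The following is a minimal Python sketch that assumes the `physicaldrive` line format shown in this task, with the status as the last comma-separated field inside the parentheses. The helper names `parse_physical_drives` and `non_ok_drives` are hypothetical and do not come from hpssacli or the Nagios plugin.

```python
import re

# Matches lines such as:
#   physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 600 GB, Failed)
DRIVE_RE = re.compile(r"physicaldrive\s+(?P<id>\S+)\s+\((?P<fields>[^)]*)\)")


def parse_physical_drives(text):
    """Return a dict mapping drive id -> status (the last field in parentheses)."""
    drives = {}
    for match in DRIVE_RE.finditer(text):
        fields = [f.strip() for f in match.group("fields").split(",")]
        drives[match.group("id")] = fields[-1]
    return drives


def non_ok_drives(text):
    """Return only the drives whose status is anything other than 'OK'."""
    return {d: s for d, s in parse_physical_drives(text).items() if s != "OK"}


# A short excerpt of the snapshot from this task.
SAMPLE = """\
         Mirror Group 1:
            physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, OK)
            physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 600 GB, Failed)
            physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 600 GB, OK)
"""

if __name__ == "__main__":
    print(non_ok_drives(SAMPLE))
```

Because the filter keeps any status other than OK, the same helper would also report a drive in the Rebuilding state after a replacement.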

Event Timeline

Restricted Application added subscribers: Marostegui, Aklapper. Jul 11 2019, 9:40 PM
Papaul claimed this task. Jul 12 2019, 12:38 AM
Papaul triaged this task as Normal priority.
Papaul moved this task from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.

See also recent T217755

@Papaul We will ask you to replace the disk here with one from T226406 when they arrive.

jcrespo added a subtask: Unknown Object (Task). Jul 12 2019, 7:11 AM
Papaul closed subtask Unknown Object (Task) as Resolved. Jul 17 2019, 5:35 AM

@Marostegui Double checking, should we replace this or is it being decommed now?

Let's replace it with a USED one for now; that host will go away "soonish".

There are no spare USED disks.

We should have a bunch of disks from the decommissioned hosts, no?

Papaul reassigned this task from Papaul to Marostegui. Jul 17 2019, 2:53 PM
Papaul added a subscriber: Papaul.

Replaced with a used one.

Thanks - I can see it rebuilding:

physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 600 GB, Rebuilding)
Marostegui closed this task as Resolved. Jul 18 2019, 5:00 AM

All good - thanks @Papaul!

root@db2044:~# hpssacli controller all show config

Smart Array P420i in Slot 0 (Embedded)    (sn: 0014380264FFFB0)


   Port Name: 1I

   Port Name: 2I

   Gen8 ServBP 12+2 at Port 1I, Box 1, OK
   array A (SAS, Unused Space: 0  MB)


      logicaldrive 1 (3.3 TB, RAID 1+0, OK)

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 600 GB, OK)
      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 600 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 600 GB, OK)
      physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SAS, 600 GB, OK)
      physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SAS, 600 GB, OK)
      physicaldrive 1I:1:7 (port 1I:box 1:bay 7, SAS, 600 GB, OK)
      physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SAS, 600 GB, OK)
      physicaldrive 1I:1:9 (port 1I:box 1:bay 9, SAS, 600 GB, OK)
      physicaldrive 1I:1:10 (port 1I:box 1:bay 10, SAS, 600 GB, OK)
      physicaldrive 1I:1:11 (port 1I:box 1:bay 11, SAS, 600 GB, OK)
      physicaldrive 1I:1:12 (port 1I:box 1:bay 12, SAS, 600 GB, OK)