Degraded RAID on an-presto1013
Closed, ResolvedPublic

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (broadcom) was detected on host an-presto1013. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

communication: 0 OK : controller: 1 Needs Attention : physical_disk: 1 Failed : virtual_disk: 1 Dgrd : bbu: 0 OK : enclosure: 0 OK : CLI Version = 007.1910.0000.0000 Oct 08, 2021
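For triage, the one-line summary above can be split into per-component fields. The following is only a minimal sketch: the helper name `parse_summary` is hypothetical, and it assumes that each field has the form `name: <code> <text>`, that fields are separated by ` : ` or ` ; `, and that a non-zero code means the component needs attention (the ticket itself does not document the meaning of the codes).

```python
import re

# Summary line copied from this task.
SUMMARY = (
    "communication: 0 OK : controller: 1 Needs Attention : "
    "physical_disk: 1 Failed : virtual_disk: 1 Dgrd : bbu: 0 OK : "
    "enclosure: 0 OK : CLI Version = 007.1910.0000.0000 Oct 08, 2021"
)


def parse_summary(line):
    """Return {component: (code, text)} for every 'name: <code> <text>' field.

    Fields that do not match that shape (such as the trailing CLI version)
    are skipped.
    """
    fields = {}
    # Separators are a colon or semicolon with whitespace on both sides.
    for part in re.split(r"\s[:;]\s", line):
        match = re.match(r"^(\w+): (\d+) (.+)$", part.strip())
        if match:
            fields[match.group(1)] = (int(match.group(2)), match.group(3))
    return fields


# Assumption: a non-zero code marks a component that needs attention.
needs_attention = [
    name for name, (code, _text) in parse_summary(SUMMARY).items() if code != 0
]
print(needs_attention)
```

On the line from this task, that should single out the controller, the physical disk and the virtual disk.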

$ sudo /usr/local/lib/nagios/plugins/get-raid-status-broadcom
Failed to execute '['/usr/lib/nagios/plugins/check_nrpe', '-4', '-H', 'an-presto1013', '-c', 'get_raid_status_broadcom']': RETCODE: 2
STDOUT:
communication: 0 OK ; controller: 1 Needs Attention ; physical_disk: 1 Failed ; virtual_disk: 1 Dgrd ; bbu: 0 OK ; enclosure: 0 OK ; CLI Version = 007.1910.0000.0000 Oct 08, 2021
Operating system = Linux 5.10.0-36-amd64
Controller = 0
Status = Success
Description = Show Drive Group Succeeded


TOPOLOGY :
========

-------------------------------------------------------------------------------
DG Arr Row EID:Slot DID Type   State  BT       Size PDC  PI SED DS3  FSpace TR 
-------------------------------------------------------------------------------
 0 -   -   -        -   RAID10 Dgrd   N   21.829 TB dflt N  N   dflt N      N  
 0 0   -   -        -   RAID1  Dgrd   N   21.829 TB dflt N  N   dflt N      N  
 0 0   0   64:0     5   DRIVE  Onln   Y    3.637 TB dflt N  N   dflt -      N  
 0 0   1   64:1     0   DRIVE  Onln   N    3.637 TB dflt N  N   dflt -      N  
 0 0   2   64:2     1   DRIVE  Onln   N    3.637 TB dflt N  N   dflt -      N  
 0 0   3   64:3     3   DRIVE  Onln   N    3.637 TB dflt N  N   dflt -      N  
 0 0   4   64:4     9   DRIVE  Failed N    3.637 TB dflt N  N   dflt -      N  
 0 0   5   64:5     8   DRIVE  Onln   N    3.637 TB dflt N  N   dflt -      N  
 0 0   6   64:6     4   DRIVE  Onln   N    3.637 TB dflt N  N   dflt -      N  
 0 0   7   64:7     10  DRIVE  Onln   N    3.637 TB dflt N  N   dflt -      N  
 0 0   8   64:8     2   DRIVE  Onln   N    3.637 TB dflt N  N   dflt -      N  
 0 0   9   64:9     6   DRIVE  Onln   N    3.637 TB dflt N  N   dflt -      N  
 0 0   10  64:10    7   DRIVE  Onln   N    3.637 TB dflt N  N   dflt -      N  
 0 0   11  64:11    11  DRIVE  Onln   N    3.637 TB dflt N  N   dflt -      N  
 1 -   -   -        -   RAID1  Optl   N  446.625 GB dflt N  N   dflt N      N  
 1 0   -   -        -   RAID1  Optl   N  446.625 GB dflt N  N   dflt N      N  
 1 0   0   64:12    13  DRIVE  Onln   N  446.625 GB dflt N  N   dflt -      N  
 1 0   1   64:13    12  DRIVE  Onln   N  446.625 GB dflt N  N   dflt -      N  
-------------------------------------------------------------------------------

DG=Disk Group Index|Arr=Array Index|Row=Row Index|EID=Enclosure Device ID
DID=Device ID|Type=Drive Type|Onln=Online|Rbld=Rebuild|Optl=Optimal|Dgrd=Degraded
Pdgd=Partially degraded|Offln=Offline|BT=Background Task Active
PDC=PD Cache|PI=Protection Info|SED=Self Encrypting Drive|Frgn=Foreign
DS3=Dimmer Switch 3|dflt=Default|Msng=Missing|FSpace=Free Space Present
TR=Transport Ready






STDERR:
None
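
To pick the affected drive out of a snapshot like the one above without reading the table by eye, the TOPOLOGY rows can be filtered by their State column. This is a rough sketch, not part of the Icinga handler: the function names are hypothetical, and it assumes each physical-drive row splits on whitespace into 16 tokens (the Size column contributes two) with `DRIVE` in the Type position, as in the output attached to this task.

```python
# A few rows copied from the TOPOLOGY table in this task.
SAMPLE = """\
 0 -   -   -        -   RAID10 Dgrd   N   21.829 TB dflt N  N   dflt N      N
 0 0   3   64:3     3   DRIVE  Onln   N    3.637 TB dflt N  N   dflt -      N
 0 0   4   64:4     9   DRIVE  Failed N    3.637 TB dflt N  N   dflt -      N
 1 0   0   64:12    13  DRIVE  Onln   N  446.625 GB dflt N  N   dflt -      N
"""


def drive_rows(output):
    """Yield one dict per physical-drive row of the TOPOLOGY table."""
    for line in output.splitlines():
        tok = line.split()
        # Assumed layout: DG Arr Row EID:Slot DID Type State BT Size(2) ...
        if len(tok) == 16 and tok[5] == "DRIVE":
            yield {
                "dg": int(tok[0]),
                "eid_slot": tok[3],
                "did": int(tok[4]),
                "state": tok[6],
                "size": f"{tok[8]} {tok[9]}",
            }


def not_online(output):
    """Return the drive rows whose State is anything other than Onln."""
    return [row for row in drive_rows(output) if row["state"] != "Onln"]


for row in not_online(SAMPLE):
    print(row["eid_slot"], row["state"], row["size"])
```

Applied to the sample rows, this reports only the drive at 64:4, which matches the bay 4 replacement discussed in the timeline.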

Event Timeline

Jclark-ctr subscribed.

This server is currently out of warranty. We do have spare 4TB drives on hand that we can install. Please advise when we can replace the drive.

Updated the iDRAC firmware from 5.10.10.00 to 7.00.00.182 while logged in.

Thanks @Jclark-ctr - Just to let you know, you can hot-swap this drive at any time.
It doesn't need any action from us: since it is a hardware RAID10 volume, the rebuild should be automatic.

In fact, there is probably no data on the logical volume, anyway.

Thanks, I will take care of it in a few hours.

@BTullis Sorry, I typed out a response earlier but forgot to post it. Unfortunately, we do not have any 4TB SAS drives at EQIAD, even after checking all the spares and decom servers. @wiki_willy @RobH We’ll need to order a replacement 4TB SAS drive.

RobH added a subtask: Unknown Object (Task). Oct 28 2025, 2:47 PM

A replacement drive is being ordered from Dell on ticket T408572. After reviewing the available options, other suppliers only listed 4TB SAS drives with half the speed and half the cache.

ETA for delivery: Nov 7, 2025.

Replaced the failed drive in bay 4. iDRAC now also shows an alert: "A predictive failure detected on drive 0 in disk drive bay 1."

That's one of the 446 GB SSDs, isn't it? If so, you can feel free to swap this drive out at any time, too.
That would be one of the two O/S drives that are in a hardware RAID1 configuration, so the array should rebuild automatically.

But if we don't have one, then I'm also happy to let it run until the device fails. Whatever you think best. Thanks @Jclark-ctr.

@BTullis Unfortunately, that is a 4TB drive, and we would need to order a replacement. Please let me know if you’d like to wait until it fails or if you’d prefer that I order a replacement now.

Jclark-ctr closed subtask Unknown Object (Task) as Resolved. Nov 12 2025, 1:29 PM

I'm happy to wait for it to fail. Thanks @Jclark-ctr for the heads-up.