
Degraded RAID on an-worker1191
Open, Needs Triage, Public

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (broadcom) was detected on host an-worker1191. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

communication: 0 OK : controller: 1 Needs Attention : physical_disk: 2 Failed : virtual_disk: 1 OfLn : bbu: 0 OK : enclosure: 0 OK : CLI Version = 007.1910.0000.0000 Oct 08, 2021

$ sudo /usr/local/lib/nagios/plugins/get-raid-status-broadcom
Failed to execute '['/usr/lib/nagios/plugins/check_nrpe', '-4', '-H', 'an-worker1191', '-c', 'get_raid_status_broadcom']': RETCODE: 2
STDOUT:
communication: 0 OK ; controller: 1 Needs Attention ; physical_disk: 2 Failed ; virtual_disk: 1 OfLn ; bbu: 0 OK ; enclosure: 0 OK ; CLI Version = 007.1910.0000.0000 Oct 08, 2021
Operating system = Linux 5.10.0-34-amd64
Controller = 0
Status = Success
Description = Show Drive Group Succeeded


TOPOLOGY :
========

-----------------------------------------------------------------------------
DG Arr Row EID:Slot DID Type  State BT       Size PDC  PI SED DS3  FSpace TR 
-----------------------------------------------------------------------------
 0 -   -   -        -   RAID1 Optl  N  446.625 GB enbl N  N   dflt N      N  
 0 0   -   -        -   RAID1 Optl  N  446.625 GB enbl N  N   dflt N      N  
 0 0   0   251:0    15  DRIVE Onln  N  446.625 GB enbl N  N   dflt -      N  
 0 0   1   251:1    14  DRIVE Onln  N  446.625 GB enbl N  N   dflt -      N  
 1 -   -   -        -   RAID0 Optl  N    7.276 TB enbl N  N   dflt N      N  
 1 0   -   -        -   RAID0 Optl  N    7.276 TB enbl N  N   dflt N      N  
 1 0   0   252:0    8   DRIVE Onln  N    7.276 TB enbl N  N   dflt -      N  
 2 -   -   -        -   RAID0 OfLn  N    7.276 TB enbl N  N   dflt N      N  
 2 0   -   -        -   RAID0 Dgrd  N    7.276 TB enbl N  N   dflt N      N  
 3 -   -   -        -   RAID0 Optl  N    7.276 TB enbl N  N   dflt N      N  
 3 0   -   -        -   RAID0 Optl  N    7.276 TB enbl N  N   dflt N      N  
 3 0   0   252:2    11  DRIVE Onln  N    7.276 TB enbl N  N   dflt -      N  
 4 -   -   -        -   RAID0 Optl  N    7.276 TB enbl N  N   dflt N      N  
 4 0   -   -        -   RAID0 Optl  N    7.276 TB enbl N  N   dflt N      N  
 4 0   0   252:3    12  DRIVE Onln  N    7.276 TB enbl N  N   dflt -      N  
 5 -   -   -        -   RAID0 Optl  N    7.276 TB enbl N  N   dflt N      N  
 5 0   -   -        -   RAID0 Optl  N    7.276 TB enbl N  N   dflt N      N  
 5 0   0   252:4    13  DRIVE Onln  N    7.276 TB enbl N  N   dflt -      N  
 6 -   -   -        -   RAID0 Optl  N    7.276 TB enbl N  N   dflt N      N  
 6 0   -   -        -   RAID0 Optl  N    7.276 TB enbl N  N   dflt N      N  
 6 0   0   252:5    3   DRIVE Onln  N    7.276 TB enbl N  N   dflt -      N  
 7 -   -   -        -   RAID0 Optl  N    7.276 TB enbl N  N   dflt N      N  
 7 0   -   -        -   RAID0 Optl  N    7.276 TB enbl N  N   dflt N      N  
 7 0   0   252:6    5   DRIVE Onln  N    7.276 TB enbl N  N   dflt -      N  
 8 -   -   -        -   RAID0 Optl  N    7.276 TB enbl N  N   dflt N      N  
 8 0   -   -        -   RAID0 Optl  N    7.276 TB enbl N  N   dflt N      N  
 8 0   0   252:8    2   DRIVE Onln  N    7.276 TB enbl N  N   dflt -      N  
 9 -   -   -        -   RAID0 Optl  N    7.276 TB enbl N  N   dflt N      N  
 9 0   -   -        -   RAID0 Optl  N    7.276 TB enbl N  N   dflt N      N  
 9 0   0   252:9    1   DRIVE Onln  N    7.276 TB enbl N  N   dflt -      N  
10 -   -   -        -   RAID0 Optl  N    7.276 TB enbl N  N   dflt N      N  
10 0   -   -        -   RAID0 Optl  N    7.276 TB enbl N  N   dflt N      N  
10 0   0   252:10   4   DRIVE Onln  N    7.276 TB enbl N  N   dflt -      N  
11 -   -   -        -   RAID0 Optl  N    7.276 TB enbl N  N   dflt N      N  
11 0   -   -        -   RAID0 Optl  N    7.276 TB enbl N  N   dflt N      N  
11 0   0   252:11   6   DRIVE Onln  N    7.276 TB enbl N  N   dflt -      N  
-----------------------------------------------------------------------------

DG=Disk Group Index|Arr=Array Index|Row=Row Index|EID=Enclosure Device ID
DID=Device ID|Type=Drive Type|Onln=Online|Rbld=Rebuild|Optl=Optimal|Dgrd=Degraded
Pdgd=Partially degraded|Offln=Offline|BT=Background Task Active
PDC=PD Cache|PI=Protection Info|SED=Self Encrypting Drive|Frgn=Foreign
DS3=Dimmer Switch 3|dflt=Default|Msng=Missing|FSpace=Free Space Present
TR=Transport Ready






STDERR:
None
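The TOPOLOGY table above can be scanned mechanically to find the affected disk group. The following is a minimal sketch, not part of the Icinga handler: the function names and the regular expression are illustrative assumptions, and the sample rows are copied from the snapshot above with the trailing columns left out.

```python
import re

# One TOPOLOGY row: DG, Arr, Row, EID:Slot, DID, Type, State (remaining columns ignored).
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(RAID\d+|DRIVE)\s+(\S+)"
)

def parse_rows(topology_text):
    """Yield (dg, arr, row, kind, state) for every table row that matches ROW."""
    for line in topology_text.splitlines():
        m = ROW.match(line)
        if m:
            dg, arr, row, _eid_slot, _did, kind, state = m.groups()
            yield int(dg), arr, row, kind, state

def unhealthy_groups(topology_text):
    """Map disk group index to state for group-level rows that are not Optl."""
    return {
        dg: state
        for dg, arr, row, kind, state in parse_rows(topology_text)
        # Group-level summary rows carry '-' in both the Arr and Row columns.
        if kind != "DRIVE" and arr == "-" and row == "-" and state != "Optl"
    }

def groups_without_drives(topology_text):
    """List disk groups that have no DRIVE row, i.e. no member disk is reported."""
    rows = list(parse_rows(topology_text))
    all_groups = {dg for dg, *_ in rows}
    with_drive = {dg for dg, _arr, _row, kind, _state in rows if kind == "DRIVE"}
    return sorted(all_groups - with_drive)

# Hypothetical excerpt: rows for disk groups 1 to 3 from the snapshot above.
TOPOLOGY_SAMPLE = """\
 1 -   -   -        -   RAID0 Optl  N    7.276 TB
 1 0   -   -        -   RAID0 Optl  N    7.276 TB
 1 0   0   252:0    8   DRIVE Onln  N    7.276 TB
 2 -   -   -        -   RAID0 OfLn  N    7.276 TB
 2 0   -   -        -   RAID0 Dgrd  N    7.276 TB
 3 -   -   -        -   RAID0 Optl  N    7.276 TB
 3 0   -   -        -   RAID0 Optl  N    7.276 TB
 3 0   0   252:2    11  DRIVE Onln  N    7.276 TB
"""

print(unhealthy_groups(TOPOLOGY_SAMPLE))       # {2: 'OfLn'}
print(groups_without_drives(TOPOLOGY_SAMPLE))  # [2]
```

On this excerpt, both checks point at disk group 2, which matches the snapshot: it is the only group reported as OfLn, and it has no DRIVE row.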

Event Timeline

Jclark-ctr subscribed.

service request 219355025

Jclark-ctr added a subscriber: BTullis.

@BTullis parts should arrive Monday. They are shipping 2x drives.

OK, thanks. You can go ahead and swap these. I found out which drives were showing errors, as far as the kernel is concerned:

$ sudo dmesg -T
<snip>
[Tue Dec  2 18:14:09 2025] EXT4-fs error (device sdc1): __ext4_find_entry:1584: inode #2: comm DiskHealthMonit: reading directory lblock 0
[Tue Dec  2 18:14:09 2025] EXT4-fs error (device sdi1): __ext4_find_entry:1584: inode #2: comm DiskHealthMonit: reading directory lblock 0

I then unmounted these drives cleanly, so you can swap them out.
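For reference, the device names can also be pulled out of the kernel log programmatically. This is a rough sketch under stated assumptions: the function name and the regular expression are hypothetical, not existing tooling, and the sample text is the two `dmesg -T` lines quoted above.

```python
import re

# Captures the device name from kernel messages such as
# "EXT4-fs error (device sdc1): ...".
EXT4_ERROR = re.compile(r"EXT4-fs error \(device ([^)]+)\)")

def devices_with_ext4_errors(dmesg_text):
    """Return the sorted, de-duplicated device names that logged EXT4-fs errors."""
    return sorted(set(EXT4_ERROR.findall(dmesg_text)))

# The two lines quoted from `sudo dmesg -T` above.
DMESG_SAMPLE = """\
[Tue Dec  2 18:14:09 2025] EXT4-fs error (device sdc1): __ext4_find_entry:1584: inode #2: comm DiskHealthMonit: reading directory lblock 0
[Tue Dec  2 18:14:09 2025] EXT4-fs error (device sdi1): __ext4_find_entry:1584: inode #2: comm DiskHealthMonit: reading directory lblock 0
"""

print(devices_with_ext4_errors(DMESG_SAMPLE))  # ['sdc1', 'sdi1']
```

The names returned are partitions (`sdc1`, `sdi1`). Mapping them back to controller slots is a separate step that this sketch does not attempt.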

@BTullis, just a reminder: this is ready for you, and then it can be closed.