
Degraded RAID on logstash2022
Closed, Resolved (Public)

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (md) was detected on host logstash2022. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

CRITICAL: State: degraded, Active: 7, Working: 7, Failed: 0, Spare: 0

$ sudo /usr/local/lib/nagios/plugins/get-raid-status-md
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid0 sda3[0] sdd3[3] sdc3[2] sdb3[1]
      15432220672 blocks super 1.2 512k chunks

md0 : active raid1 sda2[0] sdd2[3] sdb2[1]
      48794624 blocks super 1.2 [4/3] [UU_U]

unused devices: <none>
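
For reference, [4/3] [UU_U] in the md0 line means 3 of 4 members are active, with the third slot (sdc2) missing. A minimal sketch of how a check like this can spot a degraded md array via sysfs (an illustration, not the actual get-raid-status-md plugin):

# /sys/block/mdX/md/degraded counts missing members; 0 means healthy.
for md in /sys/block/md*/md; do
    deg=$(cat "$md/degraded" 2>/dev/null || echo 0)
    if [ "$deg" -gt 0 ]; then
        echo "CRITICAL: $(basename "${md%/md}") is degraded ($deg member(s) missing)"
    fi
done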

Event Timeline

Hmm, sdc2 got kicked off md0 but stayed in md1. I didn't see any obvious messages or failures about sdc in dmesg, so I added the disk back; let's see what happens.

root@logstash2022:~# cat /proc/mdstat 
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md1 : active raid0 sda3[0] sdd3[3] sdc3[2] sdb3[1]
      15432220672 blocks super 1.2 512k chunks
      
md0 : active raid1 sda2[0] sdd2[3] sdb2[1]
      48794624 blocks super 1.2 [4/3] [UU_U]
      
unused devices: <none>
root@logstash2022:~# mdadm /dev/md0 --add /dev/sdc2
mdadm: added /dev/sdc2
root@logstash2022:~# cat /proc/mdstat 
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md1 : active raid0 sda3[0] sdd3[3] sdc3[2] sdb3[1]
      15432220672 blocks super 1.2 512k chunks
      
md0 : active raid1 sdc2[4] sda2[0] sdd2[3] sdb2[1]
      48794624 blocks super 1.2 [4/3] [UU_U]
      [>....................]  recovery =  0.1% (61568/48794624) finish=39.5min speed=20522K/sec
      
unused devices: <none>
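
To follow the recovery and sanity-check the suspect drive, something along these lines would do (a sketch, assuming smartmontools is installed on the host; these were not run as part of this incident):

$ sudo mdadm --wait /dev/md0     # block until the resync finishes
$ sudo mdadm --detail /dev/md0   # "State : clean" once all four members are back
$ sudo smartctl -H /dev/sdc      # overall SMART health verdict for the suspect disk
$ sudo smartctl -A /dev/sdc      # reallocated/pending sector counts hint at a real failure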
fgiunchedi claimed this task.

Tentatively resolving; this will get reopened if it happens again.