
Degraded RAID on centrallog1002
Closed, Resolved · Public

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (md) was detected on host centrallog1002. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

CRITICAL: State: degraded, Active: 7, Working: 7, Failed: 1, Spare: 0

$ sudo /usr/local/lib/nagios/plugins/get-raid-status-md
Personalities : [raid10] [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] 
md1 : active raid10 sdh1[3](F) sdg1[2] sdf1[1] sde1[0]
      3750481920 blocks super 1.2 512K chunks 2 near-copies [4/3] [UUU_]
      bitmap: 4/28 pages [16KB], 65536KB chunk

md0 : active raid10 sdb2[0] sda2[1] sdd2[3] sdc2[2]
      1874534400 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
      bitmap: 13/14 pages [52KB], 65536KB chunk

unused devices: <none>
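
For readers unfamiliar with the mdstat format, here is a minimal sketch of how the snapshot above can be read programmatically. It is not the get-raid-status-md plugin or the Icinga check itself; the function names `parse_mdstat` and `degraded_arrays` are hypothetical, and the sample text is copied from this task. In the `[4/3] [UUU_]` field, the first number is the count of devices the array expects and the second is the count currently active; a member flagged `(F)` has been marked faulty.

```python
import re

# Sample text copied from the snapshot in this task.
SAMPLE_MDSTAT = """\
Personalities : [raid10] [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4]
md1 : active raid10 sdh1[3](F) sdg1[2] sdf1[1] sde1[0]
      3750481920 blocks super 1.2 512K chunks 2 near-copies [4/3] [UUU_]
      bitmap: 4/28 pages [16KB], 65536KB chunk

md0 : active raid10 sdb2[0] sda2[1] sdd2[3] sdc2[2]
      1874534400 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
      bitmap: 13/14 pages [52KB], 65536KB chunk

unused devices: <none>
"""

# "md1 : active raid10 <members...>"
ARRAY_RE = re.compile(r"^(md\d+) : (\w+) (\w+) (.*)$")
# "sdh1[3](F)" -> device name plus an optional one-letter flag
MEMBER_RE = re.compile(r"(\w+)\[\d+\](\([A-Z]\))?")
# "[4/3] [UUU_]" -> expected devices, active devices, per-slot map
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")


def parse_mdstat(text):
    """Return {array_name: info} for every md array found in the text."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            name, state, level, members = header.groups()
            failed = [dev for dev, flag in MEMBER_RE.findall(members)
                      if flag == "(F)"]
            current = {"state": state, "level": level, "failed": failed,
                       "expected": None, "active": None}
            arrays[name] = current
            continue
        status = STATUS_RE.search(line)
        if status and current is not None and current["expected"] is None:
            current["expected"] = int(status.group(1))
            current["active"] = int(status.group(2))
    return arrays


def degraded_arrays(arrays):
    """Names of arrays running with fewer active devices than expected."""
    return [name for name, info in arrays.items()
            if info["active"] is not None
            and info["active"] < info["expected"]]


if __name__ == "__main__":
    parsed = parse_mdstat(SAMPLE_MDSTAT)
    for name in degraded_arrays(parsed):
        info = parsed[name]
        print(name, "degraded:", info["active"], "of", info["expected"],
              "active; failed members:", info["failed"])
```

On this snapshot the sketch flags only md1, with sdh1 as the failed member. Summed over both arrays, the active and failed counts agree with the totals in the CRITICAL line above (7 active, 1 failed).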

Event Timeline

@Jclark-ctr It looks like one of the new SSDs from {T359452} isn't happy. I've located the drive, so it should be blinking. Could we replace it ASAP? Please ping me on IRC when you can. Thank you!

Also cc @VRiley-WMF, in case you can help with this. Thank you!

Replaced the failed SSD with a spare from on-hand stock at eqiad.

Jclark-ctr claimed this task.