Degraded RAID on centrallog1002
Closed, Resolved · Public

Assigned To

	Jclark-ctr

Authored By

	ops-monitoring-bot
	Mar 25 2024, 12:21 AM

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (md) was detected on host centrallog1002. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

CRITICAL: State: degraded, Active: 7, Working: 7, Failed: 1, Spare: 0

$ sudo /usr/local/lib/nagios/plugins/get-raid-status-md
Personalities : [raid10] [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] 
md1 : active raid10 sdh1[3](F) sdg1[2] sdf1[1] sde1[0]
      3750481920 blocks super 1.2 512K chunks 2 near-copies [4/3] [UUU_]
      bitmap: 4/28 pages [16KB], 65536KB chunk

md0 : active raid10 sdb2[0] sda2[1] sdd2[3] sdc2[2]
      1874534400 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
      bitmap: 13/14 pages [52KB], 65536KB chunk

unused devices: <none>
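
For anyone triaging similar alerts, here is a minimal Python sketch of how the snapshot above can be read programmatically. It is not the code of the `get-raid-status-md` plugin, whose implementation is not shown in this task; the names `parse_mdstat` and `degraded_arrays` are hypothetical, and the parser only handles the simple `mdX : <state> <level> <members>` layout seen here.

```python
import re

# Snapshot copied from this task; on a live host the same text comes from /proc/mdstat.
MDSTAT = """\
Personalities : [raid10] [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4]
md1 : active raid10 sdh1[3](F) sdg1[2] sdf1[1] sde1[0]
      3750481920 blocks super 1.2 512K chunks 2 near-copies [4/3] [UUU_]
      bitmap: 4/28 pages [16KB], 65536KB chunk

md0 : active raid10 sdb2[0] sda2[1] sdd2[3] sdc2[2]
      1874534400 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
      bitmap: 13/14 pages [52KB], 65536KB chunk

unused devices: <none>
"""

# "md1 : active raid10 sdh1[3](F) ..." (assumes a single-word state field)
ARRAY_RE = re.compile(r"^(md\d+) : (\S+) (\S+) (.+)$")
# "sdh1[3](F)": device name, role number, optional flags such as (F) for faulty
MEMBER_RE = re.compile(r"^(\S+)\[(\d+)\]((?:\(\w\))*)$")
# "[4/3] [UUU_]": expected members / members currently up, then a per-slot map
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")


def parse_mdstat(text):
    """Return {array_name: info} for each md array found in the text."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            name, state, level, members = header.groups()
            failed = []
            for token in members.split():
                member = MEMBER_RE.match(token)
                if member and "(F)" in member.group(3):
                    failed.append(member.group(1))
            current = {"state": state, "level": level, "failed": failed,
                       "expected": None, "up": None}
            arrays[name] = current
            continue
        status = STATUS_RE.search(line)
        if status and current is not None and current["expected"] is None:
            current["expected"] = int(status.group(1))
            current["up"] = int(status.group(2))
    return arrays


def degraded_arrays(arrays):
    """Names of arrays with a faulty member or fewer members up than expected."""
    return sorted(
        name for name, info in arrays.items()
        if info["failed"]
        or (info["expected"] is not None and info["up"] < info["expected"])
    )


if __name__ == "__main__":
    parsed = parse_mdstat(MDSTAT)
    print(degraded_arrays(parsed))   # ['md1']
    print(parsed["md1"]["failed"])   # ['sdh1']
```

In this snapshot md1 reports `[4/3]` with `sdh1` marked `(F)`, while md0 is complete at `[4/4]`. Summing the members that are up across both arrays gives 7, with 1 failed, which agrees with the counts in the CRITICAL line, although how the plugin derives those counts is not stated here.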

Event Timeline

ops-monitoring-bot created this task. Mar 25 2024, 12:21 AM

@Jclark-ctr it looks like one of the new SSDs from {T359452} isn't happy. I've located the drive, so it should be blinking. Could we replace it ASAP? Please ping me on IRC when you can, thank you!

andrea.denisse subscribed. Mar 26 2024, 2:14 PM

Also cc @VRiley-WMF, in case you could help with this. Thank you!

Replaced the failed SSD with a spare from on-hand stock at eqiad.

Jclark-ctr closed this task as Resolved. Mar 28 2024, 3:47 PM

Jclark-ctr claimed this task.
