
Degraded RAID on backup1002
Closed, Resolved, Public

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (megacli) was detected on host backup1002. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

CRITICAL: 1 failed LD(s) (Partially Degraded)

$ sudo /usr/local/lib/nagios/plugins/get-raid-status-megacli
=== RaidStatus (does not include components in optimal state)
name: Adapter #0

	Virtual Drive: 0 (Target Id: 0)
	RAID Level: Primary-6, Secondary-0, RAID Level Qualifier-3
	State: =====> Partially Degraded <=====
	Number Of Drives: 12
	Number of Spans: 1
	Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU

		Span: 0 - Number of PDs: 12

			PD: 4 Information
			Enclosure Device ID: 251
			Slot Number: 4
			Drive's position: DiskGroup: 0, Span: 0, Arm: 4
			Media Error Count: 0
			Other Error Count: 14
			Predictive Failure Count: 0
			Last Predictive Failure Event Seq Number: 0

				Raw Size: 7.277 TB [0x3a3812ab0 Sectors]
				Firmware state: =====> Failed <=====
				Media Type: Hard Disk Device
				Drive Temperature: 29C (84.20 F)

=== RaidStatus completed
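The snapshot above lists only the components that are not optimal, so the failed drive can be picked out mechanically. Below is a minimal sketch that scans text in this format for physical drives whose "Firmware state" is Failed. It also converts the raw sector count to TiB, which shows that the "7.277 TB" figure is really 7.277 TiB (an 8 TB drive). Assumptions: the field names are taken from the excerpt above, sectors are 512 bytes, and `failed_drives` and `SAMPLE` are hypothetical names for illustration, not part of the Icinga plugin.

```python
import re

# Excerpt of the status text above (indentation trimmed).
SAMPLE = """\
PD: 4 Information
Enclosure Device ID: 251
Slot Number: 4
Raw Size: 7.277 TB [0x3a3812ab0 Sectors]
Firmware state: =====> Failed <=====
"""

SECTOR_BYTES = 512  # assumption: 512-byte logical sectors


def failed_drives(text):
    """Return (enclosure, slot, size_tib) for every PD whose firmware state is Failed."""
    drives = []
    current = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("PD:"):
            current = {}  # a new physical-drive section begins
        elif m := re.match(r"Enclosure Device ID:\s*(\d+)", line):
            current["enclosure"] = int(m.group(1))
        elif m := re.match(r"Slot Number:\s*(\d+)", line):
            current["slot"] = int(m.group(1))
        elif m := re.search(r"\[(0x[0-9a-fA-F]+) Sectors\]", line):
            current["sectors"] = int(m.group(1), 16)  # hex sector count
        elif line.startswith("Firmware state:") and "Failed" in line:
            tib = current.get("sectors", 0) * SECTOR_BYTES / 2**40
            drives.append((current.get("enclosure"), current.get("slot"), round(tib, 3)))
    return drives


for enc, slot, tib in failed_drives(SAMPLE):
    # megacli addresses a physical drive as '[enclosure:slot]'
    print(f"[{enc}:{slot}] failed, {tib} TiB")
```

For this snapshot the loop prints `[251:4] failed, 7.277 TiB`, which is the same `[251:4]` address used later in the `-PhysDrv` argument.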

Event Timeline

Joe triaged this task as Medium priority. (Mon, Oct 4, 5:56 AM)
Joe added a subscriber: jcrespo.

@Cmjohnson or @Jclark-ctr can we get a request for a disk replacement sent to Dell? This host was bought last year.

A ticket has been opened with Dell. Interestingly enough, they didn't have HDD as a pre-selected option to replace. Hopefully having to add it manually does not delay the processing.
You have successfully submitted request SR1071943281.

they didn't have HDD as a pre-selected option to replace

:-(
/me crosses fingers.
With RAID 6 we can still lose any one other disk without losing data (unlike RAID 10), so we have some margin, but of course it is not an ideal situation. Looking forward to further updates.
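The margin can be made concrete with a small worked example. The sketch below compares how many of the surviving drives would be fatal to lose next, for the 12-drive RAID 6 volume above and for a hypothetical 12-drive RAID 10 built from two-way mirror pairs. The RAID 10 layout and the function names are assumptions for illustration; the failure rules themselves (RAID 6 survives any two losses, a two-way mirror dies when both halves are gone) are standard.

```python
from fractions import Fraction

N_DISKS = 12  # as in the virtual drive above ("Number Of Drives: 12")


def raid6_fatal_odds(failed):
    """RAID 6 keeps two parity blocks per stripe, so the array survives any
    two drive losses. Returns the fraction of surviving drives whose loss
    would destroy the array, given `failed` drives already gone (0, 1 or 2)."""
    return Fraction(1) if failed == 2 else Fraction(0)


def raid10_fatal_odds_after_one(n_disks):
    """Hypothetical RAID 10 of two-way mirror pairs: after one loss, only the
    dead drive's mirror partner is fatal, i.e. 1 of the n_disks - 1 survivors."""
    return Fraction(1, n_disks - 1)


print("RAID 6, one drive failed: ", raid6_fatal_odds(1))
print("RAID 10, one drive failed:", raid10_fatal_odds_after_one(N_DISKS))
```

With one drive already failed, no single further loss kills the RAID 6 volume, whereas in the hypothetical RAID 10 a second failure landing on the wrong drive (1 in 11) would.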

Our ticket was declined: I had opened the ticket for backup1001, but the error is on a disk shelf.

Opened a new ticket with Dell for the disk shelf:

You have successfully submitted request SR1072235267.

@Jclark-ctr This disk should arrive today or Monday. Please swap the failed disk; it is in the disk array for backup1002.

The drive arrived today and has been replaced.

$ megacli -PDRbld -ShowProg -PhysDrv '[251:4]' -aALL
                                     
Rebuild Progress on Device at Enclosure 251, Slot 4 Completed 97% in 11 Minutes.
RECOVERY - MegaRAID on backup1002 is OK: OK: optimal, 1 logical, 12
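The rebuild progress line reports only percentage and elapsed time. Below is a rough sketch that estimates the time left by linear extrapolation. It assumes the rebuild proceeds at a constant rate (real rebuilds can slow down under I/O load), and `remaining_minutes` is a hypothetical helper, not a megacli feature.

```python
import re

LINE = ("Rebuild Progress on Device at Enclosure 251, Slot 4 "
        "Completed 97% in 11 Minutes.")


def remaining_minutes(line):
    """Estimate minutes left in a megacli rebuild, assuming a constant rate."""
    m = re.search(r"Completed (\d+)% in (\d+) Minutes?", line)
    if m is None:
        raise ValueError("not a megacli rebuild progress line")
    pct, elapsed = int(m.group(1)), int(m.group(2))
    if pct == 0:
        return None  # no progress yet, so there is no rate to extrapolate from
    total = elapsed * 100 / pct  # projected total duration
    return total - elapsed


print(f"~{remaining_minutes(LINE):.1f} minutes left")
```

For the line above this gives roughly 0.3 minutes, consistent with the RECOVERY notice that followed shortly after.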

Thanks, @Jclark-ctr !