Degraded RAID on helium
Closed, Resolved (Public)

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (megacli) was detected on host helium. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

CRITICAL: 1 failed LD(s) (Partially Degraded)

$ sudo /usr/local/lib/nagios/plugins/get_raid_status_megacli
=== RaidStatus (does not include components in optimal state)
name: Adapter #0

	Virtual Drive: 0 (Target Id: 0)
	RAID Level: Primary-6, Secondary-0, RAID Level Qualifier-3
	State: =====> Partially Degraded <=====
	Number Of Drives: 12
	Number of Spans: 1
	Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU

		Span: 0 - Number of PDs: 12

			PD: 3 Information
			Enclosure Device ID: 15
			Slot Number: 3
			Drive's position: DiskGroup: 0, Span: 0, Arm: 3
			Media Error Count: 339
			Other Error Count: 8
			Predictive Failure Count: =====> 4 <=====
			Last Predictive Failure Event Seq Number: 38216

				Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
				Firmware state: =====> Failed <=====
				Media Type: Hard Disk Device
				Drive Temperature: 39C (102.20 F)

=== RaidStatus completed
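
The snapshot above marks the fields that need attention with `=====> ... <=====` arrows (the virtual drive state, the predictive failure count, and the firmware state of the disk in slot 3). A minimal Python sketch of pulling those flagged fields out of such a report, assuming the arrow format stays as shown here; `flagged_fields` and `SAMPLE` are hypothetical names for illustration and are not part of the Nagios plugin:

```python
import re

# Matches report lines of the form "Field name: =====> value <=====".
# Assumption: the arrow markers always look like the ones in this task.
MARKER = re.compile(
    r"^\s*(?P<field>[^:]+):\s*=====>\s*(?P<value>.*?)\s*<=====\s*$"
)


def flagged_fields(snapshot: str) -> list[tuple[str, str]]:
    """Return (field, value) pairs that the report highlighted with arrows."""
    flagged = []
    for line in snapshot.splitlines():
        match = MARKER.match(line)
        if match:
            flagged.append((match.group("field").strip(), match.group("value")))
    return flagged


# A shortened excerpt of the snapshot attached to this task.
SAMPLE = """\
    State: =====> Partially Degraded <=====
    Number Of Drives: 12
        Slot Number: 3
        Predictive Failure Count: =====> 4 <=====
            Firmware state: =====> Failed <=====
"""

if __name__ == "__main__":
    for field, value in flagged_fields(SAMPLE):
        print(f"{field}: {value}")
```

Lines without arrows, such as `Number Of Drives: 12`, are skipped, so the result lists only what the report itself singled out.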

Related Objects

Status: Resolved
Assigned: Cmjohnson

Event Timeline

elukey triaged this task as High priority.
Cmjohnson added a subtask: Unknown Object (Task). Jan 7 2019, 4:28 PM

helium is out of warranty, so I created a procurement task to purchase a replacement disk.

Cmjohnson closed subtask Unknown Object (Task) as Resolved. Jan 24 2019, 9:45 PM
This comment was removed by Volans.

It seems all good from megacli:

$ sudo /usr/local/lib/nagios/plugins/get-raid-status-megacli
=== RaidStatus (does not include components in optimal state)
=== RaidStatus completed

Icinga is all green as well, so resolving.
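
The second snapshot in this task contains only the header and the completion line. Since the header states that components in optimal state are left out, an empty body suggests that nothing is degraded any more. A minimal Python sketch of that check, assuming the two marker lines look as they do in this task; `report_is_clean` is a hypothetical helper and not part of the plugin:

```python
# Assumption: these markers match the plugin output quoted in this task.
HEADER_PREFIX = "=== RaidStatus ("
FOOTER = "=== RaidStatus completed"


def report_is_clean(report: str) -> bool:
    """Return True when the report lists no non-optimal components."""
    lines = [line.strip() for line in report.splitlines() if line.strip()]
    if (
        len(lines) < 2
        or not lines[0].startswith(HEADER_PREFIX)
        or lines[-1] != FOOTER
    ):
        raise ValueError("unexpected report format")
    # Only the header and the footer remain when everything is optimal.
    return len(lines) == 2


CLEAN = """\
=== RaidStatus (does not include components in optimal state)
=== RaidStatus completed
"""

DEGRADED = """\
=== RaidStatus (does not include components in optimal state)
name: Adapter #0
    State: =====> Partially Degraded <=====
=== RaidStatus completed
"""

if __name__ == "__main__":
    print(report_is_clean(CLEAN))
    print(report_is_clean(DEGRADED))
```

Raising on an unexpected format, rather than returning False, keeps a truncated or failed command run from being mistaken for a healthy array.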