Degraded RAID on labstore1003
Closed, Resolved
Public

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (megacli) was detected on host labstore1003. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

=== RaidStatus (does not include components in optimal state)
name: Adapter #0

	Virtual Drive: 0 (Target Id: 0)
	RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
	State: Optimal
	Number Of Drives: 2
	Number of Spans: 1
	Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU

		Span: 0 - Number of PDs: 2

			PD: 1 Information
			Enclosure Device ID: 32
			Slot Number: 1
			Drive's position: DiskGroup: 0, Span: 0, Arm: 1
			Media Error Count: 0
			Other Error Count: 0
			Predictive Failure Count: =====> 248 <=====
			Last Predictive Failure Event Seq Number: 50853

				Raw Size: 1.819 TB [0xe8e088b0 Sectors]
				Firmware state: Online, Spun Up
				Media Type: Hard Disk Device
				Drive Temperature: 32C (89.60 F)

name: Adapter #1

	Virtual Drive: 2 (Target Id: 2)
	RAID Level: Primary-6, Secondary-0, RAID Level Qualifier-3
	State: =====> Partially Degraded <=====
	Number Of Drives: 10
	Number of Spans: 1
	Current Cache Policy: WriteThrough, ReadAdaptive, Direct, No Write Cache if Bad BBU

		Span: 0 - Number of PDs: 10

			PD: 0 Information
			Enclosure Device ID: 38
			Slot Number: 11
			Drive's position: DiskGroup: 2, Span: 0, Arm: 0
			Media Error Count: 0
			Other Error Count: 1
			Predictive Failure Count: 0
			Last Predictive Failure Event Seq Number: 0

				Raw Size: 1.819 TB [0xe8e088b0 Sectors]
				Firmware state: =====> Rebuild <=====
				Media Type: Hard Disk Device
				Drive Temperature: 29C (84.20 F)

=== RaidStatus completed
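
For readers who want to pull the highlighted fields out of a snapshot like the one above, here is a minimal sketch. It assumes only what the snapshot itself shows: adapters are introduced by a `name:` line and problem values are wrapped in `=====> ... <=====`. The function name `flagged_fields` and the shortened `SAMPLE` text are hypothetical and are not part of the Icinga handler.

```python
import re

# Matches lines such as "State: =====> Partially Degraded <=====".
FLAG_RE = re.compile(
    r"^\s*(?P<field>[^:]+):\s*=====>\s*(?P<value>.*?)\s*<=====\s*$"
)


def flagged_fields(report):
    """Return (adapter, field, value) for every highlighted line."""
    adapter = ""
    found = []
    for line in report.splitlines():
        stripped = line.strip()
        if stripped.startswith("name:"):
            # Remember which adapter the following lines belong to.
            adapter = stripped.split(":", 1)[1].strip()
            continue
        match = FLAG_RE.match(line)
        if match:
            found.append(
                (adapter, match.group("field").strip(), match.group("value"))
            )
    return found


# Shortened excerpt of the snapshot above (hypothetical test input).
SAMPLE = """\
name: Adapter #0
    State: Optimal
    Predictive Failure Count: =====> 248 <=====
name: Adapter #1
    State: =====> Partially Degraded <=====
    Firmware state: =====> Rebuild <=====
"""

for adapter, field, value in flagged_fields(SAMPLE):
    print(f"{adapter}: {field} = {value}")
```

Lines without the marker, such as `State: Optimal`, are skipped, so the result lists only the components the snapshot flags as not optimal.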

Event Timeline

The disk has been swapped and is in the process of rebuilding.

Enclosure Device ID: 38
Slot Number: 0
Enclosure position: 2
Device Id: 40
WWN: 5000C50025FD9E58
Sequence Number: 2
Media Error Count: 0
Other Error Count: 1
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 1.819 TB [0xe8e088b0 Sectors]
Non Coerced Size: 1.818 TB [0xe8d088b0 Sectors]
Coerced Size: 1.818 TB [0xe8d00000 Sectors]
Sector Size: 0
Firmware state: Copyback
Device Firmware Level: KS68
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x5000c50025fd9e59
SAS Address(1): 0x5000c50025fd9e5a
Connected Port Number: 1(path0) 0(path1)
Inquiry Data: SEAGATE ST32000444SS KS689WM3EX7R
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive Temperature :28C (82.40 F)
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Port-1 :
Port status: Active
Port's Linkspeed: Unknown
Drive has flagged a S.M.A.R.T alert : No
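
The size fields above give sector counts in hexadecimal next to a rounded size. The sketch below converts them to bytes as a sanity check. It assumes 512-byte sectors, which the output does not confirm (it reports `Sector Size: 0`), and the helper names are hypothetical.

```python
# Assumption: 512-byte sectors. The controller output above reports
# "Sector Size: 0", so this value is not confirmed by the source.
SECTOR_BYTES = 512


def sectors_to_bytes(hex_sectors, sector_bytes=SECTOR_BYTES):
    """Convert a hexadecimal sector count such as '0xe8e088b0' to bytes."""
    return int(hex_sectors, 16) * sector_bytes


def bytes_to_tib(num_bytes):
    """Express a byte count in binary terabytes (2**40 bytes)."""
    return num_bytes / 2**40


raw_bytes = sectors_to_bytes("0xe8e088b0")      # "Raw Size" field
coerced_bytes = sectors_to_bytes("0xe8d00000")  # "Coerced Size" field

print(raw_bytes, bytes_to_tib(raw_bytes))
print(coerced_bytes, bytes_to_tib(coerced_bytes))
```

Under that assumption the raw size is 2,000,398,934,016 bytes, which falls between 1.819 and 1.820 when divided by 2**40. That is consistent with the reported "1.819 TB" being a binary unit on a nominal 2 TB drive.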

Should this be resolved? One disk still reports predictive failures, although it has not yet failed:

$ sudo /usr/local/lib/nagios/plugins/get-raid-status-megacli
=== RaidStatus (does not include components in optimal state)
name: Adapter #0

	Virtual Drive: 0 (Target Id: 0)
	RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
	State: Optimal
	Number Of Drives: 2
	Number of Spans: 1
	Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU

		Span: 0 - Number of PDs: 2

			PD: 1 Information
			Enclosure Device ID: 32
			Slot Number: 1
			Drive's position: DiskGroup: 0, Span: 0, Arm: 1
			Media Error Count: 0
			Other Error Count: 0
			Predictive Failure Count: =====> 264 <=====
			Last Predictive Failure Event Seq Number: 51271

				Raw Size: 1.819 TB [0xe8e088b0 Sectors]
				Firmware state: Online, Spun Up
				Media Type: Hard Disk Device
				Drive Temperature: 31C (87.80 F)

=== RaidStatus completed
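
To put a number on the question above, the two snapshots in this task can be compared per disk. The following is a minimal sketch, assuming each physical disk is identified by its `Enclosure Device ID` and `Slot Number` lines and that the count may or may not carry the `=====>` marker. The function names and the shortened `BEFORE` and `AFTER` excerpts are hypothetical.

```python
import re

ENCLOSURE_RE = re.compile(r"Enclosure Device ID:\s*(\d+)")
SLOT_RE = re.compile(r"Slot Number:\s*(\d+)")
COUNT_RE = re.compile(r"Predictive Failure Count:\s*(?:=====>\s*)?(\d+)")


def predictive_counts(report):
    """Map (enclosure, slot) to the predictive failure count in a report."""
    counts = {}
    enclosure = slot = None
    for line in report.splitlines():
        if m := ENCLOSURE_RE.search(line):
            enclosure = int(m.group(1))
        elif m := SLOT_RE.search(line):
            slot = int(m.group(1))
        elif m := COUNT_RE.search(line):
            counts[(enclosure, slot)] = int(m.group(1))
    return counts


def count_increase(before, after):
    """Return disks whose count grew between two reports, with the delta."""
    old = predictive_counts(before)
    new = predictive_counts(after)
    return {
        disk: new[disk] - old[disk]
        for disk in new
        if disk in old and new[disk] > old[disk]
    }


# Shortened excerpts of the two snapshots in this task (hypothetical input).
BEFORE = """\
Enclosure Device ID: 32
Slot Number: 1
Predictive Failure Count: =====> 248 <=====
Enclosure Device ID: 38
Slot Number: 11
Predictive Failure Count: 0
"""
AFTER = """\
Enclosure Device ID: 32
Slot Number: 1
Predictive Failure Count: =====> 264 <=====
"""

print(count_increase(BEFORE, AFTER))
```

For these excerpts the disk in enclosure 32, slot 1 went from 248 to 264, an increase of 16 between the two snapshots.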

Cmjohnson claimed this task.

Resolving this