
(OoW) Degraded RAID on analytics1032
Closed, Resolved (Public)

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (megacli) was detected on host analytics1032. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

CRITICAL: 1 failed LD(s) (Offline)

$ sudo /usr/local/lib/nagios/plugins/get-raid-status-megacli
=== RaidStatus (does not include components in optimal state)
name: Adapter #0

	Virtual Drive: 11 (Target Id: 11)
	RAID Level: Primary-0, Secondary-0, RAID Level Qualifier-0
	State: =====> Offline <=====
	Number Of Drives: 1
	Number of Spans: 1
	Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU

		Span: 0 - Number of PDs: 1

			PD: 0 Information
			Enclosure Device ID: 32
			Slot Number: 10
			Drive's position: DiskGroup: 9, Span: 0, Arm: 0
			Media Error Count: 24
			Other Error Count: 1
			Predictive Failure Count: 0
			Last Predictive Failure Event Seq Number: 0

				Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
				Firmware state: =====> Failed <=====
				Media Type: Hard Disk Device
				Drive Temperature: 33C (91.40 F)

=== RaidStatus completed
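
For triage, the snapshot above can be reduced to the fields that matter (which virtual drive, which slot, which states are flagged). The following is a minimal sketch, assuming the report keeps the `key: value` layout and the `=====> ... <=====` highlighting shown above. The names `summarize`, `FLAGGED`, and `FIELD` are hypothetical and are not part of the `get-raid-status-megacli` plugin.

```python
import re

# Excerpt copied from the snapshot above; indentation is not significant here.
SAMPLE = """\
Virtual Drive: 11 (Target Id: 11)
State: =====> Offline <=====
Enclosure Device ID: 32
Slot Number: 10
Media Error Count: 24
Other Error Count: 1
Firmware state: =====> Failed <=====
"""

# A value wrapped in =====> ... <===== is one the report highlights as non-optimal.
FLAGGED = re.compile(r"^\s*(?P<key>[^:]+):\s*=====>\s*(?P<value>.+?)\s*<=====\s*$")
# Any other "key: value" line; the key ends at the first colon.
FIELD = re.compile(r"^\s*(?P<key>[^:]+):\s*(?P<value>.*\S)\s*$")


def summarize(report: str) -> dict:
    """Split a report into plain fields and highlighted (flagged) fields."""
    fields, flagged = {}, {}
    for line in report.splitlines():
        match = FLAGGED.match(line)
        if match:
            flagged[match["key"].strip()] = match["value"]
            continue
        match = FIELD.match(line)
        if match:
            fields[match["key"].strip()] = match["value"]
    return {"fields": fields, "flagged": flagged}


if __name__ == "__main__":
    summary = summarize(SAMPLE)
    print("Flagged:", summary["flagged"])
    print("Slot:", summary["fields"].get("Slot Number"))
```

This sketch keeps only the last value seen for each key, so it suits a report with a single failed drive like this one; a report listing several drives would need per-drive grouping.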

Event Timeline

akosiaris triaged this task as Normal priority. Jul 15 2019, 2:51 PM
wiki_willy added subscribers: elukey, Cmjohnson, wiki_willy.

@Cmjohnson - it looks like this server is out of warranty and just past the 5-year mark, but it is also tied to a refresh order last Q2 in FY19-20 under T204177. It also seems to be in use as a test server now, per the following:

https://phabricator.wikimedia.org/rOPUP8ea3d23bbe46e7893a75855641213f8dd51507bf

@elukey - is there any way we can either remove the alerting for analytics1032 or decommission this host?

Thanks in advance,
Willy

wiki_willy renamed this task from Degraded RAID on analytics1032 to (OoW) Degraded RAID on analytics1032. Jul 15 2019, 7:22 PM
elukey closed this task as Resolved. Aug 6 2019, 10:12 AM

The alert should not fire again (I hope); I have disabled it via the Icinga UI. Closing :)