Page MenuHomePhabricator

Degraded RAID on analytics1029
Closed, DeclinedPublic

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (megacli) was detected on host analytics1029. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

CRITICAL: 1 failed LD(s) (Offline)

$ sudo /usr/local/lib/nagios/plugins/get-raid-status-megacli
=== RaidStatus (does not include components in optimal state)
name: Adapter #0

	Virtual Drive: 12 (Target Id: 12)
	RAID Level: Primary-0, Secondary-0, RAID Level Qualifier-0
	State: =====> Offline <=====
	Number Of Drives: 1
	Number of Spans: 1
	Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU

		Span: 0 - Number of PDs: 1

			PD: 0 Information
			Enclosure Device ID: 32
			Slot Number: 11
			Drive's position: DiskGroup: 10, Span: 0, Arm: 0
			Media Error Count: 330
			Other Error Count: 30887
			Predictive Failure Count: 0
			Last Predictive Failure Event Seq Number: 0

				Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
				Firmware state: =====> Failed <=====
				Media Type: Hard Disk Device
				Drive Temperature: N/A

=== RaidStatus completed

Event Timeline

Cmjohnson added subscribers: Ottomata, Cmjohnson.

@Ottomata Please use the new hardware run book template. https://phabricator.wikimedia.org/maniphest/task/edit/form/55/ and assign the task to me

elukey claimed this task.Jul 22 2019, 2:41 PM
elukey added a subscriber: elukey.

@Cmjohnson this is a OOW node part of the Hadoop testing cluster, no need to replace the disk.. will try to silence this alarm :)

Cmjohnson closed this task as Declined.Jul 22 2019, 2:48 PM