Degraded RAID on analytics1039
Closed, Resolved (Public)

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (megacli) was detected on host analytics1039. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

CRITICAL: 1 failed LD(s) (Offline)

$ sudo /usr/local/lib/nagios/plugins/get_raid_status_megacli
=== RaidStatus (does not include components in optimal state)
name: Adapter #0

	Virtual Drive: 3 (Target Id: 3)
	RAID Level: Primary-0, Secondary-0, RAID Level Qualifier-0
	State: =====> Offline <=====
	Number Of Drives: 1
	Number of Spans: 1
	Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU

		Span: 0 - Number of PDs: 1

			PD: 0 Information
			Enclosure Device ID: 32
			Slot Number: 2
			Drive's position: DiskGroup: 3, Span: 0, Arm: 0
			Media Error Count: 5
			Other Error Count: 3
			Predictive Failure Count: 0
			Last Predictive Failure Event Seq Number: 0

				Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
				Firmware state: =====> Failed <=====
				Media Type: Hard Disk Device
				Drive Temperature: 32C (89.60 F)

=== RaidStatus completed
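
For triage, a snapshot like the one above can be reduced to the failed physical drives and their slots. The following is a minimal sketch, assuming the report keeps the `Key: value` layout shown in this task and that each drive section starts with a `PD:` line. The function name `parse_physical_drives` and the `SAMPLE` excerpt are hypothetical and not part of the Icinga plugin.

```python
# Excerpt of the snapshot attached to this task (indentation removed).
SAMPLE = """\
PD: 0 Information
Enclosure Device ID: 32
Slot Number: 2
Media Error Count: 5
Other Error Count: 3
Firmware state: =====> Failed <=====
"""


def parse_physical_drives(report):
    """Collect the 'Key: value' pairs of every 'PD:' section into a dict."""
    drives = []
    current = None
    for raw in report.splitlines():
        line = raw.strip()
        if line.startswith("PD:"):
            # A new physical-drive section begins.
            current = {}
            drives.append(current)
            continue
        if current is None or ":" not in line:
            # Skip everything before the first PD and lines without a key.
            continue
        key, _, value = line.partition(":")
        # Drop the '=====>' and '<=====' markers the plugin puts around bad states.
        current[key.strip()] = value.strip(" =<>")
    return drives


drives = parse_physical_drives(SAMPLE)
failed = [d for d in drives if d.get("Firmware state") == "Failed"]
for d in failed:
    print("enclosure", d["Enclosure Device ID"], "slot", d["Slot Number"])
```

All values are kept as strings; converting the error counts to integers is left out because the sketch only covers the fields seen in this one report.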

Event Timeline

colewhite triaged this task as Medium priority. Nov 5 2018, 9:09 PM
Volans added subscribers: elukey, Ottomata, Volans.

Adding analytics, Luca and Otto in case it was missed. Also, Puppet has issues because the filesystem is read-only.

This host needs to be decommed relatively soon; others in the range 28-42 have shown this kind of behavior.

Change 473684 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Apply a hadoop config override to analytics1039

https://gerrit.wikimedia.org/r/473684

Change 473684 merged by Elukey:
[operations/puppet@production] Apply a hadoop config override to analytics1039

https://gerrit.wikimedia.org/r/473684

Change 473688 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Apply an override to analytics1039 - part 2

https://gerrit.wikimedia.org/r/473688

Change 473688 merged by Elukey:
[operations/puppet@production] Apply an override to analytics1039 - part 2

https://gerrit.wikimedia.org/r/473688

elukey claimed this task.

The host will be decommed so no point in getting a new disk :)