
Degraded RAID on cloudvirt1018
Closed, Resolved, Public

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (megacli) was detected on host cloudvirt1018. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

CRITICAL: 1 failed LD(s) (Degraded)

$ sudo /usr/local/lib/nagios/plugins/get-raid-status-megacli
=== RaidStatus (does not include components in optimal state)
name: Adapter #0

	Virtual Drive: 0 (Target Id: 0)
	RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
	State: =====> Degraded <=====
	Number Of Drives: 8
	Number of Spans: 1
	Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU

		Span: 0 - Number of PDs: 8

			PD: 2 Information
			Enclosure Device ID: 32
			Slot Number: 8
			Drive's position: DiskGroup: 0, Span: 0, Arm: 2
			Media Error Count: 0
			Other Error Count: 0
			Predictive Failure Count: 0
			Last Predictive Failure Event Seq Number: 0

				Raw Size: 1.455 TB [0xba4d4ab0 Sectors]
				Firmware state: =====> Rebuild <=====
				Media Type: Solid State Device
				Drive Temperature: 32C (89.60 F)

=== RaidStatus completed
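A snapshot like the one above can also be triaged programmatically. Below is a minimal Python sketch that pulls out the non-healthy virtual-drive and physical-drive states. It assumes the `Key: value` layout and the `=====> ... <=====` highlight markers shown in the snapshot. The function name `summarize_raid_status` is hypothetical. The "healthy" state strings (`Optimal`, `Online, Spun Up`) are assumptions and are not taken from the check plugin.

```python
import re

# Line patterns for the megacli-style dump shown above. The optional
# "=====> X <=====" highlight markers added by the check script are stripped.
VD_RE = re.compile(r"^\s*Virtual Drive:\s*(\d+)")
SLOT_RE = re.compile(r"^\s*Slot Number:\s*(\d+)")
STATE_RE = re.compile(r"^\s*(State|Firmware state):\s*(?:=+>\s*)?(.*?)(?:\s*<=+)?\s*$")

# Assumed "healthy" values; any other state is reported.
HEALTHY_VD = {"Optimal"}
HEALTHY_PD = {"Online, Spun Up"}


def summarize_raid_status(text):
    """Return ([(vd, state)], [(slot, state)]) for non-healthy components."""
    vd_issues, pd_issues = [], []
    vd = slot = None
    for line in text.splitlines():
        if m := VD_RE.match(line):
            vd = int(m.group(1))          # remember which virtual drive we are in
        elif m := SLOT_RE.match(line):
            slot = int(m.group(1))        # remember which slot the next PD state belongs to
        elif m := STATE_RE.match(line):
            kind, state = m.groups()
            if kind == "State" and state not in HEALTHY_VD:
                vd_issues.append((vd, state))
            elif kind == "Firmware state" and state not in HEALTHY_PD:
                pd_issues.append((slot, state))
    return vd_issues, pd_issues
```

Fed the snapshot above, this should report virtual drive 0 as `Degraded` and the drive in slot 8 as `Rebuild`. Because the plugin already omits optimal components, the healthy-state filter only matters when the function is run on a full megacli dump.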

Event Timeline

wiki_willy subscribed.

The system is in warranty (it doesn't expire until May 2020).

Enclosure Device ID: 32
Slot Number: 2
Enclosure position: 1
Device Id: 2
WWN: 55cd2e415050a562
Sequence Number: 4
Media Error Count: 75
Other Error Count: 267
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA

Raw Size: 1.746 TB [0xdf8fe2b0 Sectors]
Non Coerced Size: 1.745 TB [0xdf7fe2b0 Sectors]
Coerced Size: 1.745 TB [0xdf7c0000 Sectors]
Sector Size: 512
Logical Sector Size: 512
Physical Sector Size: 4096
Firmware state: Unconfigured(bad)
Device Firmware Level: DL61
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x500056b3f5bd95c2
Connected Port Number: 0(path0)
Inquiry Data: PHYG845102F41P9DGNSSDSC2KG019T8R XCV1DL61
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Solid State Device
Drive Temperature :23C (73.40 F)
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
Drive's NCQ setting : N/A
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Drive has flagged a S.M.A.R.T alert : No
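The drive dump above combines nonzero error counters (`Media Error Count: 75`, `Other Error Count: 267`) with `Firmware state: Unconfigured(bad)`. Below is a hypothetical Python helper that flags these fields from such a dump. The names `parse_pd_info` and `replacement_reasons` are invented for illustration. The rule is an illustrative heuristic, not a Dell or operational replacement policy.

```python
# Hypothetical triage helper; the field names come from the drive dump above,
# but the replacement rule itself is illustrative, not a vendor policy.
ERROR_COUNTERS = ("Media Error Count", "Other Error Count", "Predictive Failure Count")
BAD_STATES = {"Unconfigured(bad)", "Failed"}


def parse_pd_info(text):
    """Parse 'Key: value' lines of one physical-drive dump into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only; tolerates "Key :value" spacing quirks.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def replacement_reasons(info):
    """List the fields that suggest this drive should be replaced."""
    reasons = [f"{k}={info[k]}" for k in ERROR_COUNTERS if int(info.get(k, "0")) > 0]
    state = info.get("Firmware state", "")
    if state in BAD_STATES:
        reasons.append(f"Firmware state={state}")
    if info.get("Drive has flagged a S.M.A.R.T alert", "No") != "No":
        reasons.append("S.M.A.R.T alert flagged")
    return reasons
```

Applied to the slot 2 dump above, this reports the media and other error counts plus the `Unconfigured(bad)` state. The drive's own S.M.A.R.T flag still reads `No`, so the heuristic deliberately does not rely on it alone.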

We saw really high I/O usage on this server the other day, along with a very high load average.


https://grafana.wikimedia.org/d/aJgffPPmz/wmcs-openstack-eqiad1-hypervisor?orgId=1&var-hypervisor=cloudvirt1018&refresh=30s

Not sure if this could be related.

CC'ing @Andrew and @JHedden so we keep an eye on this.

A ticket has been opened with Dell.

You have successfully submitted request SR995773442.

The ticket was approved. The new SSD should arrive today or tomorrow.

Disks replaced. Please re-open and ping me if the disk fails.