Degraded RAID on db1070
Closed, Resolved, Public

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (megacli) was detected on host db1070. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

=== RaidStatus (does not include components in optimal state)
name: Adapter #0

	Virtual Drive: 0 (Target Id: 0)
	RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
	State: =====> Degraded <=====
	Number Of Drives per span: 2
	Number of Spans: 6
	Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU

		Span: 5 - Number of PDs: 2

			PD: 0 Information
			Enclosure Device ID: 32
			Slot Number: 10
			Drive's position: DiskGroup: 0, Span: 5, Arm: 0
			Media Error Count: 28
			Other Error Count: 22
			Predictive Failure Count: =====> 6 <=====
			Last Predictive Failure Event Seq Number: 2232

				Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
				Firmware state: =====> Failed <=====
				Media Type: Hard Disk Device
				Drive Temperature: 41C (105.80 F)

=== RaidStatus completed
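
A snapshot like the one above can be scanned for failed physical drives so that the enclosure and slot are ready for later `megacli` calls. The following is only a minimal sketch: it assumes the field labels appear exactly as in this snapshot, and `failed_drives` and `SNAPSHOT` are hypothetical names introduced for illustration, not part of the Icinga handler.

```python
import re

# Excerpt of the snapshot above, trimmed to the fields the parser reads.
SNAPSHOT = """\
PD: 0 Information
Enclosure Device ID: 32
Slot Number: 10
Predictive Failure Count: =====> 6 <=====
Firmware state: =====> Failed <=====
"""


def failed_drives(snapshot):
    """Return (enclosure, slot) pairs for drives whose firmware state is Failed."""
    failed = []
    enclosure = slot = None
    for raw in snapshot.splitlines():
        line = raw.strip()
        m = re.match(r"Enclosure Device ID:\s*(\d+)", line)
        if m:
            enclosure = int(m.group(1))
            continue
        m = re.match(r"Slot Number:\s*(\d+)", line)
        if m:
            slot = int(m.group(1))
            continue
        # The handler wraps abnormal values in "=====> ... <=====" markers,
        # so test for the word rather than an exact value.
        if line.startswith("Firmware state:") and "Failed" in line:
            if enclosure is not None and slot is not None:
                failed.append((enclosure, slot))
    return failed


print(failed_drives(SNAPSHOT))
```

For this snapshot the pair found is enclosure 32, slot 10, which is the `[32:10]` address used in the rebuild commands later in this task.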

Event Timeline

Marostegui added subscribers: Cmjohnson, Marostegui.

Hey @Cmjohnson, it should be safe to change this disk when you have time.
Thanks!

db1070 is under warranty for 2 more months. Requested a new part from Dell.

Congratulations: Work Order SR944780612 was successfully submitted.

Ottomata triaged this task as Medium priority.

I will assign this to @Cmjohnson so he can change the disk once it is onsite.
Thanks!

Hello @Cmjohnson!
Did the disk arrive?

Thanks!

@Marostegui disk is rebuilding

Enclosure Device ID: 32
Slot Number: 10
Drive's position: DiskGroup: 0, Span: 5, Arm: 0
Enclosure position: 1
Device Id: 10
WWN: 500003978859EC70
Sequence Number: 9
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Sector Size: 0
Firmware state: Rebuild
Device Firmware Level: DT01
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x500003978859ec72
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: TOSHIBA AL13SXL600N DT0117T0A079F5YE
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive Temperature :31C (87.80 F)
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Port-1 :
Port status: Active
Port's Linkspeed: Unknown
Drive has flagged a S.M.A.R.T alert : No

Awesome!! Thank you!

root@db1070:~#  megacli -PDRbld -ShowProg -PhysDrv [32:10] -aALL

Rebuild Progress on Device at Enclosure 32, Slot 10 Completed 1% in 5 Minutes.
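
The progress line above can be turned into a rough time estimate. This is a sketch under a strong assumption: that the rebuild proceeds at a constant rate, which real rebuilds under database load often do not. With only 1% granularity the early estimate is very coarse. `parse_rebuild_progress` and `estimate_remaining_minutes` are hypothetical helper names.

```python
import re

# Matches the line format shown above; other megacli versions may differ.
PROGRESS_RE = re.compile(
    r"Enclosure (\d+), Slot (\d+) Completed (\d+)% in (\d+) Minutes"
)


def parse_rebuild_progress(line):
    """Return a dict with enclosure, slot, percent and minutes, or None."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    enclosure, slot, percent, minutes = (int(g) for g in m.groups())
    return {
        "enclosure": enclosure,
        "slot": slot,
        "percent": percent,
        "minutes": minutes,
    }


def estimate_remaining_minutes(percent, minutes):
    """Linear extrapolation; returns None when no progress is reported yet."""
    if percent <= 0:
        return None
    return minutes * (100 - percent) / percent


line = ("Rebuild Progress on Device at Enclosure 32, Slot 10 "
        "Completed 1% in 5 Minutes.")
progress = parse_rebuild_progress(line)
print(estimate_remaining_minutes(progress["percent"], progress["minutes"]))
```

With 1% done in 5 minutes, the linear estimate is 5 × 99 / 1 = 495 minutes remaining, roughly eight hours.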

It is all good now, thank you Chris!

root@db1070:~#  megacli -PDRbld -ShowProg -PhysDrv [32:10] -aALL

Device(Encl-32 Slot-10) is not in rebuild process

Exit Code: 0x00
root@db1070:~# megacli -ldinfo -l0 -a0


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 3.271 TB
Sector Size         : 512
Mirror Data         : 3.271 TB
State               : Optimal
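
The final check above, confirming that the virtual drive is back to `Optimal`, could be scripted along these lines. It is a minimal sketch that assumes the `-ldinfo` output keeps the `State : <value>` layout shown here; `virtual_drive_state` and `LDINFO` are hypothetical names.

```python
import re

# Excerpt of the -ldinfo output above.
LDINFO = """\
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 3.271 TB
State               : Optimal
"""


def virtual_drive_state(ldinfo_output):
    """Return the value of the first 'State' field, or None if absent."""
    for line in ldinfo_output.splitlines():
        # Anchor at line start so fields such as 'Firmware state' are skipped.
        m = re.match(r"\s*State\s*:\s*(.+?)\s*$", line)
        if m:
            return m.group(1)
    return None


state = virtual_drive_state(LDINFO)
print("OK" if state == "Optimal" else "CHECK: state is %s" % state)
```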