Page MenuHomePhabricator

Degraded RAID on dbstore1002
Closed, ResolvedPublic

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (megacli) was detected on host dbstore1002. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

CRITICAL: 1 failed LD(s) (Degraded)

$ sudo /usr/local/lib/nagios/plugins/get_raid_status_megacli
=== RaidStatus (does not include components in optimal state)
name: Adapter #0

	Virtual Drive: 0 (Target Id: 0)
	RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
	State: =====> Degraded <=====
	Number Of Drives per span: 2
	Number of Spans: 6
	Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU

		Span: 1 - Number of PDs: 2

			PD: 0 Information
			Enclosure Device ID: 32
			Slot Number: 2
			Drive's position: DiskGroup: 0, Span: 1, Arm: 0
			Media Error Count: 1
			Other Error Count: 17
			Predictive Failure Count: 0
			Last Predictive Failure Event Seq Number: 0

				Raw Size: 1.090 TB [0x8bba0cb0 Sectors]
				Firmware state: =====> Failed <=====
				Media Type: Hard Disk Device
				Drive Temperature: 28C (82.40 F)

=== RaidStatus completed

Event Timeline

elukey triaged this task as High priority.Oct 16 2018, 6:44 AM
elukey added a subscriber: Cmjohnson.

@Cmjohnson this server is OOW but the replacement will take time to arrive (still in procurement..) and this host is really important for the research users. Do we have a spare disk that we can swap?

@elukey dbstore1002 is out of warranty and has 1.2T disks. I don't have disks this size but can replace with a 2TB disk..

@elukey dbstore1002 is out of warranty and has 1.2T disks. I don't have disks this size but can replace with a 2TB disk..

Let's do it
Thanks

RAID is back to optimal!
@Cmjohnson was this disk replaced then?

root@dbstore1002:~# megacli -LDInfo -lALL -aALL


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 6.545 TB
Sector Size         : 512
Mirror Data         : 6.545 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives per span:2
Span Depth          : 6
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU

@Marostegui no, I reseated the disk but I see it's optimal. I am resolving the task. If it comes back we can open it again.