Degraded RAID on db1100
Closed, ResolvedPublic

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (megacli) was detected on host db1100. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

CRITICAL: 1 failed LD(s) (Degraded)

$ sudo /usr/local/lib/nagios/plugins/get-raid-status-megacli
=== RaidStatus (does not include components in optimal state)
name: Adapter #0

	Virtual Drive: 0 (Target Id: 0)
	RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
	State: =====> Degraded <=====
	Number Of Drives: 10
	Number of Spans: 1
	Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU

		Span: 0 - Number of PDs: 10

			PD: 0 Information
			ERROR: =====> MISSING DRIVE INFO <=====

=== RaidStatus completed
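
For anyone scripting against snapshots like the one above, here is a minimal Python sketch that extracts each virtual drive's state and the PDs flagged with missing drive info. It is not the logic of get-raid-status-megacli. The function name `summarize` and its parsing rules are assumptions based only on the text format shown in this snapshot.

```python
import re

# Excerpt of the snapshot above (leading indentation is ignored by the parser).
SNAPSHOT = """\
name: Adapter #0

    Virtual Drive: 0 (Target Id: 0)
    RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
    State: =====> Degraded <=====
    Number Of Drives: 10
    Number of Spans: 1

        Span: 0 - Number of PDs: 10

            PD: 0 Information
            ERROR: =====> MISSING DRIVE INFO <=====
"""


def summarize(snapshot: str) -> dict:
    """Hypothetical helper: map VD number -> state, and list PDs with missing info."""
    states = {}
    missing_pds = []
    current_vd = None
    current_pd = None
    for raw in snapshot.splitlines():
        line = raw.strip()
        m = re.match(r"Virtual Drive: (\d+)", line)
        if m:
            current_vd = int(m.group(1))
            continue
        # The summary output wraps non-optimal states in "=====> ... <=====".
        m = re.match(r"State: (?:=+> )?(\w+)", line)
        if m and current_vd is not None:
            states[current_vd] = m.group(1)
            continue
        m = re.match(r"PD: (\d+) Information", line)
        if m:
            current_pd = int(m.group(1))
            continue
        if "MISSING DRIVE INFO" in line and current_pd is not None:
            missing_pds.append(current_pd)
    return {"states": states, "missing_pds": missing_pds}


print(summarize(SNAPSHOT))
```

Run on this excerpt, it reports virtual drive 0 as Degraded and PD 0 as the slot with missing drive info, which matches the slot later replaced.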

Event Timeline

Restricted Application added a subscriber: Marostegui. Dec 28 2019, 6:02 AM
Marostegui triaged this task as High priority. Dec 28 2019, 9:25 AM
Marostegui added a project: DBA.

This is the s5 primary database master; let's get the new disk soon.
This host is under warranty, so I guess this should just be an RMA for the failed disk.

Marostegui moved this task from Triage to In progress on the DBA board. Dec 28 2019, 9:32 AM

@Jclark-ctr - when you come in next, can you open an RMA for this one? Thanks, Willy

Any ETA on when the new disk will be ordered? I would rather not leave the s5 primary database master running with a broken disk for long. If another disk in the same span fails, the master will go down.

The drive has been ordered and should arrive shortly. I will update this task when it arrives.

@Marostegui The drive has arrived. Please PM me on IRC so we can get it swapped.

I have messaged you. 8AM EST is a bit late for me, so let's schedule this for Monday: I have meetings until quite late in my time zone that day, so I will be online anyway.
Thanks!

The slot appears to be 0. As discussed on IRC, we will change it on Monday.

@Jclark-ctr I think I got the UTC/EST conversion wrong; if you are around the DC now, please change the disk :-)

Yep, it looks like slot 0 from my side too.

@Jclark-ctr feel free to replace the disk once you get to the DC, disk #0 is the one.

Replaced Disk #0

Thanks - it is now rebuilding. I will close the task once it is finished

PD: 0 Information
Enclosure Device ID: 32
Slot Number: 0
Drive's position: DiskGroup: 0, Span: 0, Arm: 0
Enclosure position: 1
Device Id: 0
WWN: 55cd2e4150a93376
Sequence Number: 11
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA

Raw Size: 894.252 GB [0x6fc81ab0 Sectors]
Non Coerced Size: 893.752 GB [0x6fb81ab0 Sectors]
Coerced Size: 893.75 GB [0x6fb80000 Sectors]
Sector Size:  512
Logical Sector Size:  512
Physical Sector Size:  4096
Firmware state: Rebuild
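
The sizes above are given both in GB and as hex sector counts. A quick check, assuming the counts are in 512-byte sectors (the reported Sector Size) and that MegaCLI's "GB" means GiB (2^30 bytes), reproduces the reported figures to within display rounding of the last decimal. The helper names `parse_fields` and `sectors_to_gib` are hypothetical.

```python
import re

# Excerpt copied from the PD output above.
PD_EXCERPT = """\
Enclosure Device ID: 32
Slot Number: 0
Raw Size: 894.252 GB [0x6fc81ab0 Sectors]
Non Coerced Size: 893.752 GB [0x6fb81ab0 Sectors]
Coerced Size: 893.75 GB [0x6fb80000 Sectors]
Sector Size:  512
Firmware state: Rebuild
"""


def parse_fields(text: str) -> dict:
    """Split each 'Key: Value' line on its first colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def sectors_to_gib(size_field: str, sector_size: int) -> float:
    """Convert the '[0x... Sectors]' part of a size field to GiB."""
    m = re.search(r"\[0x([0-9a-fA-F]+) Sectors\]", size_field)
    if m is None:
        raise ValueError(f"no sector count in {size_field!r}")
    return int(m.group(1), 16) * sector_size / 2**30


fields = parse_fields(PD_EXCERPT)
sector_size = int(fields["Sector Size"])
for key in ("Raw Size", "Non Coerced Size", "Coerced Size"):
    print(key, round(sectors_to_gib(fields[key], sector_size), 3))
print("Enclosure:Slot", fields["Enclosure Device ID"] + ":" + fields["Slot Number"],
      "state", fields["Firmware state"])
```

The coerced size is a whole number of MiB (0x6fb80000 sectors), which is why it comes out as exactly 893.75.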
Marostegui closed this task as Resolved. Jan 14 2020, 3:10 PM

All good - thank you!

Number of Virtual Disks: 1
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 3.635 TB
Sector Size         : 512
Is VD emulated      : Yes
Mirror Data         : 3.635 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 10
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No
Number of Spans: 1
Span: 0 - Number of PDs: 10
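
Once the rebuild finishes, the same kind of check can confirm the virtual drive is healthy. A minimal sketch, assuming the padded "Key : Value" layout shown above; `parse_vd` and `vd_problems` are hypothetical names. Flagging a Current Cache Policy that differs from the Default is a judgment call: with "No Write Cache if Bad BBU", the controller stops using write-back caching when the BBU is bad, so a mismatch is worth a look.

```python
# Excerpt copied from the virtual drive output above.
VD_EXCERPT = """\
Virtual Drive: 0 (Target Id: 0)
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
State               : Optimal
Number Of Drives    : 10
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Bad Blocks Exist: No
"""


def parse_vd(text: str) -> dict:
    """Split each line on its first colon; strip the alignment padding."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def vd_problems(fields: dict) -> list:
    """Return human-readable problems; an empty list means the VD looks healthy."""
    problems = []
    if fields.get("State") != "Optimal":
        problems.append(f"state is {fields.get('State')!r}")
    if fields.get("Bad Blocks Exist", "No") != "No":
        problems.append("bad blocks reported")
    if fields.get("Current Cache Policy") != fields.get("Default Cache Policy"):
        problems.append("cache policy differs from default")
    return problems


print(vd_problems(parse_vd(VD_EXCERPT)))
```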