
db1058 (s5 master) degraded RAID
Closed, Resolved, Public

Event Timeline

jcrespo raised the priority of this task to High.
jcrespo updated the task description.
jcrespo added projects: acl*sre-team, DBA.
jcrespo changed the edit policy from "All Users" to "acl*sre-team (Project)".
Joe set Security to None.
root@db1058:~$ megacli -AdpAllInfo -aALL
                                     
Adapter #0

==============================================================================
[...]
               Device Present
                ================
Virtual Drives    : 1 
  Degraded        : 1 
  Offline         : 0 
Physical Devices  : 14 
  Disks           : 12 
  Critical Disks  : 1 
  Failed Disks    : 1 

[...]
Exit Code: 0x00
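The "Device Present" counters above are the quickest signal that something is wrong (Degraded: 1, Critical Disks: 1, Failed Disks: 1). A minimal sketch of a check for them, assuming the section layout shown above and a POSIX shell with awk; check_adapter_summary is a hypothetical name, not a MegaCLI feature:

```shell
# Hypothetical helper: read `megacli -AdpAllInfo -aALL` output on stdin,
# print the health counters from the "Device Present" section, and exit 1
# if any virtual drive is degraded/offline or any disk is critical/failed.
check_adapter_summary() {
  awk -F':' '
    # The section starts at its title and ends at the first blank line.
    /Device Present/ { sect = 1; next }
    sect && /^[ \t]*$/ { sect = 0 }
    sect {
      key = $1; val = $2
      gsub(/^[ \t]+|[ \t]+$/, "", key)   # keys are indented
      gsub(/^[ \t]+|[ \t]+$/, "", val)   # values carry trailing spaces
      if (key ~ /^(Degraded|Offline|Critical Disks|Failed Disks)$/) {
        print key ": " val
        if (val + 0 > 0) bad = 1
      }
    }
    END { exit (bad ? 1 : 0) }
  '
}
```

Something like megacli -AdpAllInfo -aALL | check_adapter_summary || echo "RAID needs attention" could then be run by hand or from cron; only the exit status matters for that.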
seqNum: 0x000009b6
Time: Mon Jun  8 14:10:34 2015

Code: 0x000000fb
Class: 2
Locale: 0x01
Event Description: VD 00/0 is now DEGRADED
Event Data:
===========
Target Id: 0


seqNum: 0x000009b5
Time: Mon Jun  8 14:10:34 2015

Code: 0x00000051
Class: 0
Locale: 0x01
Event Description: State change on VD 00/0 from OPTIMAL(3) to DEGRADED(2)
Event Data:
===========
Target Id: 0
Previous state: 3
New state: 2


seqNum: 0x000009b4
Time: Mon Jun  8 14:10:34 2015

Code: 0x00000072
Class: 0
Locale: 0x02
Event Description: State change on PD 06(e0x20/s6) from ONLINE(18) to FAILED(11)
Event Data:
===========
Device ID: 6
Enclosure Index: 32
Slot Number: 6
Previous state: 24
New state: 17


seqNum: 0x000009b3
Time: Mon Jun  8 14:10:34 2015

Code: 0x00000057
Class: 1
Locale: 0x02
Event Description: Error on PD 06(e0x20/s6) (Error 02)
Event Data:
===========
Device ID: 6
Enclosure Index: 32
Slot Number: 6
Error: 2


seqNum: 0x000009b2
Time: Mon Jun  8 14:10:34 2015

Code: 0x00000071
Class: 0
Locale: 0x02
Event Description: Unexpected sense: PD 06(e0x20/s6) Path 5000c5005abb01f1, CDB: 28 00 1b 83 d5 c0 00 00 20 00, Sense: 4/32/00
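Some of the numbers in these events are easier to read once decoded. The codes in parentheses in the descriptions are hexadecimal, while Event Data shows them in decimal: ONLINE(18) is 0x18 = 24 and FAILED(11) is 0x11 = 17, matching "Previous state: 24" and "New state: 17". Likewise e0x20/s6 is enclosure 0x20 = 32, slot 6, the same [32:6] used with pdLocate below. The CDB 28 00 1b 83 ... is a SCSI READ(10) (opcode 0x28), so the failing I/O was a read. Assuming the sense field is also hex, Sense: 4/32/00 is sense key 4 (HARDWARE ERROR) with ASC/ASCQ 32h/00h, which SPC lists as "NO DEFECT SPARE LOCATION AVAILABLE"; that would fit a drive that has run out of spare sectors after 75 media errors. A sketch of two hypothetical helpers (describe_state_change and decode_cdb10 are illustrative names) that do the decoding:

```shell
# Hypothetical helper: turn "... from NAME(hh) to NAME(hh)" into the state
# names plus the decimal codes that appear under "Event Data".
describe_state_change() {
  printf '%s\n' "$1" |
    sed -n 's/.*from \([A-Z_]*\)(\([0-9A-Fa-f]*\)) to \([A-Z_]*\)(\([0-9A-Fa-f]*\)).*/\1 \2 \3 \4/p' |
    while read -r old_name old_hex new_name new_hex; do
      printf '%s -> %s (Previous state: %d, New state: %d)\n' \
        "$old_name" "$new_name" "$((0x$old_hex))" "$((0x$new_hex))"
    done
}

# Hypothetical helper: decode a 10-byte READ(10)/WRITE(10) CDB given as the
# hex bytes MegaCLI prints after "CDB:".
# Layout: byte 0 opcode, bytes 2-5 LBA (big-endian), bytes 7-8 length in blocks.
decode_cdb10() {
  set -- $1                      # split "28 00 1b ..." into $1..$10
  if [ "$#" -ne 10 ]; then
    echo "decode_cdb10: expected 10 bytes, got $#" >&2
    return 1
  fi
  lba=$(( (0x$3 << 24) | (0x$4 << 16) | (0x$5 << 8) | 0x$6 ))
  blocks=$(( (0x$8 << 8) | 0x$9 ))
  case $1 in
    28) op='READ(10)' ;;
    2a|2A) op='WRITE(10)' ;;
    *) op="opcode 0x$1" ;;
  esac
  echo "$op LBA=$lba blocks=$blocks"
}

describe_state_change 'State change on PD 06(e0x20/s6) from ONLINE(18) to FAILED(11)'
# ONLINE -> FAILED (Previous state: 24, New state: 17)
decode_cdb10 '28 00 1b 83 d5 c0 00 00 20 00'
# READ(10) LBA=461624768 blocks=32
```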

PD: 0 Information
Enclosure Device ID: 32
Slot Number: 6
Drive's position: DiskGroup: 0, Span: 3, Arm: 0
Enclosure position: 1
Device Id: 6
WWN: 5000C5005ABB01F0
Sequence Number: 3
Media Error Count: 75
Other Error Count: 16
Predictive Failure Count: 10
Last Predictive Failure Event Seq Number: 2473
PD Type: SAS

Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Sector Size: 0
Firmware state: Failed
Device Firmware Level: ES66
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x5000c5005abb01f1
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600057SS ES666SL5DGXN
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive Temperature :39C (102.20 F)
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Port-1 :
Port status: Active
Port's Linkspeed: Unknown
Drive has flagged a S.M.A.R.T alert : Yes
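The counters in this PD output (75 media errors, 16 other errors, 10 predictive failures, Firmware state: Failed, S.M.A.R.T alert: Yes) all point the same way. A minimal sketch of a per-drive check, assuming the "Key: Value" layout shown above; check_pd_health is a hypothetical name, and treating any non-zero counter as a warning is an assumption, not a MegaCLI rule:

```shell
# Hypothetical helper: read one drive's MegaCLI PD information on stdin and
# print a warning for each field that suggests the drive is unhealthy.
# Exits 1 if any warning was printed.
check_pd_health() {
  awk -F': *' '
    function warn(msg) { print "WARNING: " msg; bad = 1 }
    { sub(/[ \t]+$/, "") }   # drop trailing padding (re-splits the fields)
    /^Media Error Count:/        && $2 + 0 > 0 { warn("media errors: " $2) }
    /^Other Error Count:/        && $2 + 0 > 0 { warn("other errors: " $2) }
    /^Predictive Failure Count:/ && $2 + 0 > 0 { warn("predictive failures: " $2) }
    /^Firmware state:/ && $2 !~ /^Online/      { warn("firmware state: " $2) }
    /S\.M\.A\.R\.T alert/ && /Yes$/            { warn("SMART alert flagged") }
    END { exit (bad ? 1 : 0) }
  '
}
```

It expects a single drive's section, so a full PD list would have to be fed one drive at a time, for example from megacli -PDInfo -PhysDrv [32:6] -aALL.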

This should start the locate light flashing on that drive's bay (if it works):
root@db1058:~$ megacli -pdLocate -start -PhysDrv \[32:6\] -aALL
And this should stop it:
root@db1058:~$ megacli -pdLocate -stop -PhysDrv \[32:6\] -aALL
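The backslashes in \[32:6\] only stop the shell from treating the brackets as a glob pattern; quoting the argument does the same job. A hypothetical wrapper (locate_drive and the DRY_RUN variable are illustrative, not part of MegaCLI) that builds the same two commands:

```shell
# Hypothetical wrapper around `megacli -pdLocate`. Quoting "[E:S]" keeps the
# shell from globbing the brackets. Set DRY_RUN=1 to print the command
# instead of running it.
locate_drive() {
  action=$1 enclosure=$2 slot=$3   # action: start or stop
  case $action in
    start|stop) ;;
    *) echo "usage: locate_drive start|stop ENCLOSURE SLOT" >&2; return 2 ;;
  esac
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo megacli -pdLocate "-$action" -PhysDrv "[$enclosure:$slot]" -aALL
  else
    megacli -pdLocate "-$action" -PhysDrv "[$enclosure:$slot]" -aALL
  fi
}
```

With it, locate_drive start 32 6 and locate_drive stop 32 6 should be equivalent to the two commands above.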

And this should write the latest 100 controller events to events.log, which should show the disk being rebuilt automatically once it is replaced:
megacli -AdpEventLog -GetLatest 100 -f events.log -aALL
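events.log should contain records in the same seqNum / Time / Code / Event Description layout as the events pasted above. A minimal sketch for pulling out the events that match a pattern, assuming that layout; filter_events is a hypothetical name:

```shell
# Hypothetical helper: print seqNum, time and description of every event in
# a MegaCLI event log whose description matches an awk extended regex.
# Usage: filter_events PATTERN [FILE]   (reads stdin if FILE is omitted)
filter_events() {
  awk -v pat="$1" '
    /^seqNum:/ { seq = $2 }
    /^Time:/   { t = $0; sub(/^Time:[ \t]*/, "", t) }
    /^Event Description:/ {
      d = $0; sub(/^Event Description:[ \t]*/, "", d)
      if (d ~ pat) print seq "  " t "  " d
    }
  ' "${2:--}"
}
```

For example, filter_events 'e0x20/s6' events.log lists everything logged for enclosure 32, slot 6; a pattern such as '[Rr]ebuild' may pick out the rebuild events, assuming the controller's messages use that word.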

Does anyone know if there are spare disks onsite?

There are no spare disks on-site; they need to be ordered from Dell. It
takes 24 hours from the time I create the ticket. In the future, just let
me know there is a bad disk; I know what to do from there. That will save
you some time. Once I create a ticket, I will post the dispatch info.

Thanks

Chris

Congratulations: Work Order SR912219395 was successfully submitted.

Disk replaced; it is rebuilding:

Enclosure Device ID: 32
Slot Number: 6
Drive's position: DiskGroup: 0, Span: 3, Arm: 0
Enclosure position: 1
Device Id: 6
WWN: 5000C50088E550E8
Sequence Number: 11
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Sector Size: 0
Firmware state: Rebuild
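The sizes are sector counts in hex, and MegaCLI's "GB" is binary (GiB): the figures only work out with 512-byte sectors, even though Sector Size reads 0. The replacement reports exactly the same Raw, Non Coerced and Coerced sizes as the failed drive. A sketch of the conversion; sectors_to_gib is a hypothetical helper:

```shell
# Hypothetical helper: convert a MegaCLI sector count (hex or decimal,
# 512-byte sectors assumed) into GiB, the unit MegaCLI labels "GB".
sectors_to_gib() {
  awk -v s="$(( $1 ))" 'BEGIN { printf "%.3f\n", s * 512 / (1024 * 1024 * 1024) }'
}

sectors_to_gib 0x45cc0000   # Coerced Size -> 558.375
# Raw Size minus Non Coerced Size, in MiB -> 512
echo $(( (0x45dd2fb0 - 0x45cd2fb0) * 512 / 1024 / 1024 ))
```

For the Raw Size this prints 558.912, so the last digit may differ slightly from MegaCLI's own display (558.911).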

Return tracking information for the bad disk

FEDEX
9611918 2393026 48853858

New disk is online

Firmware state: Online, Spun Up
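Firmware state goes from Rebuild to "Online, Spun Up" once the rebuild finishes. A minimal sketch of waiting for that, assuming the PD output format above; pd_is_online and wait_for_rebuild are hypothetical names, and the five-minute interval is arbitrary. For a progress percentage, megacli -PDRbld -ShowProg -PhysDrv [32:6] -aALL should also work.

```shell
# Hypothetical helper: succeed if the PD info on stdin reports the drive as
# Online (a rebuilding drive reports "Firmware state: Rebuild" instead).
pd_is_online() {
  grep -q '^Firmware state: Online'
}

# Hypothetical polling loop: re-read the drive's state every five minutes
# until the rebuild has finished. Usage: wait_for_rebuild 32 6
wait_for_rebuild() {
  enclosure=$1 slot=$2
  until megacli -PDInfo -PhysDrv "[$enclosure:$slot]" -aALL | pd_is_online; do
    sleep 300
  done
  echo "PD [$enclosure:$slot] is online"
}
```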