Failed disk on analytics1069.eqiad.wmnet
Closed, Resolved · Public

Description

This was first discovered when rebooting analytics1069 as part of T291732 (analytics1069 mgmt interface intermittently goes up and down).

The drive is /dev/sdc

I believe that the drive bay ID is:

Enclosure Device ID: 32
Slot Number: 1

lshw -class disk showed the following for that disk:

*-disk:5
     description: SCSI Disk
     product: PERC H730 Mini
     vendor: DELL
     physical id: 2.2.0
     bus info: scsi@0:2.2.0
     logical name: /dev/sdc
     version: 4.27
     serial: 00e8b77f0502e7cb2000dc00aea06d86
     size: 3725GiB (4TB)
     capabilities: gpt-1.00 partitioned partitioned:gpt
     configuration: ansiversion=5 guid=672f14c1-6f49-4bc1-a651-8157035ee300 logicalsectorsize=512 sectorsize=512

From the above, I believe that the bus ID is 2 (bus info: scsi@0:2.2.0), so the corresponding virtual drive and its physical disk are the ones shown here in the output of sudo megacli -ldpdinfo -a0:

Virtual Drive: 2 (Target Id: 2)
Name                :
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
Size                : 3.637 TB
Sector Size         : 512
Is VD emulated      : No
Parity Size         : 0
State               : Optimal
Strip Size          : 64 KB
Number Of Drives    : 1
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: None
Current Power Savings Policy: None
Can spin up in 1 minute: Yes
LD has drives that support T10 power conditions: Yes
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: Yes
Is VD Cached: No
Number of Spans: 1
Span: 0 - Number of PDs: 1

PD: 0 Information
Enclosure Device ID: 32
Slot Number: 1
Drive's position: DiskGroup: 2, Span: 0, Arm: 0
Enclosure position: 1
Device Id: 1
WWN: 5000cca25de2222e
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA

Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
Non Coerced Size: 3.637 TB [0x1d1b0beb0 Sectors]
Coerced Size: 3.637 TB [0x1d1b00000 Sectors]
Sector Size:  512
Logical Sector Size:  512
Physical Sector Size:  512
Firmware state: Online, Spun Up
Device Firmware Level: KN03
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x500056b3b9ffcec1
Connected Port Number: 0(path0) 
Inquiry Data:             K4JE2SGBHGST HUS726040ALA614                    A5DEKN03
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive Temperature :40C (104.00 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Drive's NCQ setting : N/A
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Drive has flagged a S.M.A.R.T alert : No

From that we can see that the physical slot is numbered 1 in enclosure 32, so the drive can be queried directly with:

sudo megacli -PDInfo -PhysDrv [32:1] -a0
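
The lookup described above (lshw bus info, then the matching virtual drive, then its enclosure and slot) could be scripted roughly as follows. This is only a sketch: the function names are hypothetical, it assumes the lshw bus info has the layout scsi@<host>:<channel>.<target>.<lun>, and it assumes the megacli -ldpdinfo output looks like the excerpt in this task.

```python
import re
from typing import List, Tuple


def scsi_target_from_bus_info(bus_info: str) -> int:
    """Return the target number from an lshw bus info string.

    Assumption: the layout is scsi@<host>:<channel>.<target>.<lun>,
    e.g. 'scsi@0:2.2.0' gives target 2.
    """
    match = re.fullmatch(r"scsi@(\d+):(\d+)\.(\d+)\.(\d+)", bus_info.strip())
    if match is None:
        raise ValueError(f"unrecognised bus info: {bus_info!r}")
    return int(match.group(3))


def slots_for_virtual_drive(ldpdinfo_text: str, target_id: int) -> List[Tuple[int, int]]:
    """Return (enclosure, slot) pairs listed under the virtual drive
    whose 'Target Id' equals target_id."""
    slots = []
    in_wanted_vd = False
    enclosure = None
    for raw_line in ldpdinfo_text.splitlines():
        line = raw_line.strip()
        vd = re.match(r"Virtual Drive:\s*\d+\s*\(Target Id:\s*(\d+)\)", line)
        if vd:
            # A new virtual drive section starts here.
            in_wanted_vd = int(vd.group(1)) == target_id
            enclosure = None
            continue
        if not in_wanted_vd:
            continue
        enc = re.match(r"Enclosure Device ID:\s*(\d+)", line)
        if enc:
            enclosure = int(enc.group(1))
            continue
        slot = re.match(r"Slot Number:\s*(\d+)", line)
        if slot and enclosure is not None:
            slots.append((enclosure, int(slot.group(1))))
            enclosure = None
    return slots


def pdinfo_command(enclosure: int, slot: int, adapter: int = 0) -> str:
    """Build the per-drive query command used in this task."""
    return f"sudo megacli -PDInfo -PhysDrv [{enclosure}:{slot}] -a{adapter}"


# Trimmed sample based on the output pasted in this task.
sample = """\
Virtual Drive: 2 (Target Id: 2)
State               : Optimal
Number Of Drives    : 1
PD: 0 Information
Enclosure Device ID: 32
Slot Number: 1
"""

target = scsi_target_from_bus_info("scsi@0:2.2.0")
for enclosure, slot in slots_for_virtual_drive(sample, target):
    print(pdinfo_command(enclosure, slot))
```

On a real host the text would come from saving the output of sudo megacli -ldpdinfo -a0 to a file and reading it in. The sketch only builds the command string and does not run megacli itself.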

Event Timeline

wiki_willy subscribed.

Looks like this was missing the "ops-eqiad" project tag, so it fell through the cracks. @BTullis - since the hardware was installed to refresh this host in T293922, do you still need this fixed? Thanks, Willy

Hi @wiki_willy - Apologies for the missing tag. No, we can leave this disk in a failed state, thanks. I'll work on the decom soon.
Thanks.

Cmjohnson claimed this task.
Cmjohnson subscribed.

Resolving this since it's going to be decom'd anyway.