
ms-be2012.codfw.wmnet: slot=10 dev=sdk failed
Closed, ResolvedPublic

Description

The disk in slot=10 (dev=sdk) has been reported as failed; please replace it.

/var/log/kern.log

May 23 09:09:35 ms-be2012 kernel: [5691160.527512] sd 0:2:10:0: [sdk]  
May 23 09:09:35 ms-be2012 kernel: [5691160.527514] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
May 23 09:09:35 ms-be2012 kernel: [5691160.527516] sd 0:2:10:0: [sdk] CDB: 
May 23 09:09:35 ms-be2012 kernel: [5691160.527517] Read(10): 28 00 00 00 00 00 00 00 08 00
May 23 09:09:35 ms-be2012 kernel: [5691160.527546] sd 0:2:10:0: [sdk]  
May 23 09:09:35 ms-be2012 kernel: [5691160.527548] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
May 23 09:09:35 ms-be2012 kernel: [5691160.527549] sd 0:2:10:0: [sdk] CDB: 
May 23 09:09:35 ms-be2012 kernel: [5691160.527550] Read(10): 28 00 00 00 00 00 00 00 08 00
May 23 09:09:35 ms-be2012 kernel: [5691160.527583] sd 0:2:10:0: [sdk]  
May 23 09:09:35 ms-be2012 kernel: [5691160.527585] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
May 23 09:09:35 ms-be2012 kernel: [5691160.527586] sd 0:2:10:0: [sdk] CDB: 
May 23 09:09:35 ms-be2012 kernel: [5691160.527587] Read(10): 28 00 00 00 00 38 00 00 08 00
May 23 09:09:35 ms-be2012 kernel: [5691160.527618] sd 0:2:10:0: [sdk]  
May 23 09:09:35 ms-be2012 kernel: [5691160.527620] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
May 23 09:09:35 ms-be2012 kernel: [5691160.527622] sd 0:2:10:0: [sdk] CDB: 
May 23 09:09:35 ms-be2012 kernel: [5691160.527622] Read(10): 28 00 00 00 00 00 00 00 08 00
May 23 09:09:35 ms-be2012 kernel: [5691160.527655] sd 0:2:10:0: [sdk]  
May 23 09:09:35 ms-be2012 kernel: [5691160.527656] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
May 23 09:09:35 ms-be2012 kernel: [5691160.527658] sd 0:2:10:0: [sdk] CDB: 
May 23 09:09:35 ms-be2012 kernel: [5691160.527659] Read(10): 28 00 00 00 00 00 00 00 08 00
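The `CDB:` lines above are the raw SCSI command descriptor blocks that the kernel tried to send to sdk. The sketch below decodes them under one assumption: they follow the standard READ(10) layout from the SCSI Block Commands spec, with opcode 0x28 in byte 0, a big-endian logical block address in bytes 2-5, and a big-endian transfer length in bytes 7-8. `decode_read10` is a hypothetical helper and is not part of any tool used in this task.

```python
"""Decode SCSI READ(10) CDBs as printed in the kernel log (a sketch)."""

READ_10_OPCODE = 0x28  # READ(10) operation code


def decode_read10(cdb_hex: str) -> tuple[int, int]:
    """Return (lba, transfer_length_in_blocks) for a space-separated CDB."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex skips the spaces between bytes
    if len(cdb) != 10 or cdb[0] != READ_10_OPCODE:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    length = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: number of blocks
    return lba, length


# The two distinct CDBs that appear in the log excerpt.
for line in ("28 00 00 00 00 00 00 00 08 00",
             "28 00 00 00 00 38 00 00 08 00"):
    lba, blocks = decode_read10(line)
    print(f"{line} -> LBA {lba}, {blocks} blocks")
```

Both CDBs decode to 8-block reads near the start of the disk (LBA 0 and LBA 56). Even these small reads were rejected with `DID_BAD_TARGET`, which is consistent with the controller marking the drive as failed.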

smartctl

megacli

Enclosure Device ID: 32
Slot Number: 10
Drive's position: DiskGroup: 10, Span: 0, Arm: 0
Enclosure position: 1
Device Id: 10
WWN: 5000C500559552C8
Sequence Number: 3
Media Error Count: 426
Other Error Count: 138
Predictive Failure Count: 19
Last Predictive Failure Event Seq Number: 21303
PD Type: SAS

Raw Size: 1.819 TB [0xe8e088b0 Sectors]
Non Coerced Size: 1.818 TB [0xe8d088b0 Sectors]
Coerced Size: 1.818 TB [0xe8d00000 Sectors]
Sector Size:  0
Firmware state: Failed
Device Firmware Level: RS11
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c500559552c9
SAS Address(1): 0x0
Connected Port Number: 0(path0) 
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive Temperature :33C (91.40 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: Unknown 
Drive has flagged a S.M.A.R.T alert : Yes




Exit Code: 0x00
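The megacli dump above is plain `Key: Value` text. The following minimal sketch assumes 512-byte sectors and uses only a handful of the fields shown. It pulls out the error counters and converts the hex sector count back to a size. `parse_pd`, `raw_size_tib`, and the replacement rule in `needs_replacement` are hypothetical illustrations. They are not megacli features and do not reflect an existing policy.

```python
"""Extract a few health fields from a MegaCli PDList-style dump (a sketch)."""
import re

SECTOR_BYTES = 512  # assumption: 512-byte logical sectors

# Subset of the megacli output shown above.
SAMPLE = """\
Slot Number: 10
Media Error Count: 426
Other Error Count: 138
Predictive Failure Count: 19
Raw Size: 1.819 TB [0xe8e088b0 Sectors]
Firmware state: Failed
Drive has flagged a S.M.A.R.T alert : Yes
"""


def parse_pd(text: str) -> dict[str, str]:
    """Split each 'Key: Value' line on its first colon and strip both sides."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def raw_size_tib(size_field: str) -> float:
    """Convert the '[0x... Sectors]' part of a size field to TiB (2**40 bytes)."""
    match = re.search(r"\[0x([0-9a-fA-F]+) Sectors\]", size_field)
    if match is None:
        raise ValueError(f"no sector count in {size_field!r}")
    sectors = int(match.group(1), 16)
    return sectors * SECTOR_BYTES / 2**40


def needs_replacement(fields: dict[str, str]) -> bool:
    """Hypothetical rule: failed firmware state or a SMART alert."""
    return (fields.get("Firmware state") == "Failed"
            or fields.get("Drive has flagged a S.M.A.R.T alert") == "Yes")


pd = parse_pd(SAMPLE)
print(pd["Slot Number"], f"{raw_size_tib(pd['Raw Size']):.3f} TiB",
      needs_replacement(pd))
```

For this drive, 0xe8e088b0 is 3,907,029,168 sectors, and 3,907,029,168 × 512 bytes / 2^40 ≈ 1.819. That matches the `Raw Size` line, so megacli's "TB" here appears to be binary (TiB).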

Related Objects

Event Timeline

fgiunchedi updated the task description.
fgiunchedi added a project: ops-codfw.
fgiunchedi subscribed.
Restricted Application added subscribers: Zppix, Southparkfan, Aklapper.
Papaul triaged this task as Medium priority. May 26 2016, 4:32 PM

@fgiunchedi the system is out of warranty and we have no 2TB SAS disks on site. I will have to check with @RobH to see if I need to open a procurement ticket for a 2TB 7.2K SAS disk.

@RobH ms-be2012 is out of warranty and there is a faulty disk in slot 10. Would you like me to open a procurement task for 1 disk, or for about 3 disks so that we have spares as well?
Disk information:
SAS 2TB 7.2K
Model: ST32000645SS
3.5"

@Papaul: It isn't entirely clear to me what you mean. Can you restate your questions?

I think you are asking whether you need to create a procurement S4 task to request a spare disk. I'm not sure whether you are asking to use a spare disk already on the shelf or to order more than one disk.

I see that there is an ST2000DM001 7.2K 2TB disk on the spares page, but it is SATA, not SAS. We don't seem to have any 2TB SAS spares.

So if the question is: Do I create a procurement S4 task and request more than one replacement disk?

Answer: Yes, please. Create a procurement S4 task for any hardware orders for shelf spares. Since it seems we are keeping the older ms-be hardware for another quarter (or more), we don't want to keep 10 spares on the shelf, but we should try to keep 2-3. So I'd suggest we order 4 now; after one goes into ms-be2012, that will leave us 3 on the spares list.

@Papaul please order one more spare, since there's another disk waiting for replacement in T137785: ms-be2003.codfw.wmnet: slot=4 dev=sde failed, and it should be a 2TB SAS disk too.

Papaul mentioned this in Unknown Object (Task). Jun 15 2016, 4:03 PM

So the task should be for 5 disks. That will put 2 into immediate use and 3 on the shelf.

RobH added a subtask: Unknown Object (Task). Jun 15 2016, 4:05 PM
Papaul subscribed.

Disk replacement complete

RobH closed subtask Unknown Object (Task) as Resolved. Oct 12 2016, 5:47 PM