
ms-be1004.eqiad.wmnet: slot=3 dev=sdd failed
Closed, Resolved (Public)

Description

slot=3 dev=sdd has been reported as failed; please replace.

/var/log/kern.log

Sep  1 14:39:36 ms-be1004 kernel: [5118333.490239] Read(10): 28 00 95 fc b3 b0 00 00 10 00
Sep  1 14:39:36 ms-be1004 kernel: [5118333.490248] end_request: I/O error, dev sdd, sector 2516366256
Sep  1 14:39:36 ms-be1004 kernel: [5118333.497080] XFS (sdd1): metadata I/O error: block 0x95fcabb0 ("xfs_trans_read_buf_map") error 5 numblks 16
Sep  1 14:39:36 ms-be1004 kernel: [5118333.508083] XFS (sdd1): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
Sep  1 14:44:37 ms-be1004 kernel: [5118633.985280] sd 0:2:3:0: [sdd]  
Sep  1 14:44:37 ms-be1004 kernel: [5118633.985289] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Sep  1 14:44:37 ms-be1004 kernel: [5118633.985293] sd 0:2:3:0: [sdd] CDB: 
Sep  1 14:44:37 ms-be1004 kernel: [5118633.985296] Read(10): 28 00 95 fc b3 b0 00 00 10 00
Sep  1 14:44:37 ms-be1004 kernel: [5118633.985309] end_request: I/O error, dev sdd, sector 2516366256
Sep  1 14:44:37 ms-be1004 kernel: [5118633.992091] XFS (sdd1): metadata I/O error: block 0x95fcabb0 ("xfs_trans_read_buf_map") error 5 numblks 16
Sep  1 14:44:37 ms-be1004 kernel: [5118634.003087] XFS (sdd1): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
Sep  1 14:44:37 ms-be1004 kernel: [5118634.072920] sd 0:2:3:0: [sdd]  
Sep  1 14:44:37 ms-be1004 kernel: [5118634.072926] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Sep  1 14:44:37 ms-be1004 kernel: [5118634.072929] sd 0:2:3:0: [sdd] CDB: 
Sep  1 14:44:37 ms-be1004 kernel: [5118634.072930] Read(10): 28 00 95 fc b3 b0 00 00 10 00
Sep  1 14:44:37 ms-be1004 kernel: [5118634.072938] end_request: I/O error, dev sdd, sector 2516366256
Sep  1 14:44:37 ms-be1004 kernel: [5118634.079683] XFS (sdd1): metadata I/O error: block 0x95fcabb0 ("xfs_trans_read_buf_map") error 5 numblks 16
Sep  1 14:44:37 ms-be1004 kernel: [5118634.090673] XFS (sdd1): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
Sep  1 15:37:00 ms-be1004 kernel: [5121777.641202] Process accounting paused
Sep  1 15:50:38 ms-be1004 kernel: [5122595.142481] Process accounting resumed
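
The failing request in the log can be cross-checked by decoding the Read(10) CDB by hand. The following is a minimal sketch, not part of the original ticket: the helper name `decode_read10` is made up for illustration, and the interpretation of the 2048-sector difference as the start offset of sdd1 is an assumption (the partition table is not shown here).

```python
def decode_read10(cdb_hex: str) -> dict:
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    return {
        # Bytes 2..5: logical block address, big-endian.
        "lba": int.from_bytes(cdb[2:6], "big"),
        # Bytes 7..8: transfer length in blocks, big-endian.
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


info = decode_read10("28 00 95 fc b3 b0 00 00 10 00")

# Block number reported by XFS for sdd1 in the same log entries.
xfs_block = 0x95FCABB0

print(info["lba"])              # sector on the whole device (sdd)
print(info["blocks"])           # matches "numblks 16" in the XFS message
print(info["lba"] - xfs_block)  # assumed to be the partition start offset
```

The decoded LBA matches the sector in the `end_request` line, so the SCSI error and the XFS metadata error refer to the same 16-block read.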

smartctl

megacli

Enclosure Device ID: 32
Slot Number: 3
Drive's position: DiskGroup: 3, Span: 0, Arm: 0
Enclosure position: 1
Device Id: 3
WWN: 5000039488CB3C4D
Sequence Number: 2
Media Error Count: 43
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 1.819 TB [0xe8e088b0 Sectors]
Non Coerced Size: 1.818 TB [0xe8d088b0 Sectors]
Coerced Size: 1.818 TB [0xe8d00000 Sectors]
Sector Size:  0
Firmware state: Online, Spun Up
Device Firmware Level: DCA8
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000039488cb3c4e
SAS Address(1): 0x0
Connected Port Number: 0(path0) 
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive Temperature :30C (86.00 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: Unknown 
Drive has flagged a S.M.A.R.T alert : No




Exit Code: 0x00
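
The controller output above lists the drive as Online with no S.M.A.R.T alert, yet the media error counter is nonzero. A minimal sketch of how such output could be checked automatically follows. The function names and the "any nonzero counter" threshold are assumptions for illustration, not an existing tool or policy, and the sample text is a shortened excerpt of the output above.

```python
def parse_pd_info(text: str) -> dict:
    """Parse 'Key: value' lines from physical-drive output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only; values may contain colons themselves.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def needs_attention(fields: dict) -> bool:
    """Flag a drive whose error counters are nonzero (assumed threshold)."""
    media = int(fields.get("Media Error Count", 0))
    predictive = int(fields.get("Predictive Failure Count", 0))
    return media > 0 or predictive > 0


# Shortened excerpt of the megacli output pasted in this task.
sample = """\
Enclosure Device ID: 32
Slot Number: 3
Drive's position: DiskGroup: 3, Span: 0, Arm: 0
Media Error Count: 43
Other Error Count: 0
Predictive Failure Count: 0
Firmware state: Online, Spun Up
Drive has flagged a S.M.A.R.T alert : No
"""

fields = parse_pd_info(sample)
print(fields["Slot Number"], fields["Firmware state"])
print(needs_attention(fields))
```

A check like this would flag slot 3 on the media error count alone, even though the firmware state is still Online.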

Event Timeline

fgiunchedi updated the task description.
fgiunchedi added a project: ops-eqiad.
fgiunchedi added a subscriber: fgiunchedi.
Restricted Application added subscribers: Southparkfan, Aklapper.

Note that the RAID controller reports the disk as OK; Linux, however, encounters I/O errors while using it.

Mentioned in SAL (#wikimedia-operations) [2016-09-22T16:17:04Z] <godog> offline sdd on ms-be1004 via megacli T144499

Disk was replaced; it still needs to be added back.

Disk added back
Adapter 0: Created VD 3
Configured physical device at Encl-32:Slot-3.

1 physical devices are Configured on adapter 0.