
Degraded RAID on ms-be1039
Closed, Resolved · Public

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (hpssacli) was detected on host ms-be1039. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

[2223144.766649] XFS (sdc1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -61.
[2223144.766905] sd 0:1:0:2: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[2223144.766911] sd 0:1:0:2: [sdc] tag#0 Sense Key : Medium Error [current] 
[2223144.766916] sd 0:1:0:2: [sdc] tag#0 Add. Sense: Unrecovered read error
[2223144.766920] sd 0:1:0:2: [sdc] tag#0 CDB: Read(16) 88 00 00 00 00 01 4f 88 7c 00 00 00 00 10 00 00
[2223144.766923] blk_update_request: critical medium error, dev sdc, sector 5629312000
[2223144.809412] XFS (sdc1): metadata I/O error: block 0x14f887400 ("xfs_trans_read_buf_map") error 61 numblks 16
[2223144.861246] XFS (sdc1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -61.
[2223144.861485] sd 0:1:0:2: [sdc] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[2223144.861490] sd 0:1:0:2: [sdc] tag#1 Sense Key : Medium Error [current] 
[2223144.861493] sd 0:1:0:2: [sdc] tag#1 Add. Sense: Unrecovered read error
[2223144.861497] sd 0:1:0:2: [sdc] tag#1 CDB: Read(16) 88 00 00 00 00 01 4f 88 7c 00 00 00 00 10 00 00
[2223144.861500] blk_update_request: critical medium error, dev sdc, sector 5629312000
[2223144.904374] XFS (sdc1): metadata I/O error: block 0x14f887400 ("xfs_trans_read_buf_map") error 61 numblks 16
[2223144.959306] XFS (sdc1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -61.
[2223144.960140] sd 0:1:0:2: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[2223144.960145] sd 0:1:0:2: [sdc] tag#0 Sense Key : Medium Error [current] 
[2223144.960148] sd 0:1:0:2: [sdc] tag#0 Add. Sense: Unrecovered read error
[2223144.960151] sd 0:1:0:2: [sdc] tag#0 CDB: Read(16) 88 00 00 00 00 01 4f 88 7c 00 00 00 00 10 00 00
[2223144.960153] blk_update_request: critical medium error, dev sdc, sector 5629312000
[2223145.001841] XFS (sdc1): metadata I/O error: block 0x14f887400 ("xfs_trans_read_buf_map") error 61 numblks 16
[2223145.053266] XFS (sdc1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -61.
[2223145.057406] sd 0:1:0:2: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[2223145.057796] sd 0:1:0:2: [sdc] tag#0 Sense Key : Medium Error [current] 
[2223145.057989] sd 0:1:0:2: [sdc] tag#0 Add. Sense: Unrecovered read error
[2223145.058191] sd 0:1:0:2: [sdc] tag#0 CDB: Read(16) 88 00 00 00 00 01 4f 88 7c 00 00 00 00 10 00 00
[2223145.058194] blk_update_request: critical medium error, dev sdc, sector 5629312000
[2223145.097918] XFS (sdc1): metadata I/O error: block 0x14f887400 ("xfs_trans_read_buf_map") error 61 numblks 16
[2223145.149124] XFS (sdc1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -61.
[2223145.164223] XFS (sdc1): metadata I/O error: block 0x14f887400 ("xfs_trans_read_buf_map") error 61 numblks 16
[2223145.220588] XFS (sdc1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -61.
[2223268.252647] XFS (sdc1): Unmounting Filesystem
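
The repeated `CDB: Read(16)` line in the snapshot above can be cross-checked against the `sector` and `block` values the kernel reports. The sketch below decodes the CDB bytes using the standard SCSI READ(16) layout (opcode in byte 0, logical block address in bytes 2-9, transfer length in bytes 10-13). The variable names are illustrative only. The interpretation of the final difference as the partition start offset of sdc1 is an assumption, not something the log states.

```python
# Decode the 16-byte Read(16) CDB copied from the kernel log above.
cdb = bytes.fromhex("88 00 00 00 00 01 4f 88 7c 00 00 00 00 10 00 00")

opcode = cdb[0]                              # 0x88 is the READ(16) opcode
lba = int.from_bytes(cdb[2:10], "big")       # bytes 2-9: logical block address
length = int.from_bytes(cdb[10:14], "big")   # bytes 10-13: transfer length in blocks

# Block number taken from the "XFS (sdc1): metadata I/O error" line.
xfs_block = 0x14f887400

# Assumption: both numbers are in the same units, so the difference is the
# offset of the sdc1 partition from the start of sdc.
offset = lba - xfs_block

print(hex(opcode), lba, length, offset)
# prints: 0x88 5629312000 16 2048
```

The decoded address matches the `sector 5629312000` reported by `blk_update_request`, and the transfer length matches `numblks 16`, so every retry in the snapshot is hitting the same 16-block region of sdc.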

Event Timeline

Restricted Application added a subscriber: Aklapper. · Apr 24 2017, 1:40 PM
fgiunchedi updated the task description.
fgiunchedi added a subscriber: fgiunchedi.

This is one of the new machines in this batch. I tried burning in the disks before production, but clearly it wasn't enough :(
Note that the disk is fine according to hpssacli.

Mentioned in SAL (#wikimedia-operations) [2017-04-27T08:29:34Z] <godog> ms-be1039 issue "controller slot=3 pd 1I:1:5 modify disablepd" to force failed sdc - T163690

fgiunchedi moved this task from Backlog to Blocked on the User-fgiunchedi board. · Apr 27 2017, 2:17 PM

A case has been opened with HP

Your case was successfully submitted. Please note your Case ID: 5319274490 for future reference.

@fgiunchedi the disk has been replaced with a new one. You will probably need to add it back.

Return shipping information
UPS 1Z A73 27E 90 8316 9978

@Cmjohnson thanks! disk has been rebuilt

Cmjohnson closed this task as Resolved. · May 4 2017, 6:14 PM

Resolved. The old disk has been dropped off for return.