
Disk (sda) failed in ms-be2072
Closed, Resolved (Public)

Description

Hi,

/dev/sda has failed in this system; can it be swapped out ASAP, please? You can perform this operation at your earliest convenience.

=== RaidStatus (does not include components in optimal state)
name: Adapter #0

        Enclosure Device ID: 32
        Slot Number: 0
        Enclosure position: 2
        Device Id: 0
        Media Error Count: 120
        Other Error Count: 22
        Predictive Failure Count: =====> 1 <=====
        Last Predictive Failure Event Seq Number: 4872

                Raw Size: 7.277 TB [0x3a3812ab0 Sectors]
                Firmware state: JBOD
                Media Type: Hard Disk Device
                Drive Temperature: 36C (96.80 F)

=== RaidStatus completed
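For anyone triaging similar reports, the counters above can be pulled out of the status text mechanically. The following is a minimal sketch, assuming the output keeps the "Key: Value" layout shown here; the function name nonzero_counters and the list of watched counters are illustrative choices, not part of any Wikimedia tooling.

```python
import re

# Excerpt of the status text from this task, used as sample input.
RAID_STATUS = """\
Enclosure Device ID: 32
Slot Number: 0
Media Error Count: 120
Other Error Count: 22
Predictive Failure Count: =====> 1 <=====
Firmware state: JBOD
"""

# Counters worth flagging when above zero (an assumption for this sketch).
COUNTERS = ("Media Error Count", "Other Error Count", "Predictive Failure Count")


def nonzero_counters(text):
    """Return {counter name: value} for every watched counter above zero."""
    found = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or key not in COUNTERS:
            continue
        # re.search skips the "=====>" highlight arrows around the number.
        match = re.search(r"\d+", value)
        if match and int(match.group()) > 0:
            found[key] = int(match.group())
    return found


print(nonzero_counters(RAID_STATUS))
```

On the excerpt above this reports all three counters, which matches the reading that the disk is degraded well beyond a single predictive-failure flag.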

It's marked as pre-failed there, but in fact the filesystem has failed due to I/O errors a couple of times (I tried a reboot to clear it, to no avail); recent kern.log:

Jan 18 14:05:23 ms-be2072 kernel: [12994.811335] sd 0:0:0:0: [sda] tag#456 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=6s
Jan 18 14:05:23 ms-be2072 kernel: [12994.811359] sd 0:0:0:0: [sda] tag#456 Sense Key : Medium Error [current]
Jan 18 14:05:23 ms-be2072 kernel: [12994.811368] sd 0:0:0:0: [sda] tag#456 Add. Sense: Unrecovered read error
Jan 18 14:05:23 ms-be2072 kernel: [12994.811379] sd 0:0:0:0: [sda] tag#456 CDB: Read(16) 88 00 00 00 00 00 01 89 aa 60 00 00 00 20 00 00
Jan 18 14:05:23 ms-be2072 kernel: [12994.811393] blk_update_request: critical medium error, dev sda, sector 25799288 op 0x0:(READ) flags 0x1000 phys_seg 1 prio class 0
Jan 18 14:05:23 ms-be2072 kernel: [12994.823470] XFS (sda1): metadata I/O error in "xfs_imap_to_bp+0x61/0xb0 [xfs]" at daddr 0x189a260 len 32 error 61
Jan 18 14:05:24 ms-be2072 kernel: [12995.883759] XFS (sda1): xfs_do_force_shutdown(0x1) called from line 296 of file fs/xfs/xfs_trans_buf.c. Return address = 00000000989b50ab
Jan 18 14:05:24 ms-be2072 kernel: [12995.883763] XFS (sda1): I/O Error Detected. Shutting down filesystem
Jan 18 14:05:24 ms-be2072 kernel: [12995.890152] XFS (sda1): Please unmount the filesystem and rectify the problem(s)
Jan 18 14:42:59 ms-be2072 kernel: [13003.209238] sda1: writeback error on inode 6459832606, offset 33554432, sector 9922932512
Jan 18 14:42:59 ms-be2072 kernel: [15250.568548] XFS (sda1): Unmounting Filesystem
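The numbers in this log can be cross-checked against each other. The sketch below decodes the Read(16) CDB using the standard SCSI layout (opcode 0x88, 8-byte big-endian LBA in bytes 2-9, 4-byte transfer length in bytes 10-13) and confirms that the failed sector falls inside the requested range. The helper name decode_read16 is illustrative. The final comparison with the XFS daddr assumes that daddr is counted in 512-byte units from the start of sda1; the resulting 2048-sector difference would be consistent with a partition starting at sector 2048, but the partition layout is not stated in this task.

```python
# Values copied from the kern.log lines above.
CDB = "88 00 00 00 00 00 01 89 aa 60 00 00 00 20 00 00"
SECTOR_FAILED = 25799288  # from the blk_update_request line
XFS_DADDR = 0x189A260     # from the XFS metadata I/O error line


def decode_read16(cdb_hex):
    """Return (lba, transfer_length) from a READ(16) CDB given as hex bytes."""
    raw = bytes.fromhex(cdb_hex)  # fromhex ignores the spaces between bytes
    if len(raw) != 16 or raw[0] != 0x88:
        raise ValueError("not a READ(16) CDB")
    lba = int.from_bytes(raw[2:10], "big")
    length = int.from_bytes(raw[10:14], "big")
    return lba, length


lba, length = decode_read16(CDB)
print("read starts at sector", lba, "for", length, "sectors")
print("failed sector inside read:", lba <= SECTOR_FAILED < lba + length)

# Assumption: daddr is relative to the start of sda1, in 512-byte units.
print("disk LBA minus XFS daddr:", lba - XFS_DADDR)
```

The transfer length of 32 sectors also matches the "len 32" in the XFS error, so the medium error and the filesystem shutdown refer to the same read.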

Event Timeline

Jhancock.wm subscribed.

@MatthewVernon replaced the drive from stock.

Leaving the ticket open until we get the replacement drive from Dell.

Please @ me or papaul if any new errors occur.

Mentioned in SAL (#wikimedia-operations) [2024-01-19T16:31:40Z] <Emperor> mark new drive as non-RAID, mount, restore to service with puppet ms-be2072 T355330

The replacement disk arrived and the broken one has been shipped back. The new one has been put into stock.