The disk sdk failed on ms-be1043 earlier today. I began investigating why an automatic task wasn't opened, and it looks like the PD isn't there at all (see below), nor is the LD.
I believe this is a new failure mode; at least I don't remember seeing something like this before. cc @Volans
Mar 18 01:49:08 ms-be1043 kernel: [12660732.334403] sd 0:2:10:0: [sdk] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 18 01:49:08 ms-be1043 kernel: [12660732.334425] sd 0:2:10:0: [sdk] tag#0 CDB: Read(16) 88 00 00 00 00 01 b0 56 7c a0 00 00 00 08 00 00
Mar 18 01:49:08 ms-be1043 kernel: [12660732.334432] blk_update_request: I/O error, dev sdk, sector 7253425312
Mar 18 01:49:08 ms-be1043 kernel: [12660732.341990] XFS (sdk1): metadata I/O error: block 0x1b05674a0 ("xfs_trans_read_buf_map") error 5 numblks 8
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450147] sd 0:2:10:0: [sdk] tag#56 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450154] sd 0:2:10:0: [sdk] tag#56 CDB: Read(16) 88 00 00 00 00 01 37 ea 7a c0 00 00 00 08 00 00
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450158] blk_update_request: I/O error, dev sdk, sector 5233081024
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450171] sd 0:2:10:0: [sdk] tag#53 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450176] sd 0:2:10:0: [sdk] tag#53 CDB: Read(16) 88 00 00 00 00 01 00 b0 e8 40 00 00 00 20 00 00
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450179] blk_update_request: I/O error, dev sdk, sector 4306561088
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450189] sd 0:2:10:0: [sdk] tag#51 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450192] sd 0:2:10:0: [sdk] tag#47 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450195] sd 0:2:10:0: [sdk] tag#46 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450198] sd 0:2:10:0: [sdk] tag#43 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450200] sd 0:2:10:0: [sdk] tag#44 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450203] sd 0:2:10:0: [sdk] tag#51 CDB: Read(16) 88 00 00 00 00 01 72 0d 09 a0 00 00 00 20 00 00
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450204] sd 0:2:10:0: [sdk] tag#47 CDB: Read(16) 88 00 00 00 00 01 7d f3 9e e0 00 00 00 08 00 00
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450207] sd 0:2:10:0: [sdk] tag#46 CDB: Read(16) 88 00 00 00 00 01 01 96 40 20 00 00 00 20 00 00
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450209] sd 0:2:10:0: [sdk] tag#43 CDB: Read(16) 88 00 00 00 00 00 9a 24 16 a0 00 00 00 20 00 00
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450212] blk_update_request: I/O error, dev sdk, sector 6208424352
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450214] sd 0:2:10:0: [sdk] tag#44 CDB: Read(16) 88 00 00 00 00 00 c8 f4 4f c0 00 00 00 20 00 00
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450218] blk_update_request: I/O error, dev sdk, sector 6408085216
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450222] blk_update_request: I/O error, dev sdk, sector 4321591328
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450225] blk_update_request: I/O error, dev sdk, sector 2586056352
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450227] blk_update_request: I/O error, dev sdk, sector 3371454400
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450273] XFS (sdk1): metadata I/O error: block 0x100b0e040 ("xfs_trans_read_buf_map") error 5 numblks 32
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450279] XFS (sdk1): metadata I/O error: block 0x17df396e0 ("xfs_trans_read_buf_map") error 5 numblks 8
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450285] XFS (sdk1): metadata I/O error: block 0xc8f447c0 ("xfs_trans_read_buf_map") error 5 numblks 32
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450290] XFS (sdk1): metadata I/O error: block 0x101963820 ("xfs_trans_read_buf_map") error 5 numblks 32
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450294] XFS (sdk1): metadata I/O error: block 0x9a240ea0 ("xfs_trans_read_buf_map") error 5 numblks 32
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450298] XFS (sdk1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450303] XFS (sdk1): metadata I/O error: block 0x1720d01a0 ("xfs_trans_read_buf_map") error 5 numblks 32
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450306] XFS (sdk1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450309] XFS (sdk1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450312] XFS (sdk1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
Mar 18 01:49:09 ms-be1043 kernel: [12660733.450315] XFS (sdk1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
Mar 18 01:49:09 ms-be1043 kernel: [12660733.451937] sd 0:2:10:0: [sdk] tag#68 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 18 01:49:09 ms-be1043 kernel: [12660733.451941] sd 0:2:10:0: [sdk] tag#68 CDB: Read(16) 88 00 00 00 00 00 bf ab d3 e0 00 00 00 20 00 00
Mar 18 01:49:09 ms-be1043 kernel: [12660733.451944] blk_update_request: I/O error, dev sdk, sector 3215709152
Mar 18 01:49:09 ms-be1043 kernel: [12660733.452007] XFS (sdk1): metadata I/O error: block 0xbfabcbe0 ("xfs_trans_read_buf_map") error 5 numblks 32
Mar 18 01:49:09 ms-be1043 kernel: [12660733.452014] XFS (sdk1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
Mar 18 01:49:09 ms-be1043 kernel: [12660733.452939] megaraid_sas 0000:02:00.0: scanning for scsi0...
Mar 18 01:49:09 ms-be1043 kernel: [12660733.453297] megaraid_sas 0000:02:00.0: 1450 (606188930s/0x0021/FATAL) - Controller cache pinned for missing or offline VD 0a/a
Mar 18 01:49:09 ms-be1043 kernel: [12660733.453510] megaraid_sas 0000:02:00.0: 1451 (606188930s/0x0001/FATAL) - VD 0a/a is now OFFLINE
Mar 18 01:49:09 ms-be1043 kernel: [12660733.453682] sd 0:2:10:0: [sdk] tag#191 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 18 01:49:09 ms-be1043 kernel: [12660733.453685] blk_update_request: I/O error, dev sdk, sector 5301335056
Mar 18 01:49:09 ms-be1043 kernel: [12660733.453687] sd 0:2:10:0: [sdk] tag#191 CDB: Read(16) 88 00 00 00 00 01 35 30 f9 e0 00 00 00 20 00 00
Mar 18 01:49:09 ms-be1043 kernel: [12660733.453742] XFS (sdk1): metadata I/O error: block 0x13530f1e0 ("xfs_trans_read_buf_map") error 5 numblks 32
Mar 18 01:49:09 ms-be1043 kernel: [12660733.453749] XFS (sdk1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
Mar 18 01:49:09 ms-be1043 kernel: [12660733.453757] XFS (sdk1): metadata I/O error: block 0x13bfbec10 ("xfs_trans_read_buf_map") error 5 numblks 8
Mar 18 01:49:09 ms-be1043 kernel: [12660733.453955] XFS (sdk1): metadata I/O error: block 0xe8ea413c ("xlog_iodone") error 5 numblks 64
Mar 18 01:49:09 ms-be1043 kernel: [12660733.453960] XFS (sdk1): xfs_do_force_shutdown(0x2) called from line 1233 of file /build/linux-IWeKxA/linux-4.9.110/fs/xfs/xfs_log.c. Return address = 0xffffffffc0aca882
Mar 18 01:49:09 ms-be1043 kernel: [12660733.454107] XFS (sdk1): Log I/O Error Detected. Shutting down filesystem
Mar 18 01:49:09 ms-be1043 kernel: [12660733.454108] XFS (sdk1): Please umount the filesystem and rectify the problem(s)
Mar 18 01:49:19 ms-be1043 kernel: [12660743.342773] XFS (sdk1): Unmounting Filesystem
Mar 18 01:53:09 ms-be1043 kernel: [12660973.025021] megaraid_sas 0000:02:00.0: 1470 (606189171s/0x0004/CRIT) - Enclosure PD 20(c None/p1) phy bad for slot 8
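For reference, the sector numbers in the kernel log can be cross-checked against the CDBs. A minimal sketch, assuming the standard SCSI READ(16) layout (opcode 0x88, 8-byte big-endian LBA in bytes 2-9, 4-byte transfer length in bytes 10-13); `decode_read16` is a hypothetical helper, not part of any existing tooling:

```python
def decode_read16(cdb_hex):
    """Return (lba, nblocks) for a READ(16) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between bytes is ignored
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")       # bytes 2..9: logical block address
    nblocks = int.from_bytes(cdb[10:14], "big")  # bytes 10..13: transfer length
    return lba, nblocks


# CDB of the first failed request (tag#0) from the log above
lba, nblocks = decode_read16("88 00 00 00 00 01 b0 56 7c a0 00 00 00 08 00 00")
print(lba, nblocks)
```

For tag#0 this gives LBA 7253425312 and 8 blocks, which matches the `blk_update_request` sector and the XFS `numblks 8` logged right after it.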
root@ms-be1043:~# megacli -PDList -aALL | grep -e 'Slot Number' -e state:
Slot Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Firmware state: Online, Spun Up
Slot Number: 2
Firmware state: Online, Spun Up
Slot Number: 3
Firmware state: Online, Spun Up
Slot Number: 4
Firmware state: Online, Spun Up
Slot Number: 5
Firmware state: Online, Spun Up
Slot Number: 6
Firmware state: Online, Spun Up
Slot Number: 7
Firmware state: Online, Spun Up
Slot Number: 9
Firmware state: Online, Spun Up
Slot Number: 10
Firmware state: Online, Spun Up
Slot Number: 11
Firmware state: Online, Spun Up
Slot Number: 12
Firmware state: Online, Spun Up
Slot Number: 13
Firmware state: Online, Spun Up
root@ms-be1043:~# /usr/local/lib/nagios/plugins/
check_dpkg  check_ipmi_sensor  check_newest_file_age  check_raid           check_systemd_unit_state
check_eth   check_long_procs   check_puppetrun        check_systemd_state  get-raid-status-megacli
root@ms-be1043:~# /usr/local/lib/nagios/plugins/get-raid-status-megacli
=== RaidStatus (does not include components in optimal state) ===
RaidStatus completed
root@ms-be1043:~# grep sdk /proc/partitions
root@ms-be1043:~#
root@ms-be1043:~# megacli -AdpAllInfo -aALL
...
Device Present
================
Virtual Drives    : 13
  Degraded        : 0
  Offline         : 0
Physical Devices  : 14
  Disks           : 13
  Critical Disks  : 0
  Failed Disks    : 0
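Since the controller reports nothing as degraded or failed, one way a check could still notice this is by looking for holes in the slot numbering of the `-PDList` output (slot 8 is skipped above). A rough sketch of that idea, not what `get-raid-status-megacli` does today; `missing_slots` is a hypothetical helper, and it only catches gaps between the lowest and highest slot present, so a missing first or last slot would go unnoticed:

```python
import re


def missing_slots(pdlist_output):
    """Return slot numbers absent between the lowest and highest slot listed."""
    slots = sorted(int(n) for n in re.findall(r"Slot Number:\s*(\d+)", pdlist_output))
    if not slots:
        return []
    present = set(slots)
    return [n for n in range(slots[0], slots[-1] + 1) if n not in present]


# Slot list as reported by megacli on ms-be1043 (slot 8 is absent)
reported = [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13]
sample = "\n".join(
    f"Slot Number: {n}\nFirmware state: Online, Spun Up" for n in reported
)
print(missing_slots(sample))
```

With the output above this reports slot 8, the same slot named in the `phy bad for slot 8` controller message.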