cp3039 didn't come back up online after reboot (kernel and varnish upgrades). The management interface at cp3039.mgmt.esams.wmnet responds to ping but I cannot connect via SSH.
Description
Event Timeline
The host just came back up online now, and I can ssh into the management interface. Leaving the ticket open though as it took a very long time for both things to happen.
18:40 < icinga-wm> PROBLEM - Host cp3039 is DOWN: PING CRITICAL - Packet loss = 100% [...] 18:53 < icinga-wm> RECOVERY - Host cp3039 is UP: PING OK - Packet loss = 0%, RTA = 83.74 ms
@MoritzMuehlenhoff suggested on IRC that this issue could be due to some kind of hardware self-test, so perhaps it could be a good idea to take a deeper look into cp3039 hw-wise?
Mentioned in SAL (#wikimedia-operations) [2016-11-17T00:38:19Z] <bblack> depooling cp3039 for hw/bios work - T150879
Mentioned in SAL (#wikimedia-operations) [2016-11-17T00:48:29Z] <bblack> repool cp3039 - T150879
The dmesg from the very-delayed bootup is interesting. Normally we get through the bulk of the bootup messages in under 20s, but the process took closer to 3-4 minutes in this case. There are a bunch of quite unique things happening in dmesg related to USB devices, and I think they're virtual USB devices related to iDRAC/firmware somehow: virtual consoles, virtual disks that don't work, etc. Some snippets that we don't normally see:
This was the bulk of the dmesg-level delay, dealing with a bunch of virtual disks that shouldn't be there and don't work right, starting shortly after the usual detection of /dev/sda and /dev/sdb (our 2x actual SSDs, the only disks that should exist):
Nov 16 17:53:28 cp3039 kernel: [ 15.661146] usb 1-1.6: New USB device found, idVendor=413c, idProduct=a001 Nov 16 17:53:28 cp3039 kernel: [ 15.661149] usb 1-1.6: New USB device strings: Mfr=1, Product=2, SerialNumber=3 Nov 16 17:53:28 cp3039 kernel: [ 15.661151] usb 1-1.6: Product: Gadget USB HUB Nov 16 17:53:28 cp3039 kernel: [ 15.661152] usb 1-1.6: Manufacturer: no manufacturer Nov 16 17:53:28 cp3039 kernel: [ 15.661154] usb 1-1.6: SerialNumber: 0123456789 Nov 16 17:53:28 cp3039 kernel: [ 15.661579] hub 1-1.6:1.0: USB hub found Nov 16 17:53:28 cp3039 kernel: [ 15.661766] hub 1-1.6:1.0: 6 ports detected Nov 16 17:53:28 cp3039 kernel: [ 15.920275] ata3: SATA link down (SStatus 0 SControl 300) Nov 16 17:53:28 cp3039 kernel: [ 16.108273] usb 1-1.6.2: new high-speed USB device number 4 using ehci-pci Nov 16 17:53:28 cp3039 kernel: [ 16.233874] usb 1-1.6.2: New USB device found, idVendor=0624, idProduct=0250 Nov 16 17:53:28 cp3039 kernel: [ 16.241755] usb 1-1.6.2: New USB device strings: Mfr=4, Product=5, SerialNumber=6 Nov 16 17:53:28 cp3039 kernel: [ 16.244289] ata4: SATA link down (SStatus 0 SControl 300) Nov 16 17:53:28 cp3039 kernel: [ 16.256145] usb 1-1.6.2: Product: Mass Storage Function Nov 16 17:53:28 cp3039 kernel: [ 16.261987] usb 1-1.6.2: Manufacturer: Avocent Nov 16 17:53:28 cp3039 kernel: [ 16.266946] usb 1-1.6.2: SerialNumber: 20120731 Nov 16 17:53:28 cp3039 kernel: [ 16.298703] usb-storage 1-1.6.2:1.0: USB Mass Storage device detected Nov 16 17:53:28 cp3039 kernel: [ 16.306121] scsi host10: usb-storage 1-1.6.2:1.0 Nov 16 17:53:28 cp3039 kernel: [ 16.311376] usbcore: registered new interface driver usb-storage Nov 16 17:53:28 cp3039 kernel: [ 16.318727] usbcore: registered new interface driver uas Nov 16 17:53:28 cp3039 kernel: [ 16.544252] usb 1-1.6.6: new high-speed USB device number 5 using ehci-pci Nov 16 17:53:28 cp3039 kernel: [ 16.564272] ata6: SATA link down (SStatus 0 SControl 300) Nov 16 17:53:28 cp3039 kernel: [ 16.665205] usb 1-1.6.6: New USB device found, idVendor=0624, idProduct=0253 Nov 16 17:53:28 cp3039 kernel: [ 16.673078] usb 1-1.6.6: New USB device strings: Mfr=4, Product=5, SerialNumber=6 Nov 16 17:53:28 cp3039 kernel: [ 16.681437] usb 1-1.6.6: Product: Mass Storage Function Nov 16 17:53:28 cp3039 kernel: [ 16.687271] usb 1-1.6.6: Manufacturer: Avocent Nov 16 17:53:28 cp3039 kernel: [ 16.692233] usb 1-1.6.6: SerialNumber: 20120731-3 Nov 16 17:53:28 cp3039 kernel: [ 16.698995] usb-storage 1-1.6.6:1.0: USB Mass Storage device detected Nov 16 17:53:28 cp3039 kernel: [ 16.706315] scsi host11: usb-storage 1-1.6.6:1.0 Nov 16 17:53:28 cp3039 kernel: [ 16.888313] ata7: SATA link down (SStatus 0 SControl 300) Nov 16 17:53:28 cp3039 kernel: [ 17.212296] ata8: SATA link down (SStatus 0 SControl 300) Nov 16 17:53:28 cp3039 kernel: [ 17.315436] scsi 10:0:0:0: Direct-Access iDRAC MAS001 0329 PQ: 0 ANSI: 0 CCS Nov 16 17:53:28 cp3039 kernel: [ 17.326072] scsi 10:0:0:1: Direct-Access iDRAC MAS002 0329 PQ: 0 ANSI: 0 CCS Nov 16 17:53:28 cp3039 kernel: [ 17.396054] scsi 10:0:0:2: Direct-Access iDRAC MAS021 0329 PQ: 0 ANSI: 0 CCS Nov 16 17:53:28 cp3039 kernel: [ 17.406553] scsi 10:0:0:3: Direct-Access iDRAC MAS022 0329 PQ: 0 ANSI: 0 CCS Nov 16 17:53:28 cp3039 kernel: [ 17.473185] scsi 10:0:0:4: Direct-Access iDRAC MAS024 0329 PQ: 0 ANSI: 0 CCS Nov 16 17:53:28 cp3039 kernel: [ 17.483742] sd 10:0:0:0: Attached scsi generic sg2 type 0 Nov 16 17:53:28 cp3039 kernel: [ 17.485671] sd 10:0:0:0: [sdc] 102400 512-byte logical blocks: (52.4 MB/50.0 MiB) Nov 16 17:53:28 cp3039 kernel: [ 17.486665] sd 10:0:0:0: [sdc] Write Protect is on Nov 16 17:53:28 cp3039 kernel: [ 17.486669] sd 10:0:0:0: [sdc] Mode Sense: 23 00 80 00 Nov 16 17:53:28 cp3039 kernel: [ 17.503892] sd 10:0:0:1: Attached scsi generic sg3 type 0 Nov 16 17:53:28 cp3039 kernel: [ 17.510289] sd 10:0:0:2: Attached scsi generic sg4 type 0 Nov 16 17:53:28 cp3039 kernel: [ 17.516607] sd 10:0:0:3: Attached scsi generic sg5 type 0 Nov 16 17:53:28 cp3039 kernel: [ 17.522947] sd 10:0:0:4: Attached scsi generic sg6 type 0 Nov 16 17:53:28 cp3039 kernel: [ 17.536368] ata9: SATA link down (SStatus 0 SControl 300) Nov 16 17:53:28 cp3039 kernel: [ 17.631311] sd 10:0:0:0: [sdc] No Caching mode page found Nov 16 17:53:28 cp3039 kernel: [ 17.636397] sd 10:0:0:1: [sdd] 30720 512-byte logical blocks: (15.7 MB/15.0 MiB) Nov 16 17:53:28 cp3039 kernel: [ 17.636784] sd 10:0:0:2: [sde] 614400 512-byte logical blocks: (315 MB/300 MiB) Nov 16 17:53:28 cp3039 kernel: [ 17.637782] sd 10:0:0:3: [sdf] 512000 512-byte logical blocks: (262 MB/250 MiB) Nov 16 17:53:28 cp3039 kernel: [ 17.638174] sd 10:0:0:4: [sdg] 20480 512-byte logical blocks: (10.5 MB/10.0 MiB) Nov 16 17:53:28 cp3039 kernel: [ 17.639157] sd 10:0:0:1: [sdd] Write Protect is on Nov 16 17:53:28 cp3039 kernel: [ 17.639160] sd 10:0:0:1: [sdd] Mode Sense: 23 00 80 00 Nov 16 17:53:28 cp3039 kernel: [ 17.640396] sd 10:0:0:2: [sde] Write Protect is on Nov 16 17:53:28 cp3039 kernel: [ 17.640399] sd 10:0:0:2: [sde] Mode Sense: 23 00 80 00 Nov 16 17:53:28 cp3039 kernel: [ 17.641405] sd 10:0:0:3: [sdf] Write Protect is off Nov 16 17:53:28 cp3039 kernel: [ 17.641408] sd 10:0:0:3: [sdf] Mode Sense: 23 00 00 00 Nov 16 17:53:28 cp3039 kernel: [ 17.686347] sd 10:0:0:0: [sdc] Assuming drive cache: write through Nov 16 17:53:28 cp3039 kernel: [ 17.709909] scsi 11:0:0:0: Direct-Access iDRAC MAS025 0329 PQ: 0 ANSI: 0 CCS Nov 16 17:53:28 cp3039 kernel: [ 17.720355] sd 11:0:0:0: Attached scsi generic sg7 type 0 Nov 16 17:53:28 cp3039 kernel: [ 17.722303] sd 11:0:0:0: [sdh] 122880 512-byte logical blocks: (62.9 MB/60.0 MiB) Nov 16 17:53:28 cp3039 kernel: [ 17.723296] sd 11:0:0:0: [sdh] Write Protect is off Nov 16 17:53:28 cp3039 kernel: [ 17.723298] sd 11:0:0:0: [sdh] Mode Sense: 23 00 00 00 Nov 16 17:53:28 cp3039 kernel: [ 17.724798] sd 11:0:0:0: [sdh] No Caching mode page found Nov 16 17:53:28 cp3039 kernel: [ 17.724800] sd 11:0:0:0: [sdh] Assuming drive cache: write through Nov 16 17:53:28 cp3039 kernel: [ 17.780108] sd 10:0:0:4: [sdg] Write Protect is on Nov 16 17:53:28 cp3039 kernel: [ 17.785477] sd 10:0:0:4: [sdg] Mode Sense: 23 00 80 00 Nov 16 17:53:28 cp3039 kernel: [ 17.860385] ata10: SATA link down (SStatus 0 SControl 300) Nov 16 17:53:28 cp3039 kernel: [ 17.935619] sd 10:0:0:1: [sdd] No Caching mode page found Nov 16 17:53:28 cp3039 kernel: [ 17.937195] sd 10:0:0:2: [sde] No Caching mode page found Nov 16 17:53:28 cp3039 kernel: [ 17.937197] sd 10:0:0:2: [sde] Assuming drive cache: write through Nov 16 17:53:28 cp3039 kernel: [ 17.954592] sd 10:0:0:1: [sdd] Assuming drive cache: write through Nov 16 17:53:28 cp3039 kernel: [ 18.094837] sd 10:0:0:3: [sdf] No Caching mode page found Nov 16 17:53:28 cp3039 kernel: [ 18.096203] sd 10:0:0:4: [sdg] No Caching mode page found Nov 16 17:53:28 cp3039 kernel: [ 18.096206] sd 10:0:0:4: [sdg] Assuming drive cache: write through Nov 16 17:53:28 cp3039 kernel: [ 18.113809] sd 10:0:0:3: [sdf] Assuming drive cache: write through Nov 16 17:53:28 cp3039 kernel: [ 27.947448] sd 11:0:0:0: [sdh] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Nov 16 17:53:28 cp3039 kernel: [ 27.956780] sd 11:0:0:0: [sdh] tag#0 Sense Key : Medium Error [current] Nov 16 17:53:28 cp3039 kernel: [ 27.964262] sd 11:0:0:0: [sdh] tag#0 Add. Sense: Unrecovered read error Nov 16 17:53:28 cp3039 kernel: [ 27.971639] sd 11:0:0:0: [sdh] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00 Nov 16 17:53:28 cp3039 kernel: [ 27.979899] blk_update_request: critical medium error, dev sdh, sector 0 Nov 16 17:53:28 cp3039 kernel: [ 27.987381] Buffer I/O error on dev sdh, logical block 0, async page read Nov 16 17:53:28 cp3039 kernel: [ 28.827452] sd 10:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Nov 16 17:53:28 cp3039 kernel: [ 28.836779] sd 10:0:0:0: [sdc] tag#0 Sense Key : Medium Error [current] Nov 16 17:53:28 cp3039 kernel: [ 28.844263] sd 10:0:0:0: [sdc] tag#0 Add. Sense: Unrecovered read error Nov 16 17:53:28 cp3039 kernel: [ 28.851649] sd 10:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00 Nov 16 17:53:28 cp3039 kernel: [ 28.859907] blk_update_request: critical medium error, dev sdc, sector 0 Nov 16 17:53:28 cp3039 kernel: [ 28.867386] Buffer I/O error on dev sdc, logical block 0, async page read Nov 16 17:53:28 cp3039 kernel: [ 38.028288] sd 11:0:0:0: [sdh] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Nov 16 17:53:28 cp3039 kernel: [ 38.037614] sd 11:0:0:0: [sdh] tag#0 Sense Key : Medium Error [current] Nov 16 17:53:28 cp3039 kernel: [ 38.045095] sd 11:0:0:0: [sdh] tag#0 Add. Sense: Unrecovered read error Nov 16 17:53:28 cp3039 kernel: [ 38.052477] sd 11:0:0:0: [sdh] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00 Nov 16 17:53:28 cp3039 kernel: [ 38.060732] blk_update_request: critical medium error, dev sdh, sector 0 Nov 16 17:53:28 cp3039 kernel: [ 38.068212] Buffer I/O error on dev sdh, logical block 0, async page read Nov 16 17:53:28 cp3039 kernel: [ 38.956443] sd 10:0:0:2: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Nov 16 17:53:28 cp3039 kernel: [ 38.965770] sd 10:0:0:2: [sde] tag#0 Sense Key : Medium Error [current] Nov 16 17:53:28 cp3039 kernel: [ 38.973241] sd 10:0:0:2: [sde] tag#0 Add. Sense: Unrecovered read error Nov 16 17:53:28 cp3039 kernel: [ 38.980624] sd 10:0:0:2: [sde] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00 Nov 16 17:53:28 cp3039 kernel: [ 38.988878] blk_update_request: critical medium error, dev sde, sector 0 Nov 16 17:53:28 cp3039 kernel: [ 38.996360] Buffer I/O error on dev sde, logical block 0, async page read Nov 16 17:53:28 cp3039 kernel: [ 48.110590] sd 11:0:0:0: [sdh] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Nov 16 17:53:28 cp3039 kernel: [ 48.119906] sd 11:0:0:0: [sdh] tag#0 Sense Key : Medium Error [current] Nov 16 17:53:28 cp3039 kernel: [ 48.127387] sd 11:0:0:0: [sdh] tag#0 Add. Sense: Unrecovered read error Nov 16 17:53:28 cp3039 kernel: [ 48.134769] sd 11:0:0:0: [sdh] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00 Nov 16 17:53:28 cp3039 kernel: [ 48.143025] blk_update_request: critical medium error, dev sdh, sector 0 Nov 16 17:53:28 cp3039 kernel: [ 48.150504] Buffer I/O error on dev sdh, logical block 0, async page read Nov 16 17:53:28 cp3039 kernel: [ 49.076143] sd 10:0:0:1: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Nov 16 17:53:28 cp3039 kernel: [ 49.085476] sd 10:0:0:1: [sdd] tag#0 Sense Key : Medium Error [current] Nov 16 17:53:28 cp3039 kernel: [ 49.092978] sd 10:0:0:1: [sdd] tag#0 Add. Sense: Unrecovered read error Nov 16 17:53:28 cp3039 kernel: [ 49.100364] sd 10:0:0:1: [sdd] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00 Nov 16 17:53:28 cp3039 kernel: [ 49.108624] blk_update_request: critical medium error, dev sdd, sector 0 Nov 16 17:53:28 cp3039 kernel: [ 49.116104] Buffer I/O error on dev sdd, logical block 0, async page read Nov 16 17:53:28 cp3039 kernel: [ 57.829342] ldm_validate_partition_table(): Disk read failed. Nov 16 17:53:28 cp3039 kernel: [ 57.835807] Dev sdh: unable to read RDB block 0 Nov 16 17:53:28 cp3039 kernel: [ 57.838640] sd 10:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Nov 16 17:53:28 cp3039 kernel: [ 57.838644] sd 10:0:0:0: [sdc] tag#0 Sense Key : Medium Error [current] Nov 16 17:53:28 cp3039 kernel: [ 57.838648] sd 10:0:0:0: [sdc] tag#0 Add. Sense: Unrecovered read error Nov 16 17:53:28 cp3039 kernel: [ 57.838653] sd 10:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00 Nov 16 17:53:28 cp3039 kernel: [ 57.838655] blk_update_request: critical medium error, dev sdc, sector 0 Nov 16 17:53:28 cp3039 kernel: [ 57.838658] Buffer I/O error on dev sdc, logical block 0, async page read Nov 16 17:53:28 cp3039 kernel: [ 57.868224] ldm_validate_partition_table(): Disk read failed. Nov 16 17:53:28 cp3039 kernel: [ 57.868232] Dev sdd: unable to read RDB block 0 Nov 16 17:53:28 cp3039 kernel: [ 57.914371] sdg: sdg1 Nov 16 17:53:28 cp3039 kernel: [ 57.923623] ldm_validate_partition_table(): Disk read failed. Nov 16 17:53:28 cp3039 kernel: [ 57.925740] ldm_validate_partition_table(): Disk read failed. Nov 16 17:53:28 cp3039 kernel: [ 57.925747] Dev sdc: unable to read RDB block 0 Nov 16 17:53:28 cp3039 kernel: [ 57.941525] Dev sde: unable to read RDB block 0 Nov 16 17:53:28 cp3039 kernel: [ 58.062436] random: nonblocking pool is initialized Nov 16 17:53:28 cp3039 kernel: [ 58.063084] sdd: unable to read partition table Nov 16 17:53:28 cp3039 kernel: [ 58.073044] sdh: unable to read partition table Nov 16 17:53:28 cp3039 kernel: [ 58.265838] sdc: unable to read partition table Nov 16 17:53:28 cp3039 kernel: [ 58.290827] sde: unable to read partition table Nov 16 17:53:28 cp3039 kernel: [ 58.299697] sd 11:0:0:0: [sdh] Attached SCSI removable disk Nov 16 17:53:28 cp3039 kernel: [ 58.401997] sdf: sdf1 Nov 16 17:53:28 cp3039 kernel: [ 58.543337] sd 10:0:0:0: [sdc] Attached SCSI removable disk Nov 16 17:53:28 cp3039 kernel: [ 58.813394] sd 10:0:0:2: [sde] Attached SCSI removable disk Nov 16 17:53:28 cp3039 kernel: [ 59.189754] sd 10:0:0:1: [sdd] Attached SCSI removable disk Nov 16 17:53:28 cp3039 kernel: [ 59.191300] sd 10:0:0:4: [sdg] Attached SCSI removable disk Nov 16 17:53:28 cp3039 kernel: [ 59.247010] sd 10:0:0:3: [sdf] Attached SCSI removable disk Nov 16 17:53:28 cp3039 kernel: [ 78.122949] sd 10:0:0:1: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Nov 16 17:53:28 cp3039 kernel: [ 78.132285] sd 10:0:0:1: [sdd] tag#0 Sense Key : Medium Error [current] Nov 16 17:53:28 cp3039 kernel: [ 78.139774] sd 10:0:0:1: [sdd] tag#0 Add. Sense: Unrecovered read error Nov 16 17:53:28 cp3039 kernel: [ 78.147168] sd 10:0:0:1: [sdd] tag#0 CDB: Read(10) 28 00 00 00 01 10 00 00 f0 00 Nov 16 17:53:28 cp3039 kernel: [ 78.155421] blk_update_request: critical medium error, dev sdd, sector 272 Nov 16 17:53:28 cp3039 kernel: [ 88.264944] sd 10:0:0:4: [sdg] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Nov 16 17:53:28 cp3039 kernel: [ 88.274277] sd 10:0:0:4: [sdg] tag#0 Sense Key : Medium Error [current] Nov 16 17:53:28 cp3039 kernel: [ 88.281763] sd 10:0:0:4: [sdg] tag#0 Add. Sense: Unrecovered read error Nov 16 17:53:28 cp3039 kernel: [ 88.289155] sd 10:0:0:4: [sdg] tag#0 CDB: Read(10) 28 00 00 00 00 80 00 00 90 00 Nov 16 17:53:28 cp3039 kernel: [ 88.297417] blk_update_request: critical medium error, dev sdg, sector 128 Nov 16 17:53:28 cp3039 kernel: [ 98.443031] sd 10:0:0:2: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Nov 16 17:53:28 cp3039 kernel: [ 98.452363] sd 10:0:0:2: [sde] tag#0 Sense Key : Medium Error [current] Nov 16 17:53:28 cp3039 kernel: [ 98.459847] sd 10:0:0:2: [sde] tag#0 Add. Sense: Unrecovered read error Nov 16 17:53:28 cp3039 kernel: [ 98.467239] sd 10:0:0:2: [sde] tag#0 CDB: Read(10) 28 00 00 00 00 40 00 00 38 00 Nov 16 17:53:28 cp3039 kernel: [ 98.475502] blk_update_request: critical medium error, dev sde, sector 64 Nov 16 17:53:28 cp3039 kernel: [ 108.574378] sd 10:0:0:3: [sdf] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Nov 16 17:53:28 cp3039 kernel: [ 108.583712] sd 10:0:0:3: [sdf] tag#0 Sense Key : Medium Error [current] Nov 16 17:53:28 cp3039 kernel: [ 108.591189] sd 10:0:0:3: [sdf] tag#0 Add. Sense: Unrecovered read error Nov 16 17:53:28 cp3039 kernel: [ 108.598581] sd 10:0:0:3: [sdf] tag#0 CDB: Read(10) 28 00 00 00 00 20 00 00 18 00 Nov 16 17:53:28 cp3039 kernel: [ 108.606842] blk_update_request: critical medium error, dev sdf, sector 32 Nov 16 17:53:28 cp3039 kernel: [ 118.675006] sd 10:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Nov 16 17:53:28 cp3039 kernel: [ 118.684342] sd 10:0:0:0: [sdc] tag#0 Sense Key : Medium Error [current] Nov 16 17:53:28 cp3039 kernel: [ 118.691835] sd 10:0:0:0: [sdc] tag#0 Add. Sense: Unrecovered read error Nov 16 17:53:28 cp3039 kernel: [ 118.699230] sd 10:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 00 00 01 10 00 00 f0 00 Nov 16 17:53:28 cp3039 kernel: [ 118.707491] blk_update_request: critical medium error, dev sdc, sector 272 Nov 16 17:53:28 cp3039 kernel: [ 128.755801] sd 10:0:0:1: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Nov 16 17:53:28 cp3039 kernel: [ 128.765136] sd 10:0:0:1: [sdd] tag#0 Sense Key : Medium Error [current] Nov 16 17:53:28 cp3039 kernel: [ 128.772628] sd 10:0:0:1: [sdd] tag#0 Add. Sense: Unrecovered read error Nov 16 17:53:28 cp3039 kernel: [ 128.780009] sd 10:0:0:1: [sdd] tag#0 CDB: Read(10) 28 00 00 00 01 10 00 00 08 00 Nov 16 17:53:28 cp3039 kernel: [ 128.788269] blk_update_request: critical medium error, dev sdd, sector 272 Nov 16 17:53:28 cp3039 kernel: [ 128.795953] Buffer I/O error on dev sdd, logical block 34, async page read Nov 16 17:53:28 cp3039 kernel: [ 138.876647] sd 10:0:0:4: [sdg] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Nov 16 17:53:28 cp3039 kernel: [ 138.885985] sd 10:0:0:4: [sdg] tag#0 Sense Key : Medium Error [current] Nov 16 17:53:28 cp3039 kernel: [ 138.893472] sd 10:0:0:4: [sdg] tag#0 Add. Sense: Unrecovered read error Nov 16 17:53:28 cp3039 kernel: [ 138.900868] sd 10:0:0:4: [sdg] tag#0 CDB: Read(10) 28 00 00 00 00 80 00 00 08 00 Nov 16 17:53:28 cp3039 kernel: [ 138.909132] blk_update_request: critical medium error, dev sdg, sector 128 Nov 16 17:53:28 cp3039 kernel: [ 138.916810] Buffer I/O error on dev sdg, logical block 16, async page read Nov 16 17:53:28 cp3039 kernel: [ 145.342643] usb 1-1.6-port6: cannot reset (err = -71) Nov 16 17:53:28 cp3039 kernel: [ 145.352390] usb 1-1.6-port6: cannot reset (err = -71) Nov 16 17:53:28 cp3039 kernel: [ 145.362140] usb 1-1.6-port6: cannot reset (err = -71) Nov 16 17:53:28 cp3039 kernel: [ 145.371889] usb 1-1.6-port6: cannot reset (err = -71) Nov 16 17:53:28 cp3039 kernel: [ 145.381641] usb 1-1.6-port6: cannot reset (err = -71) Nov 16 17:53:28 cp3039 kernel: [ 145.387297] usb 1-1.6-port6: Cannot enable. Maybe the USB cable is bad? Nov 16 17:53:28 cp3039 kernel: [ 145.394905] usb 1-1.6: USB disconnect, device number 3 Nov 16 17:53:28 cp3039 kernel: [ 145.400658] usb 1-1.6.2: USB disconnect, device number 4 Nov 16 17:53:28 cp3039 kernel: [ 145.410029] usb 1-1.6-port6: cannot disable (err = -71) Nov 16 17:53:28 cp3039 kernel: [ 145.416038] sd 10:0:0:1: [sdd] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK Nov 16 17:53:28 cp3039 kernel: [ 145.425856] sd 10:0:0:1: [sdd] tag#0 CDB: Read(10) 28 00 00 00 02 00 00 00 08 00 Nov 16 17:53:28 cp3039 kernel: [ 145.434110] blk_update_request: I/O error, dev sdd, sector 512 Nov 16 17:53:28 cp3039 kernel: [ 145.440645] sd 10:0:0:4: [sdg] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK Nov 16 17:53:28 cp3039 kernel: [ 145.450458] sd 10:0:0:4: [sdg] tag#0 CDB: Read(10) 28 00 00 00 00 80 00 00 08 00 Nov 16 17:53:28 cp3039 kernel: [ 145.458720] blk_update_request: I/O error, dev sdg, sector 128 Nov 16 17:53:28 cp3039 kernel: [ 145.465239] Buffer I/O error on dev sdg, logical block 16, async page read Nov 16 17:53:28 cp3039 kernel: [ 145.472940] sd 10:0:0:2: [sde] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK Nov 16 17:53:28 cp3039 kernel: [ 145.482756] sd 10:0:0:2: [sde] tag#0 CDB: Read(10) 28 00 00 00 00 80 00 00 90 00 Nov 16 17:53:28 cp3039 kernel: [ 145.491021] blk_update_request: I/O error, dev sde, sector 128 Nov 16 17:53:28 cp3039 kernel: [ 145.497554] sd 10:0:0:1: [sdd] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK Nov 16 17:53:28 cp3039 kernel: [ 145.499727] scsi 10:0:0:0: rejecting I/O to offline device Nov 16 17:53:28 cp3039 kernel: [ 145.499730] scsi 10:0:0:0: [sdc] killing request Nov 16 17:53:28 cp3039 kernel: [ 145.499741] scsi 10:0:0:0: [sdc] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK Nov 16 17:53:28 cp3039 kernel: [ 145.499745] scsi 10:0:0:0: [sdc] CDB: Read(10) 28 00 00 00 01 10 00 00 08 00 Nov 16 17:53:28 cp3039 kernel: [ 145.499747] blk_update_request: I/O error, dev sdc, sector 272 Nov 16 17:53:28 cp3039 kernel: [ 145.499750] Buffer I/O error on dev sdc, logical block 34, async page read Nov 16 17:53:28 cp3039 kernel: [ 145.499805] Buffer I/O error on dev sdc, logical block 2, async page read Nov 16 17:53:28 cp3039 kernel: [ 145.557678] sd 10:0:0:1: [sdd] tag#0 CDB: Read(10) 28 00 00 00 02 00 00 00 08 00 Nov 16 17:53:28 cp3039 kernel: [ 145.565932] blk_update_request: I/O error, dev sdd, sector 512 Nov 16 17:53:28 cp3039 kernel: [ 145.572442] Buffer I/O error on dev sdd, logical block 64, async page read Nov 16 17:53:28 cp3039 kernel: [ 145.580143] sd 10:0:0:2: [sde] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK Nov 16 17:53:28 cp3039 kernel: [ 145.589945] sd 10:0:0:2: [sde] tag#0 CDB: Read(10) 28 00 00 00 00 40 00 00 08 00 Nov 16 17:53:28 cp3039 kernel: [ 145.598202] blk_update_request: I/O error, dev sde, sector 64 Nov 16 17:53:28 cp3039 kernel: [ 145.604619] Buffer I/O error on dev sde, logical block 8, async page read Nov 16 17:53:28 cp3039 kernel: [ 145.612212] sd 10:0:0:3: [sdf] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK Nov 16 17:53:28 cp3039 kernel: [ 145.622025] sd 10:0:0:3: [sdf] tag#0 CDB: Read(10) 28 00 00 00 00 40 00 00 38 00 Nov 16 17:53:28 cp3039 kernel: [ 145.630282] blk_update_request: I/O error, dev sdf, sector 64 Nov 16 17:53:28 cp3039 kernel: [ 145.636719] sd 10:0:0:2: [sde] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK Nov 16 17:53:28 cp3039 kernel: [ 145.646527] sd 10:0:0:2: [sde] tag#0 CDB: Read(10) 28 00 00 00 00 80 00 00 08 00 Nov 16 17:53:28 cp3039 kernel: [ 145.654785] blk_update_request: I/O error, dev sde, sector 128 Nov 16 17:53:28 cp3039 kernel: [ 145.661296] Buffer I/O error on dev sde, logical block 16, async page read Nov 16 17:53:28 cp3039 kernel: [ 145.668979] sd 10:0:0:3: [sdf] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK Nov 16 17:53:28 cp3039 kernel: [ 145.678780] sd 10:0:0:3: [sdf] tag#0 CDB: Read(10) 28 00 00 00 00 80 00 00 90 00 Nov 16 17:53:28 cp3039 kernel: [ 145.687028] blk_update_request: I/O error, dev sdf, sector 128 Nov 16 17:53:28 cp3039 kernel: [ 145.693560] sd 10:0:0:3: [sdf] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK Nov 16 17:53:28 cp3039 kernel: [ 145.703372] sd 10:0:0:3: [sdf] tag#0 CDB: Read(10) 28 00 00 00 00 20 00 00 08 00 Nov 16 17:53:28 cp3039 kernel: [ 145.711628] blk_update_request: I/O error, dev sdf, sector 32 Nov 16 17:53:28 cp3039 kernel: [ 145.718046] Buffer I/O error on dev sdf, logical block 4, async page read Nov 16 17:53:28 cp3039 kernel: [ 145.725716] Buffer I/O error on dev sdf, logical block 16, async page read Nov 16 17:53:28 cp3039 kernel: [ 145.896273] usb 1-1.6.6: USB disconnect, device number 5 Nov 16 17:53:28 cp3039 kernel: [ 150.768995] md: md0 stopped. Nov 16 17:53:28 cp3039 kernel: [ 150.773149] md: bind<sdb1> Nov 16 17:53:28 cp3039 kernel: [ 150.776414] md: bind<sda1> Nov 16 17:53:28 cp3039 kernel: [ 150.781227] md: raid1 personality registered for level 1
Nearer to the end (after more normal stuff) we have some virtual USB keyboard/mouse stuff, and then some virtual ethernet stuff:
Nov 16 17:53:31 cp3039 kernel: [ 167.316994] usb 1-1.6.1: new high-speed USB device number 7 using ehci-pci Nov 16 17:53:31 cp3039 kernel: [ 167.434495] usb 1-1.6.1: New USB device found, idVendor=0624, idProduct=0249 Nov 16 17:53:31 cp3039 kernel: [ 167.434497] usb 1-1.6.1: New USB device strings: Mfr=4, Product=5, SerialNumber=6 Nov 16 17:53:31 cp3039 kernel: [ 167.434499] usb 1-1.6.1: Product: Keyboard/Mouse Function Nov 16 17:53:31 cp3039 kernel: [ 167.434500] usb 1-1.6.1: Manufacturer: Avocent Nov 16 17:53:31 cp3039 kernel: [ 167.434501] usb 1-1.6.1: SerialNumber: 20121018 Nov 16 17:53:31 cp3039 kernel: [ 167.445462] hidraw: raw HID events driver (C) Jiri Kosina Nov 16 17:53:31 cp3039 kernel: [ 167.448055] usbcore: registered new interface driver usbhid Nov 16 17:53:31 cp3039 kernel: [ 167.448056] usbhid: USB HID core driver Nov 16 17:53:31 cp3039 kernel: [ 167.448895] input: Avocent Keyboard/Mouse Function as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.6/1-1.6.1/1-1.6.1:1.0/0003:0624:0249.0001/input/input2 Nov 16 17:53:31 cp3039 kernel: [ 167.501159] hid-generic 0003:0624:0249.0001: input,hidraw0: USB HID v1.00 Keyboard [Avocent Keyboard/Mouse Function] on usb-0000:00:1a.0-1.6.1/input0 Nov 16 17:53:31 cp3039 kernel: [ 167.501264] input: Avocent Keyboard/Mouse Function as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.6/1-1.6.1/1-1.6.1:1.1/0003:0624:0249.0002/input/input3 Nov 16 17:53:31 cp3039 kernel: [ 167.557078] hid-generic 0003:0624:0249.0002: input,hidraw1: USB HID v1.00 Mouse [Avocent Keyboard/Mouse Function] on usb-0000:00:1a.0-1.6.1/input1 Nov 16 17:53:31 cp3039 kernel: [ 167.557169] input: Avocent Keyboard/Mouse Function as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.6/1-1.6.1/1-1.6.1:1.2/0003:0624:0249.0003/input/input4 Nov 16 17:53:31 cp3039 kernel: [ 167.557231] hid-generic 0003:0624:0249.0003: input,hidraw2: USB HID v1.00 Mouse [Avocent Keyboard/Mouse Function] on usb-0000:00:1a.0-1.6.1/input2 Nov 16 17:53:33 cp3039 kernel: [ 169.653117] usb 1-1.6.3: new high-speed USB device number 8 using ehci-pci Nov 16 17:53:33 cp3039 kernel: [ 169.770131] usb 1-1.6.3: New USB device found, idVendor=413c, idProduct=a102 Nov 16 17:53:33 cp3039 kernel: [ 169.770134] usb 1-1.6.3: New USB device strings: Mfr=1, Product=2, SerialNumber=0 Nov 16 17:53:33 cp3039 kernel: [ 169.770135] usb 1-1.6.3: Product: iDRAC Virtual NIC USB Device Nov 16 17:53:33 cp3039 kernel: [ 169.770136] usb 1-1.6.3: Manufacturer: Dell(TM) Nov 16 17:53:33 cp3039 kernel: [ 169.774819] cdc_ether 1-1.6.3:1.0 usb0: register 'cdc_ether' at usb-0000:00:1a.0-1.6.3, CDC Ethernet Device, d2:5c:2d:6b:98:cb Nov 16 17:53:33 cp3039 kernel: [ 169.774835] usbcore: registered new interface driver cdc_ether Nov 16 17:53:33 cp3039 kernel: [ 169.775426] cdc_ether 1-1.6.3:1.0 idrac: renamed from usb0 Nov 16 17:53:34 cp3039 kernel: [ 170.664311] usb 1-1.6.3: USB disconnect, device number 8 Nov 16 17:53:34 cp3039 kernel: [ 170.664353] cdc_ether 1-1.6.3:1.0 idrac: unregister 'cdc_ether' usb-0000:00:1a.0-1.6.3, CDC Ethernet Device Nov 16 17:54:26 cp3039 kernel: [ 222.916264] usb 1-1.6.1: USB disconnect, device number 7
In any case, this evening I started with connecting to the serial serial console (all normal, responds to enter keypress with a login prompt), then depooled and rebooted so I could observe a reboot myself on the console first.
The host rebooted very quickly and normally, with normal-looking dmesg output this time. However, I got no syslog output to the serial console during the shutdown, and apparently around the moment of reboot my serial console session got "hung". It stayed there displaying the login prompt from before, and unresponse to Ctrl-\ to escape out of the console. It's like the host rebooting took iDRAC with it. Had to kill the ssh session and jump back in (which also worked fine), but missed any actual bootup output due to this.
I think most likely whatever this all was, it was a one-shot thing, so closing for now as the host seems normal and repooled.