Page MenuHomePhabricator

Cannot connect to cp3039.mgmt.esams.wmnet:22
Closed, ResolvedPublic

Description

cp3039 didn't come back up online after reboot (kernel and varnish upgrades). The management interface at cp3039.mgmt.esams.wmnet responds to ping but I cannot connect via SSH.

Event Timeline

ema created this task.Nov 16 2016, 5:52 PM
Restricted Application added a project: Operations. · View Herald TranscriptNov 16 2016, 5:52 PM
Restricted Application added subscribers: Southparkfan, Aklapper. · View Herald Transcript
ema triaged this task as Medium priority.Nov 16 2016, 5:52 PM
ema added a comment.Nov 16 2016, 5:56 PM

The host just came back up online now, and I can ssh into the management interface. Leaving the ticket open though as it took a very long time for both things to happen.

18:40 < icinga-wm> PROBLEM - Host cp3039 is DOWN: PING CRITICAL - Packet loss = 100%
[...]
18:53 < icinga-wm> RECOVERY - Host cp3039 is UP: PING OK - Packet loss = 0%, RTA = 83.74 ms

@MoritzMuehlenhoff suggested on IRC that this issue could be due to some kind of hardware self-test, so perhaps it could be a good idea to take a deeper look into cp3039 hw-wise?

ema added a subscriber: BBlack.Nov 16 2016, 6:04 PM
BBlack added a comment.EditedNov 16 2016, 6:33 PM

I'll take a peek much later today.

Mentioned in SAL (#wikimedia-operations) [2016-11-17T00:38:19Z] <bblack> depooling cp3039 for hw/bios work - T150879

Mentioned in SAL (#wikimedia-operations) [2016-11-17T00:48:29Z] <bblack> repool cp3039 - T150879

The dmesg from the very-delayed bootup is interesting. Normally we get through the bulk of the bootup messages in under 20s, but the process took closer to 3-4 minutes in this case. There are a bunch of quite unique things happening in dmesg related to USB devices, and I think they're virtual USB devices related to iDRAC/firmware somehow: virtual consoles, virtual disks that don't work, etc. Some snippets that we don't normally see:

This was the bulk of the dmesg-level delay, dealing with a bunch of virtual disks that shouldn't be there and don't work right, starting shortly after the usual detection of /dev/sda and /dev/sdb (our 2x actual SSDs, the only disks that should exist):

Nov 16 17:53:28 cp3039 kernel: [   15.661146] usb 1-1.6: New USB device found, idVendor=413c, idProduct=a001
Nov 16 17:53:28 cp3039 kernel: [   15.661149] usb 1-1.6: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Nov 16 17:53:28 cp3039 kernel: [   15.661151] usb 1-1.6: Product: Gadget USB HUB
Nov 16 17:53:28 cp3039 kernel: [   15.661152] usb 1-1.6: Manufacturer: no manufacturer
Nov 16 17:53:28 cp3039 kernel: [   15.661154] usb 1-1.6: SerialNumber: 0123456789
Nov 16 17:53:28 cp3039 kernel: [   15.661579] hub 1-1.6:1.0: USB hub found
Nov 16 17:53:28 cp3039 kernel: [   15.661766] hub 1-1.6:1.0: 6 ports detected
Nov 16 17:53:28 cp3039 kernel: [   15.920275] ata3: SATA link down (SStatus 0 SControl 300)
Nov 16 17:53:28 cp3039 kernel: [   16.108273] usb 1-1.6.2: new high-speed USB device number 4 using ehci-pci
Nov 16 17:53:28 cp3039 kernel: [   16.233874] usb 1-1.6.2: New USB device found, idVendor=0624, idProduct=0250
Nov 16 17:53:28 cp3039 kernel: [   16.241755] usb 1-1.6.2: New USB device strings: Mfr=4, Product=5, SerialNumber=6
Nov 16 17:53:28 cp3039 kernel: [   16.244289] ata4: SATA link down (SStatus 0 SControl 300)
Nov 16 17:53:28 cp3039 kernel: [   16.256145] usb 1-1.6.2: Product: Mass Storage Function
Nov 16 17:53:28 cp3039 kernel: [   16.261987] usb 1-1.6.2: Manufacturer: Avocent
Nov 16 17:53:28 cp3039 kernel: [   16.266946] usb 1-1.6.2: SerialNumber: 20120731
Nov 16 17:53:28 cp3039 kernel: [   16.298703] usb-storage 1-1.6.2:1.0: USB Mass Storage device detected
Nov 16 17:53:28 cp3039 kernel: [   16.306121] scsi host10: usb-storage 1-1.6.2:1.0
Nov 16 17:53:28 cp3039 kernel: [   16.311376] usbcore: registered new interface driver usb-storage
Nov 16 17:53:28 cp3039 kernel: [   16.318727] usbcore: registered new interface driver uas
Nov 16 17:53:28 cp3039 kernel: [   16.544252] usb 1-1.6.6: new high-speed USB device number 5 using ehci-pci
Nov 16 17:53:28 cp3039 kernel: [   16.564272] ata6: SATA link down (SStatus 0 SControl 300)
Nov 16 17:53:28 cp3039 kernel: [   16.665205] usb 1-1.6.6: New USB device found, idVendor=0624, idProduct=0253
Nov 16 17:53:28 cp3039 kernel: [   16.673078] usb 1-1.6.6: New USB device strings: Mfr=4, Product=5, SerialNumber=6
Nov 16 17:53:28 cp3039 kernel: [   16.681437] usb 1-1.6.6: Product: Mass Storage Function
Nov 16 17:53:28 cp3039 kernel: [   16.687271] usb 1-1.6.6: Manufacturer: Avocent
Nov 16 17:53:28 cp3039 kernel: [   16.692233] usb 1-1.6.6: SerialNumber: 20120731-3
Nov 16 17:53:28 cp3039 kernel: [   16.698995] usb-storage 1-1.6.6:1.0: USB Mass Storage device detected
Nov 16 17:53:28 cp3039 kernel: [   16.706315] scsi host11: usb-storage 1-1.6.6:1.0
Nov 16 17:53:28 cp3039 kernel: [   16.888313] ata7: SATA link down (SStatus 0 SControl 300)
Nov 16 17:53:28 cp3039 kernel: [   17.212296] ata8: SATA link down (SStatus 0 SControl 300)
Nov 16 17:53:28 cp3039 kernel: [   17.315436] scsi 10:0:0:0: Direct-Access     iDRAC    MAS001           0329 PQ: 0 ANSI: 0 CCS
Nov 16 17:53:28 cp3039 kernel: [   17.326072] scsi 10:0:0:1: Direct-Access     iDRAC    MAS002           0329 PQ: 0 ANSI: 0 CCS
Nov 16 17:53:28 cp3039 kernel: [   17.396054] scsi 10:0:0:2: Direct-Access     iDRAC    MAS021           0329 PQ: 0 ANSI: 0 CCS
Nov 16 17:53:28 cp3039 kernel: [   17.406553] scsi 10:0:0:3: Direct-Access     iDRAC    MAS022           0329 PQ: 0 ANSI: 0 CCS
Nov 16 17:53:28 cp3039 kernel: [   17.473185] scsi 10:0:0:4: Direct-Access     iDRAC    MAS024           0329 PQ: 0 ANSI: 0 CCS
Nov 16 17:53:28 cp3039 kernel: [   17.483742] sd 10:0:0:0: Attached scsi generic sg2 type 0
Nov 16 17:53:28 cp3039 kernel: [   17.485671] sd 10:0:0:0: [sdc] 102400 512-byte logical blocks: (52.4 MB/50.0 MiB)
Nov 16 17:53:28 cp3039 kernel: [   17.486665] sd 10:0:0:0: [sdc] Write Protect is on
Nov 16 17:53:28 cp3039 kernel: [   17.486669] sd 10:0:0:0: [sdc] Mode Sense: 23 00 80 00
Nov 16 17:53:28 cp3039 kernel: [   17.503892] sd 10:0:0:1: Attached scsi generic sg3 type 0
Nov 16 17:53:28 cp3039 kernel: [   17.510289] sd 10:0:0:2: Attached scsi generic sg4 type 0
Nov 16 17:53:28 cp3039 kernel: [   17.516607] sd 10:0:0:3: Attached scsi generic sg5 type 0
Nov 16 17:53:28 cp3039 kernel: [   17.522947] sd 10:0:0:4: Attached scsi generic sg6 type 0
Nov 16 17:53:28 cp3039 kernel: [   17.536368] ata9: SATA link down (SStatus 0 SControl 300)
Nov 16 17:53:28 cp3039 kernel: [   17.631311] sd 10:0:0:0: [sdc] No Caching mode page found
Nov 16 17:53:28 cp3039 kernel: [   17.636397] sd 10:0:0:1: [sdd] 30720 512-byte logical blocks: (15.7 MB/15.0 MiB)
Nov 16 17:53:28 cp3039 kernel: [   17.636784] sd 10:0:0:2: [sde] 614400 512-byte logical blocks: (315 MB/300 MiB)
Nov 16 17:53:28 cp3039 kernel: [   17.637782] sd 10:0:0:3: [sdf] 512000 512-byte logical blocks: (262 MB/250 MiB)
Nov 16 17:53:28 cp3039 kernel: [   17.638174] sd 10:0:0:4: [sdg] 20480 512-byte logical blocks: (10.5 MB/10.0 MiB)
Nov 16 17:53:28 cp3039 kernel: [   17.639157] sd 10:0:0:1: [sdd] Write Protect is on
Nov 16 17:53:28 cp3039 kernel: [   17.639160] sd 10:0:0:1: [sdd] Mode Sense: 23 00 80 00
Nov 16 17:53:28 cp3039 kernel: [   17.640396] sd 10:0:0:2: [sde] Write Protect is on
Nov 16 17:53:28 cp3039 kernel: [   17.640399] sd 10:0:0:2: [sde] Mode Sense: 23 00 80 00
Nov 16 17:53:28 cp3039 kernel: [   17.641405] sd 10:0:0:3: [sdf] Write Protect is off
Nov 16 17:53:28 cp3039 kernel: [   17.641408] sd 10:0:0:3: [sdf] Mode Sense: 23 00 00 00
Nov 16 17:53:28 cp3039 kernel: [   17.686347] sd 10:0:0:0: [sdc] Assuming drive cache: write through
Nov 16 17:53:28 cp3039 kernel: [   17.709909] scsi 11:0:0:0: Direct-Access     iDRAC    MAS025           0329 PQ: 0 ANSI: 0 CCS
Nov 16 17:53:28 cp3039 kernel: [   17.720355] sd 11:0:0:0: Attached scsi generic sg7 type 0
Nov 16 17:53:28 cp3039 kernel: [   17.722303] sd 11:0:0:0: [sdh] 122880 512-byte logical blocks: (62.9 MB/60.0 MiB)
Nov 16 17:53:28 cp3039 kernel: [   17.723296] sd 11:0:0:0: [sdh] Write Protect is off
Nov 16 17:53:28 cp3039 kernel: [   17.723298] sd 11:0:0:0: [sdh] Mode Sense: 23 00 00 00
Nov 16 17:53:28 cp3039 kernel: [   17.724798] sd 11:0:0:0: [sdh] No Caching mode page found
Nov 16 17:53:28 cp3039 kernel: [   17.724800] sd 11:0:0:0: [sdh] Assuming drive cache: write through
Nov 16 17:53:28 cp3039 kernel: [   17.780108] sd 10:0:0:4: [sdg] Write Protect is on
Nov 16 17:53:28 cp3039 kernel: [   17.785477] sd 10:0:0:4: [sdg] Mode Sense: 23 00 80 00
Nov 16 17:53:28 cp3039 kernel: [   17.860385] ata10: SATA link down (SStatus 0 SControl 300)
Nov 16 17:53:28 cp3039 kernel: [   17.935619] sd 10:0:0:1: [sdd] No Caching mode page found
Nov 16 17:53:28 cp3039 kernel: [   17.937195] sd 10:0:0:2: [sde] No Caching mode page found
Nov 16 17:53:28 cp3039 kernel: [   17.937197] sd 10:0:0:2: [sde] Assuming drive cache: write through
Nov 16 17:53:28 cp3039 kernel: [   17.954592] sd 10:0:0:1: [sdd] Assuming drive cache: write through
Nov 16 17:53:28 cp3039 kernel: [   18.094837] sd 10:0:0:3: [sdf] No Caching mode page found
Nov 16 17:53:28 cp3039 kernel: [   18.096203] sd 10:0:0:4: [sdg] No Caching mode page found
Nov 16 17:53:28 cp3039 kernel: [   18.096206] sd 10:0:0:4: [sdg] Assuming drive cache: write through
Nov 16 17:53:28 cp3039 kernel: [   18.113809] sd 10:0:0:3: [sdf] Assuming drive cache: write through
Nov 16 17:53:28 cp3039 kernel: [   27.947448] sd 11:0:0:0: [sdh] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 16 17:53:28 cp3039 kernel: [   27.956780] sd 11:0:0:0: [sdh] tag#0 Sense Key : Medium Error [current] 
Nov 16 17:53:28 cp3039 kernel: [   27.964262] sd 11:0:0:0: [sdh] tag#0 Add. Sense: Unrecovered read error
Nov 16 17:53:28 cp3039 kernel: [   27.971639] sd 11:0:0:0: [sdh] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
Nov 16 17:53:28 cp3039 kernel: [   27.979899] blk_update_request: critical medium error, dev sdh, sector 0
Nov 16 17:53:28 cp3039 kernel: [   27.987381] Buffer I/O error on dev sdh, logical block 0, async page read
Nov 16 17:53:28 cp3039 kernel: [   28.827452] sd 10:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 16 17:53:28 cp3039 kernel: [   28.836779] sd 10:0:0:0: [sdc] tag#0 Sense Key : Medium Error [current] 
Nov 16 17:53:28 cp3039 kernel: [   28.844263] sd 10:0:0:0: [sdc] tag#0 Add. Sense: Unrecovered read error
Nov 16 17:53:28 cp3039 kernel: [   28.851649] sd 10:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
Nov 16 17:53:28 cp3039 kernel: [   28.859907] blk_update_request: critical medium error, dev sdc, sector 0
Nov 16 17:53:28 cp3039 kernel: [   28.867386] Buffer I/O error on dev sdc, logical block 0, async page read
Nov 16 17:53:28 cp3039 kernel: [   38.028288] sd 11:0:0:0: [sdh] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 16 17:53:28 cp3039 kernel: [   38.037614] sd 11:0:0:0: [sdh] tag#0 Sense Key : Medium Error [current] 
Nov 16 17:53:28 cp3039 kernel: [   38.045095] sd 11:0:0:0: [sdh] tag#0 Add. Sense: Unrecovered read error
Nov 16 17:53:28 cp3039 kernel: [   38.052477] sd 11:0:0:0: [sdh] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
Nov 16 17:53:28 cp3039 kernel: [   38.060732] blk_update_request: critical medium error, dev sdh, sector 0
Nov 16 17:53:28 cp3039 kernel: [   38.068212] Buffer I/O error on dev sdh, logical block 0, async page read
Nov 16 17:53:28 cp3039 kernel: [   38.956443] sd 10:0:0:2: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 16 17:53:28 cp3039 kernel: [   38.965770] sd 10:0:0:2: [sde] tag#0 Sense Key : Medium Error [current] 
Nov 16 17:53:28 cp3039 kernel: [   38.973241] sd 10:0:0:2: [sde] tag#0 Add. Sense: Unrecovered read error
Nov 16 17:53:28 cp3039 kernel: [   38.980624] sd 10:0:0:2: [sde] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
Nov 16 17:53:28 cp3039 kernel: [   38.988878] blk_update_request: critical medium error, dev sde, sector 0
Nov 16 17:53:28 cp3039 kernel: [   38.996360] Buffer I/O error on dev sde, logical block 0, async page read
Nov 16 17:53:28 cp3039 kernel: [   48.110590] sd 11:0:0:0: [sdh] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 16 17:53:28 cp3039 kernel: [   48.119906] sd 11:0:0:0: [sdh] tag#0 Sense Key : Medium Error [current] 
Nov 16 17:53:28 cp3039 kernel: [   48.127387] sd 11:0:0:0: [sdh] tag#0 Add. Sense: Unrecovered read error
Nov 16 17:53:28 cp3039 kernel: [   48.134769] sd 11:0:0:0: [sdh] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
Nov 16 17:53:28 cp3039 kernel: [   48.143025] blk_update_request: critical medium error, dev sdh, sector 0
Nov 16 17:53:28 cp3039 kernel: [   48.150504] Buffer I/O error on dev sdh, logical block 0, async page read
Nov 16 17:53:28 cp3039 kernel: [   49.076143] sd 10:0:0:1: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 16 17:53:28 cp3039 kernel: [   49.085476] sd 10:0:0:1: [sdd] tag#0 Sense Key : Medium Error [current] 
Nov 16 17:53:28 cp3039 kernel: [   49.092978] sd 10:0:0:1: [sdd] tag#0 Add. Sense: Unrecovered read error
Nov 16 17:53:28 cp3039 kernel: [   49.100364] sd 10:0:0:1: [sdd] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
Nov 16 17:53:28 cp3039 kernel: [   49.108624] blk_update_request: critical medium error, dev sdd, sector 0
Nov 16 17:53:28 cp3039 kernel: [   49.116104] Buffer I/O error on dev sdd, logical block 0, async page read
Nov 16 17:53:28 cp3039 kernel: [   57.829342] ldm_validate_partition_table(): Disk read failed.
Nov 16 17:53:28 cp3039 kernel: [   57.835807] Dev sdh: unable to read RDB block 0
Nov 16 17:53:28 cp3039 kernel: [   57.838640] sd 10:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 16 17:53:28 cp3039 kernel: [   57.838644] sd 10:0:0:0: [sdc] tag#0 Sense Key : Medium Error [current] 
Nov 16 17:53:28 cp3039 kernel: [   57.838648] sd 10:0:0:0: [sdc] tag#0 Add. Sense: Unrecovered read error
Nov 16 17:53:28 cp3039 kernel: [   57.838653] sd 10:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
Nov 16 17:53:28 cp3039 kernel: [   57.838655] blk_update_request: critical medium error, dev sdc, sector 0
Nov 16 17:53:28 cp3039 kernel: [   57.838658] Buffer I/O error on dev sdc, logical block 0, async page read
Nov 16 17:53:28 cp3039 kernel: [   57.868224] ldm_validate_partition_table(): Disk read failed.
Nov 16 17:53:28 cp3039 kernel: [   57.868232] Dev sdd: unable to read RDB block 0
Nov 16 17:53:28 cp3039 kernel: [   57.914371]  sdg: sdg1
Nov 16 17:53:28 cp3039 kernel: [   57.923623] ldm_validate_partition_table(): Disk read failed.
Nov 16 17:53:28 cp3039 kernel: [   57.925740] ldm_validate_partition_table(): Disk read failed.
Nov 16 17:53:28 cp3039 kernel: [   57.925747] Dev sdc: unable to read RDB block 0
Nov 16 17:53:28 cp3039 kernel: [   57.941525] Dev sde: unable to read RDB block 0
Nov 16 17:53:28 cp3039 kernel: [   58.062436] random: nonblocking pool is initialized
Nov 16 17:53:28 cp3039 kernel: [   58.063084]  sdd: unable to read partition table
Nov 16 17:53:28 cp3039 kernel: [   58.073044]  sdh: unable to read partition table
Nov 16 17:53:28 cp3039 kernel: [   58.265838]  sdc: unable to read partition table
Nov 16 17:53:28 cp3039 kernel: [   58.290827]  sde: unable to read partition table
Nov 16 17:53:28 cp3039 kernel: [   58.299697] sd 11:0:0:0: [sdh] Attached SCSI removable disk
Nov 16 17:53:28 cp3039 kernel: [   58.401997]  sdf: sdf1
Nov 16 17:53:28 cp3039 kernel: [   58.543337] sd 10:0:0:0: [sdc] Attached SCSI removable disk
Nov 16 17:53:28 cp3039 kernel: [   58.813394] sd 10:0:0:2: [sde] Attached SCSI removable disk
Nov 16 17:53:28 cp3039 kernel: [   59.189754] sd 10:0:0:1: [sdd] Attached SCSI removable disk
Nov 16 17:53:28 cp3039 kernel: [   59.191300] sd 10:0:0:4: [sdg] Attached SCSI removable disk
Nov 16 17:53:28 cp3039 kernel: [   59.247010] sd 10:0:0:3: [sdf] Attached SCSI removable disk
Nov 16 17:53:28 cp3039 kernel: [   78.122949] sd 10:0:0:1: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 16 17:53:28 cp3039 kernel: [   78.132285] sd 10:0:0:1: [sdd] tag#0 Sense Key : Medium Error [current] 
Nov 16 17:53:28 cp3039 kernel: [   78.139774] sd 10:0:0:1: [sdd] tag#0 Add. Sense: Unrecovered read error
Nov 16 17:53:28 cp3039 kernel: [   78.147168] sd 10:0:0:1: [sdd] tag#0 CDB: Read(10) 28 00 00 00 01 10 00 00 f0 00
Nov 16 17:53:28 cp3039 kernel: [   78.155421] blk_update_request: critical medium error, dev sdd, sector 272
Nov 16 17:53:28 cp3039 kernel: [   88.264944] sd 10:0:0:4: [sdg] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 16 17:53:28 cp3039 kernel: [   88.274277] sd 10:0:0:4: [sdg] tag#0 Sense Key : Medium Error [current] 
Nov 16 17:53:28 cp3039 kernel: [   88.281763] sd 10:0:0:4: [sdg] tag#0 Add. Sense: Unrecovered read error
Nov 16 17:53:28 cp3039 kernel: [   88.289155] sd 10:0:0:4: [sdg] tag#0 CDB: Read(10) 28 00 00 00 00 80 00 00 90 00
Nov 16 17:53:28 cp3039 kernel: [   88.297417] blk_update_request: critical medium error, dev sdg, sector 128
Nov 16 17:53:28 cp3039 kernel: [   98.443031] sd 10:0:0:2: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 16 17:53:28 cp3039 kernel: [   98.452363] sd 10:0:0:2: [sde] tag#0 Sense Key : Medium Error [current] 
Nov 16 17:53:28 cp3039 kernel: [   98.459847] sd 10:0:0:2: [sde] tag#0 Add. Sense: Unrecovered read error
Nov 16 17:53:28 cp3039 kernel: [   98.467239] sd 10:0:0:2: [sde] tag#0 CDB: Read(10) 28 00 00 00 00 40 00 00 38 00
Nov 16 17:53:28 cp3039 kernel: [   98.475502] blk_update_request: critical medium error, dev sde, sector 64
Nov 16 17:53:28 cp3039 kernel: [  108.574378] sd 10:0:0:3: [sdf] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 16 17:53:28 cp3039 kernel: [  108.583712] sd 10:0:0:3: [sdf] tag#0 Sense Key : Medium Error [current] 
Nov 16 17:53:28 cp3039 kernel: [  108.591189] sd 10:0:0:3: [sdf] tag#0 Add. Sense: Unrecovered read error
Nov 16 17:53:28 cp3039 kernel: [  108.598581] sd 10:0:0:3: [sdf] tag#0 CDB: Read(10) 28 00 00 00 00 20 00 00 18 00
Nov 16 17:53:28 cp3039 kernel: [  108.606842] blk_update_request: critical medium error, dev sdf, sector 32
Nov 16 17:53:28 cp3039 kernel: [  118.675006] sd 10:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 16 17:53:28 cp3039 kernel: [  118.684342] sd 10:0:0:0: [sdc] tag#0 Sense Key : Medium Error [current] 
Nov 16 17:53:28 cp3039 kernel: [  118.691835] sd 10:0:0:0: [sdc] tag#0 Add. Sense: Unrecovered read error
Nov 16 17:53:28 cp3039 kernel: [  118.699230] sd 10:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 00 00 01 10 00 00 f0 00
Nov 16 17:53:28 cp3039 kernel: [  118.707491] blk_update_request: critical medium error, dev sdc, sector 272
Nov 16 17:53:28 cp3039 kernel: [  128.755801] sd 10:0:0:1: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 16 17:53:28 cp3039 kernel: [  128.765136] sd 10:0:0:1: [sdd] tag#0 Sense Key : Medium Error [current] 
Nov 16 17:53:28 cp3039 kernel: [  128.772628] sd 10:0:0:1: [sdd] tag#0 Add. Sense: Unrecovered read error
Nov 16 17:53:28 cp3039 kernel: [  128.780009] sd 10:0:0:1: [sdd] tag#0 CDB: Read(10) 28 00 00 00 01 10 00 00 08 00
Nov 16 17:53:28 cp3039 kernel: [  128.788269] blk_update_request: critical medium error, dev sdd, sector 272
Nov 16 17:53:28 cp3039 kernel: [  128.795953] Buffer I/O error on dev sdd, logical block 34, async page read
Nov 16 17:53:28 cp3039 kernel: [  138.876647] sd 10:0:0:4: [sdg] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 16 17:53:28 cp3039 kernel: [  138.885985] sd 10:0:0:4: [sdg] tag#0 Sense Key : Medium Error [current] 
Nov 16 17:53:28 cp3039 kernel: [  138.893472] sd 10:0:0:4: [sdg] tag#0 Add. Sense: Unrecovered read error
Nov 16 17:53:28 cp3039 kernel: [  138.900868] sd 10:0:0:4: [sdg] tag#0 CDB: Read(10) 28 00 00 00 00 80 00 00 08 00
Nov 16 17:53:28 cp3039 kernel: [  138.909132] blk_update_request: critical medium error, dev sdg, sector 128
Nov 16 17:53:28 cp3039 kernel: [  138.916810] Buffer I/O error on dev sdg, logical block 16, async page read
Nov 16 17:53:28 cp3039 kernel: [  145.342643] usb 1-1.6-port6: cannot reset (err = -71)
Nov 16 17:53:28 cp3039 kernel: [  145.352390] usb 1-1.6-port6: cannot reset (err = -71)
Nov 16 17:53:28 cp3039 kernel: [  145.362140] usb 1-1.6-port6: cannot reset (err = -71)
Nov 16 17:53:28 cp3039 kernel: [  145.371889] usb 1-1.6-port6: cannot reset (err = -71)
Nov 16 17:53:28 cp3039 kernel: [  145.381641] usb 1-1.6-port6: cannot reset (err = -71)
Nov 16 17:53:28 cp3039 kernel: [  145.387297] usb 1-1.6-port6: Cannot enable. Maybe the USB cable is bad?
Nov 16 17:53:28 cp3039 kernel: [  145.394905] usb 1-1.6: USB disconnect, device number 3
Nov 16 17:53:28 cp3039 kernel: [  145.400658] usb 1-1.6.2: USB disconnect, device number 4
Nov 16 17:53:28 cp3039 kernel: [  145.410029] usb 1-1.6-port6: cannot disable (err = -71)
Nov 16 17:53:28 cp3039 kernel: [  145.416038] sd 10:0:0:1: [sdd] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Nov 16 17:53:28 cp3039 kernel: [  145.425856] sd 10:0:0:1: [sdd] tag#0 CDB: Read(10) 28 00 00 00 02 00 00 00 08 00
Nov 16 17:53:28 cp3039 kernel: [  145.434110] blk_update_request: I/O error, dev sdd, sector 512
Nov 16 17:53:28 cp3039 kernel: [  145.440645] sd 10:0:0:4: [sdg] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Nov 16 17:53:28 cp3039 kernel: [  145.450458] sd 10:0:0:4: [sdg] tag#0 CDB: Read(10) 28 00 00 00 00 80 00 00 08 00
Nov 16 17:53:28 cp3039 kernel: [  145.458720] blk_update_request: I/O error, dev sdg, sector 128
Nov 16 17:53:28 cp3039 kernel: [  145.465239] Buffer I/O error on dev sdg, logical block 16, async page read
Nov 16 17:53:28 cp3039 kernel: [  145.472940] sd 10:0:0:2: [sde] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Nov 16 17:53:28 cp3039 kernel: [  145.482756] sd 10:0:0:2: [sde] tag#0 CDB: Read(10) 28 00 00 00 00 80 00 00 90 00
Nov 16 17:53:28 cp3039 kernel: [  145.491021] blk_update_request: I/O error, dev sde, sector 128
Nov 16 17:53:28 cp3039 kernel: [  145.497554] sd 10:0:0:1: [sdd] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Nov 16 17:53:28 cp3039 kernel: [  145.499727] scsi 10:0:0:0: rejecting I/O to offline device
Nov 16 17:53:28 cp3039 kernel: [  145.499730] scsi 10:0:0:0: [sdc] killing request
Nov 16 17:53:28 cp3039 kernel: [  145.499741] scsi 10:0:0:0: [sdc] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Nov 16 17:53:28 cp3039 kernel: [  145.499745] scsi 10:0:0:0: [sdc] CDB: Read(10) 28 00 00 00 01 10 00 00 08 00
Nov 16 17:53:28 cp3039 kernel: [  145.499747] blk_update_request: I/O error, dev sdc, sector 272
Nov 16 17:53:28 cp3039 kernel: [  145.499750] Buffer I/O error on dev sdc, logical block 34, async page read
Nov 16 17:53:28 cp3039 kernel: [  145.499805] Buffer I/O error on dev sdc, logical block 2, async page read
Nov 16 17:53:28 cp3039 kernel: [  145.557678] sd 10:0:0:1: [sdd] tag#0 CDB: Read(10) 28 00 00 00 02 00 00 00 08 00
Nov 16 17:53:28 cp3039 kernel: [  145.565932] blk_update_request: I/O error, dev sdd, sector 512
Nov 16 17:53:28 cp3039 kernel: [  145.572442] Buffer I/O error on dev sdd, logical block 64, async page read
Nov 16 17:53:28 cp3039 kernel: [  145.580143] sd 10:0:0:2: [sde] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Nov 16 17:53:28 cp3039 kernel: [  145.589945] sd 10:0:0:2: [sde] tag#0 CDB: Read(10) 28 00 00 00 00 40 00 00 08 00
Nov 16 17:53:28 cp3039 kernel: [  145.598202] blk_update_request: I/O error, dev sde, sector 64
Nov 16 17:53:28 cp3039 kernel: [  145.604619] Buffer I/O error on dev sde, logical block 8, async page read
Nov 16 17:53:28 cp3039 kernel: [  145.612212] sd 10:0:0:3: [sdf] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Nov 16 17:53:28 cp3039 kernel: [  145.622025] sd 10:0:0:3: [sdf] tag#0 CDB: Read(10) 28 00 00 00 00 40 00 00 38 00
Nov 16 17:53:28 cp3039 kernel: [  145.630282] blk_update_request: I/O error, dev sdf, sector 64
Nov 16 17:53:28 cp3039 kernel: [  145.636719] sd 10:0:0:2: [sde] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Nov 16 17:53:28 cp3039 kernel: [  145.646527] sd 10:0:0:2: [sde] tag#0 CDB: Read(10) 28 00 00 00 00 80 00 00 08 00
Nov 16 17:53:28 cp3039 kernel: [  145.654785] blk_update_request: I/O error, dev sde, sector 128
Nov 16 17:53:28 cp3039 kernel: [  145.661296] Buffer I/O error on dev sde, logical block 16, async page read
Nov 16 17:53:28 cp3039 kernel: [  145.668979] sd 10:0:0:3: [sdf] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Nov 16 17:53:28 cp3039 kernel: [  145.678780] sd 10:0:0:3: [sdf] tag#0 CDB: Read(10) 28 00 00 00 00 80 00 00 90 00
Nov 16 17:53:28 cp3039 kernel: [  145.687028] blk_update_request: I/O error, dev sdf, sector 128
Nov 16 17:53:28 cp3039 kernel: [  145.693560] sd 10:0:0:3: [sdf] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Nov 16 17:53:28 cp3039 kernel: [  145.703372] sd 10:0:0:3: [sdf] tag#0 CDB: Read(10) 28 00 00 00 00 20 00 00 08 00
Nov 16 17:53:28 cp3039 kernel: [  145.711628] blk_update_request: I/O error, dev sdf, sector 32
Nov 16 17:53:28 cp3039 kernel: [  145.718046] Buffer I/O error on dev sdf, logical block 4, async page read
Nov 16 17:53:28 cp3039 kernel: [  145.725716] Buffer I/O error on dev sdf, logical block 16, async page read
Nov 16 17:53:28 cp3039 kernel: [  145.896273] usb 1-1.6.6: USB disconnect, device number 5
Nov 16 17:53:28 cp3039 kernel: [  150.768995] md: md0 stopped.
Nov 16 17:53:28 cp3039 kernel: [  150.773149] md: bind<sdb1>
Nov 16 17:53:28 cp3039 kernel: [  150.776414] md: bind<sda1>
Nov 16 17:53:28 cp3039 kernel: [  150.781227] md: raid1 personality registered for level 1

Nearer to the end (after more normal stuff) we have some virtual USB keyboard/mouse stuff, and then some virtual ethernet stuff:

Nov 16 17:53:31 cp3039 kernel: [  167.316994] usb 1-1.6.1: new high-speed USB device number 7 using ehci-pci
Nov 16 17:53:31 cp3039 kernel: [  167.434495] usb 1-1.6.1: New USB device found, idVendor=0624, idProduct=0249
Nov 16 17:53:31 cp3039 kernel: [  167.434497] usb 1-1.6.1: New USB device strings: Mfr=4, Product=5, SerialNumber=6
Nov 16 17:53:31 cp3039 kernel: [  167.434499] usb 1-1.6.1: Product: Keyboard/Mouse Function
Nov 16 17:53:31 cp3039 kernel: [  167.434500] usb 1-1.6.1: Manufacturer: Avocent
Nov 16 17:53:31 cp3039 kernel: [  167.434501] usb 1-1.6.1: SerialNumber: 20121018
Nov 16 17:53:31 cp3039 kernel: [  167.445462] hidraw: raw HID events driver (C) Jiri Kosina
Nov 16 17:53:31 cp3039 kernel: [  167.448055] usbcore: registered new interface driver usbhid
Nov 16 17:53:31 cp3039 kernel: [  167.448056] usbhid: USB HID core driver
Nov 16 17:53:31 cp3039 kernel: [  167.448895] input: Avocent Keyboard/Mouse Function as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.6/1-1.6.1/1-1.6.1:1.0/0003:0624:0249.0001/input/input2
Nov 16 17:53:31 cp3039 kernel: [  167.501159] hid-generic 0003:0624:0249.0001: input,hidraw0: USB HID v1.00 Keyboard [Avocent Keyboard/Mouse Function] on usb-0000:00:1a.0-1.6.1/input0
Nov 16 17:53:31 cp3039 kernel: [  167.501264] input: Avocent Keyboard/Mouse Function as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.6/1-1.6.1/1-1.6.1:1.1/0003:0624:0249.0002/input/input3
Nov 16 17:53:31 cp3039 kernel: [  167.557078] hid-generic 0003:0624:0249.0002: input,hidraw1: USB HID v1.00 Mouse [Avocent Keyboard/Mouse Function] on usb-0000:00:1a.0-1.6.1/input1
Nov 16 17:53:31 cp3039 kernel: [  167.557169] input: Avocent Keyboard/Mouse Function as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.6/1-1.6.1/1-1.6.1:1.2/0003:0624:0249.0003/input/input4
Nov 16 17:53:31 cp3039 kernel: [  167.557231] hid-generic 0003:0624:0249.0003: input,hidraw2: USB HID v1.00 Mouse [Avocent Keyboard/Mouse Function] on usb-0000:00:1a.0-1.6.1/input2
Nov 16 17:53:33 cp3039 kernel: [  169.653117] usb 1-1.6.3: new high-speed USB device number 8 using ehci-pci
Nov 16 17:53:33 cp3039 kernel: [  169.770131] usb 1-1.6.3: New USB device found, idVendor=413c, idProduct=a102
Nov 16 17:53:33 cp3039 kernel: [  169.770134] usb 1-1.6.3: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Nov 16 17:53:33 cp3039 kernel: [  169.770135] usb 1-1.6.3: Product: iDRAC Virtual NIC USB Device
Nov 16 17:53:33 cp3039 kernel: [  169.770136] usb 1-1.6.3: Manufacturer: Dell(TM)
Nov 16 17:53:33 cp3039 kernel: [  169.774819] cdc_ether 1-1.6.3:1.0 usb0: register 'cdc_ether' at usb-0000:00:1a.0-1.6.3, CDC Ethernet Device, d2:5c:2d:6b:98:cb
Nov 16 17:53:33 cp3039 kernel: [  169.774835] usbcore: registered new interface driver cdc_ether
Nov 16 17:53:33 cp3039 kernel: [  169.775426] cdc_ether 1-1.6.3:1.0 idrac: renamed from usb0
Nov 16 17:53:34 cp3039 kernel: [  170.664311] usb 1-1.6.3: USB disconnect, device number 8
Nov 16 17:53:34 cp3039 kernel: [  170.664353] cdc_ether 1-1.6.3:1.0 idrac: unregister 'cdc_ether' usb-0000:00:1a.0-1.6.3, CDC Ethernet Device
Nov 16 17:54:26 cp3039 kernel: [  222.916264] usb 1-1.6.1: USB disconnect, device number 7
BBlack closed this task as Resolved.Nov 17 2016, 1:03 AM
BBlack claimed this task.

In any case, this evening I started with connecting to the serial serial console (all normal, responds to enter keypress with a login prompt), then depooled and rebooted so I could observe a reboot myself on the console first.

The host rebooted very quickly and normally, with normal-looking dmesg output this time. However, I got no syslog output to the serial console during the shutdown, and apparently around the moment of reboot my serial console session got "hung". It stayed there displaying the login prompt from before, and unresponse to Ctrl-\ to escape out of the console. It's like the host rebooting took iDRAC with it. Had to kill the ssh session and jump back in (which also worked fine), but missed any actual bootup output due to this.

I think most likely whatever this all was, it was a one-shot thing, so closing for now as the host seems normal and repooled.