diagnose failed(?) sda on ms-be1022
Closed, ResolvedPublic

Authored By

	fgiunchedi
	Jul 18 2016, 8:34 AM

Description

I'm seeing some errors from sda (the ssd) on ms-be1022 reported at the os level:

[   17.332623] sd 0:1:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   17.332636] sd 0:1:0:0: [sda] tag#0 Sense Key : Aborted Command [current] 
[   17.332643] sd 0:1:0:0: [sda] tag#0 Add. Sense: Information unit iuCRC error detected
[   17.332650] sd 0:1:0:0: [sda] tag#0 CDB: Read(10) 28 00 04 35 45 28 00 00 80 00
[   17.332655] blk_update_request: I/O error, dev sda, sector 70599976
[   17.360706] md/raid1:md0: sda1: rescheduling sector 70532392
[   17.387429] md/raid1:md0: redirecting sector 70532392 to other mirror: sda1
[   17.473379] sd 0:1:0:0: [sda] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   17.473383] sd 0:1:0:0: [sda] tag#1 Sense Key : Aborted Command [current] 
[   17.473387] sd 0:1:0:0: [sda] tag#1 Add. Sense: Information unit iuCRC error detected
[   17.473391] sd 0:1:0:0: [sda] tag#1 CDB: Read(10) 28 00 00 bd ad 00 00 01 00 00
[   17.473392] blk_update_request: I/O error, dev sda, sector 12430592
[   17.503116] md/raid1:md0: sda1: rescheduling sector 12363008
[   17.532025] md/raid1:md0: redirecting sector 12363008 to other mirror: sda1
[   17.573074] SGI XFS with ACLs, security attributes, realtime, no debug enabled
[   17.583548] XFS (sda3): Mounting V4 Filesystem
[   17.583587] XFS (sdb3): Mounting V4 Filesystem
[   17.592953] XFS (sdi1): Mounting V4 Filesystem
[   17.595486] XFS (sdl1): Mounting V4 Filesystem
[   17.596036] XFS (sdj1): Mounting V4 Filesystem
[   17.596196] XFS (sdh1): Mounting V4 Filesystem
[   17.596972] XFS (sde1): Mounting V4 Filesystem
[   17.597053] XFS (sdg1): Mounting V4 Filesystem
[   17.600113] XFS (sdm1): Mounting V4 Filesystem
[   17.601601] XFS (sdk1): Mounting V4 Filesystem
[   17.603472] XFS (sdf1): Mounting V4 Filesystem
[   17.603920] XFS (sdc1): Mounting V4 Filesystem
[   17.604066] XFS (sdn1): Mounting V4 Filesystem
[   17.604373] XFS (sdd1): Mounting V4 Filesystem
[   17.619809] systemd-journald[541]: Received request to flush runtime journal from PID 1
[   17.669328] XFS (sda3): Ending clean mount
[   17.669442] XFS (sdb3): Ending clean mount
[   17.892799] XFS (sdn1): Ending clean mount
[   17.897595] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   17.915337] XFS (sdk1): Ending clean mount
[   17.939498] XFS (sdg1): Ending clean mount
[   17.963082] XFS (sdc1): Ending clean mount
[   17.992090] XFS (sdi1): Ending clean mount
[   17.994885] XFS (sdm1): Ending clean mount
[   18.026080] XFS (sdf1): Ending clean mount
[   18.028144] XFS (sdj1): Ending clean mount
[   18.032636] XFS (sdd1): Ending clean mount
[   18.045586] XFS (sdh1): Ending clean mount
[   18.071968] XFS (sdl1): Ending clean mount
[   18.071975] sd 0:1:0:0: [sda] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   18.071980] sd 0:1:0:0: [sda] tag#1 Sense Key : Aborted Command [current] 
[   18.071984] sd 0:1:0:0: [sda] tag#1 Add. Sense: Information unit iuCRC error detected
[   18.071988] sd 0:1:0:0: [sda] tag#1 CDB: Read(10) 28 00 00 25 94 c0 00 01 00 00
[   18.071990] blk_update_request: I/O error, dev sda, sector 2462912
[   18.074208] md/raid1:md0: sda1: rescheduling sector 2395328
[   18.082692] md/raid1:md0: redirecting sector 2395328 to other mirror: sda1
[   18.092752] XFS (sde1): Ending clean mount
[   18.280270] RPC: Registered named UNIX socket transport module.
[   18.280274] RPC: Registered udp transport module.
[   18.280276] RPC: Registered tcp transport module.
[   18.280277] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   18.286982] FS-Cache: Loaded
[   18.297310] FS-Cache: Netfs 'nfs' registered for caching
[   18.315212] sd 0:1:0:0: [sda] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   18.315224] sd 0:1:0:0: [sda] tag#2 Sense Key : Aborted Command [current] 
[   18.315231] sd 0:1:0:0: [sda] tag#2 Add. Sense: Information unit iuCRC error detected
[   18.315236] sd 0:1:0:0: [sda] tag#2 CDB: Read(10) 28 00 02 ca 53 28 00 00 20 00
[   18.315241] blk_update_request: I/O error, dev sda, sector 46813992
[   18.317223] md/raid1:md0: sda1: rescheduling sector 46746408
[   18.322373] md/raid1:md0: redirecting sector 46746408 to other mirror: sda1
[   18.348705] sd 0:1:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   18.348714] sd 0:1:0:0: [sda] tag#0 Sense Key : Aborted Command [current] 
[   18.348721] sd 0:1:0:0: [sda] tag#0 Add. Sense: Information unit iuCRC error detected
[   18.348726] sd 0:1:0:0: [sda] tag#0 CDB: Read(10) 28 00 02 ca 28 c8 00 00 18 00
[   18.348731] blk_update_request: I/O error, dev sda, sector 46803144
[   18.350701] md/raid1:md0: sda1: rescheduling sector 46735560
[   18.352886] md/raid1:md0: redirecting sector 46735560 to other mirror: sda1
[   18.352909] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
[   21.645503] tg3 0000:02:00.0 eth0: Link is up at 1000 Mbps, full duplex
[   21.645513] tg3 0000:02:00.0 eth0: Flow control is on for TX and on for RX
[   21.645517] tg3 0000:02:00.0 eth0: EEE is disabled
[   21.645542] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   23.214650] ip_tables: (C) 2000-2006 Netfilter Core Team
[   23.222484] nf_conntrack version 0.5.0 (32768 buckets, 262144 max)
[   23.263673] sd 0:1:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   23.263680] sd 0:1:0:0: [sda] tag#0 Sense Key : Aborted Command [current] 
[   23.263685] sd 0:1:0:0: [sda] tag#0 Add. Sense: Information unit iuCRC error detected
[   23.263690] sd 0:1:0:0: [sda] tag#0 CDB: Read(10) 28 00 04 35 20 a0 00 00 10 00
[   23.263693] blk_update_request: I/O error, dev sda, sector 70590624
[   23.265781] md/raid1:md0: sda1: rescheduling sector 70523040
[   23.267951] md/raid1:md0: redirecting sector 70523040 to other mirror: sda1
[   23.279307] ip6_tables: (C) 2000-2006 Netfilter Core Team
[   23.474018] 8021q: 802.1Q VLAN Support v1.8
[   23.537362] sd 0:1:0:0: [sda] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   23.537366] sd 0:1:0:0: [sda] tag#1 Sense Key : Aborted Command [current] 
[   23.537370] sd 0:1:0:0: [sda] tag#1 Add. Sense: Information unit iuCRC error detected
[   23.537374] sd 0:1:0:0: [sda] tag#1 CDB: Read(10) 28 00 04 bd 08 58 00 00 a8 00
[   23.537376] blk_update_request: I/O error, dev sda, sector 79497304
[   23.567435] md/raid1:md0: sda1: rescheduling sector 79429720
[   23.594229] md/raid1:md0: sda1: rescheduling sector 79429728
[   23.620548] md/raid1:md0: sda1: rescheduling sector 79429736
[   23.622919] md/raid1:md0: sda1: rescheduling sector 79429744
[   23.622919] md/raid1:md0: sda1: rescheduling sector 79429752
[   23.622920] md/raid1:md0: sda1: rescheduling sector 79429760
[   23.622921] md/raid1:md0: sda1: rescheduling sector 79429768
[   23.622921] md/raid1:md0: sda1: rescheduling sector 79429776
[   23.622922] md/raid1:md0: sda1: rescheduling sector 79429784
[   23.623124] md/raid1:md0: redirecting sector 79429720 to other mirror: sdb1
[   23.623421] md/raid1:md0: redirecting sector 79429728 to other mirror: sdb1
[   23.623668] md/raid1:md0: redirecting sector 79429736 to other mirror: sdb1
[   23.623775] md/raid1:md0: redirecting sector 79429744 to other mirror: sdb1
[   23.623873] md/raid1:md0: redirecting sector 79429752 to other mirror: sdb1
[   23.623970] md/raid1:md0: redirecting sector 79429760 to other mirror: sdb1
[   23.624069] md/raid1:md0: redirecting sector 79429768 to other mirror: sdb1
[   23.624215] md/raid1:md0: redirecting sector 79429776 to other mirror: sdb1
[   23.624705] md/raid1:md0: redirecting sector 79429784 to other mirror: sdb1
[   23.661583] sd 0:1:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   23.661589] sd 0:1:0:0: [sda] tag#0 Sense Key : Aborted Command [current] 
[   23.661594] sd 0:1:0:0: [sda] tag#0 Add. Sense: Information unit iuCRC error detected
[   23.661598] sd 0:1:0:0: [sda] tag#0 CDB: Read(10) 28 00 01 65 cc 30 00 00 08 00
[   23.661601] blk_update_request: I/O error, dev sda, sector 23448624
[   23.713483] sd 0:1:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   23.713487] sd 0:1:0:0: [sda] tag#0 Sense Key : Aborted Command [current] 
[   23.713490] sd 0:1:0:0: [sda] tag#0 Add. Sense: Information unit iuCRC error detected
[   23.713493] sd 0:1:0:0: [sda] tag#0 CDB: Read(10) 28 00 01 65 cc 30 00 00 08 00
[   23.713495] blk_update_request: I/O error, dev sda, sector 23448624
[   23.761525] Process accounting resumed
[   23.784280] sd 0:1:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   23.784284] sd 0:1:0:0: [sda] tag#0 Sense Key : Aborted Command [current] 
[   23.784288] sd 0:1:0:0: [sda] tag#0 Add. Sense: Information unit iuCRC error detected
[   23.784291] sd 0:1:0:0: [sda] tag#0 CDB: Read(10) 28 00 00 32 13 80 00 00 70 00
[   23.784294] blk_update_request: I/O error, dev sda, sector 3281792
[   23.808450] sd 0:1:0:0: [sda] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   23.808453] sd 0:1:0:0: [sda] tag#2 Sense Key : Aborted Command [current] 
[   23.808458] sd 0:1:0:0: [sda] tag#2 Add. Sense: Information unit iuCRC error detected
[   23.808461] sd 0:1:0:0: [sda] tag#2 CDB: Read(10) 28 00 00 05 86 40 00 01 00 00
[   23.808464] blk_update_request: I/O error, dev sda, sector 362048
[   23.872805] Process accounting resumed
[   24.222559] sd 0:1:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   24.222576] sd 0:1:0:0: [sda] tag#0 Sense Key : Aborted Command [current] 
[   24.222585] sd 0:1:0:0: [sda] tag#0 Add. Sense: Information unit iuCRC error detected
[   24.222591] sd 0:1:0:0: [sda] tag#0 CDB: Read(10) 28 00 04 b2 ae c8 00 00 40 00
[   24.222595] blk_update_request: I/O error, dev sda, sector 78819016
[   24.276300] sd 0:1:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   24.276312] sd 0:1:0:0: [sda] tag#0 Sense Key : Aborted Command [current] 
[   24.276320] sd 0:1:0:0: [sda] tag#0 Add. Sense: Information unit iuCRC error detected
[   24.276326] sd 0:1:0:0: [sda] tag#0 CDB: Read(10) 28 00 04 b2 ae c8 00 00 40 00
[   24.276330] blk_update_request: I/O error, dev sda, sector 78819016
[   24.350244] sd 0:1:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   24.350249] sd 0:1:0:0: [sda] tag#0 Sense Key : Aborted Command [current] 
[   24.350252] sd 0:1:0:0: [sda] tag#0 Add. Sense: Information unit iuCRC error detected
[   24.350255] sd 0:1:0:0: [sda] tag#0 CDB: Read(10) 28 00 01 95 38 e0 00 00 08 00
[   24.350257] blk_update_request: I/O error, dev sda, sector 26556640
[   24.953969] sd 0:1:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   24.954135] sd 0:1:0:0: [sda] tag#0 Sense Key : Aborted Command [current] 
[   24.954139] sd 0:1:0:0: [sda] tag#0 Add. Sense: Information unit iuCRC error detected
[   24.954307] sd 0:1:0:0: [sda] tag#0 CDB: Read(10) 28 00 01 9a 7a 68 00 00 80 00
[   24.954475] blk_update_request: I/O error, dev sda, sector 26901096
[   32.785897] hpsa 0000:08:00.0: Acknowledging event: 0xc0000000 (HP SSD Smart Path configuration change)
[   32.834473] hpsa 0000:08:00.0: scsi 0:1:0:0: updated Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap+ En+ Exp=1
[   32.834482] hpsa 0000:08:00.0: scsi 0:1:0:1: updated Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap+ En+ Exp=1

though all drives are reported as OK

=> pd all show

Smart Array P840 in Slot 3

   array A

      physicaldrive 2I:4:1 (port 2I:box 4:bay 1, Solid State SATA, 200 GB, OK)

   array B

      physicaldrive 2I:4:2 (port 2I:box 4:bay 2, Solid State SATA, 200 GB, OK)

   array C

      physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SATA, 3 TB, OK)

   array D

      physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SATA, 3 TB, OK)

   array E

      physicaldrive 1I:1:7 (port 1I:box 1:bay 7, SATA, 3 TB, OK)

   array F

      physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SATA, 3 TB, OK)

   array G

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 3 TB, OK)

   array H

      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA, 3 TB, OK)

   array I

      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SATA, 3 TB, OK)

   array J

      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SATA, 3 TB, OK)

   array K

      physicaldrive 2I:2:1 (port 2I:box 2:bay 1, SATA, 3 TB, OK)

   array L

      physicaldrive 2I:2:2 (port 2I:box 2:bay 2, SATA, 3 TB, OK)

   array M

      physicaldrive 2I:2:3 (port 2I:box 2:bay 3, SATA, 3 TB, OK)

   array N

      physicaldrive 2I:2:4 (port 2I:box 2:bay 4, SATA, 3 TB, OK)
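A minimal sketch for cross-checking the kernel messages above: the READ(10) CDB printed on each failure carries the logical block address in bytes 2-5 and the transfer length in bytes 7-8 (big-endian), so it can be decoded and compared with the sector reported by blk_update_request. The function name decode_read10 is made up for illustration. The constant difference between the disk sector and the md sector is only an observation from this log; reading it as the partition start plus the md data offset is an assumption.

```python
def decode_read10(cdb_hex):
    """Return (lba, block_count) for a SCSI READ(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2-5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, count


# (CDB, sector from blk_update_request, sector from md/raid1), copied from the log above
samples = [
    ("28 00 04 35 45 28 00 00 80 00", 70599976, 70532392),
    ("28 00 00 bd ad 00 00 01 00 00", 12430592, 12363008),
    ("28 00 00 25 94 c0 00 01 00 00", 2462912, 2395328),
]

for cdb, disk_sector, md_sector in samples:
    lba, count = decode_read10(cdb)
    # The decoded LBA should equal the failing disk sector; the last column is
    # the offset between the whole-disk sector and the sector inside md0.
    print(lba == disk_sector, count, disk_sector - md_sector)
```

For the three samples the decoded address matches the reported sector, and the disk-to-md offset is 67584 sectors each time, which suggests the failures are plain reads scattered across sda1 rather than one bad region.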

Related Objects

Status	Assigned	Task
Resolved	fgiunchedi	T130012 expand swift hardware in codfw/eqiad
Resolved	fgiunchedi	T136631 rack/setup/deploy ms-be102[2-7]
Resolved	fgiunchedi	T140597 diagnose failed(?) sda on ms-be1022

Event Timeline

fgiunchedi created this task. Jul 18 2016, 8:34 AM

Restricted Application added subscribers: Zppix, Southparkfan, Steinsplitter, Aklapper. Jul 18 2016, 8:34 AM

HP sent this disk to the SF office even though the shipping address we specified was the data center in Virginia. Robert forwarded it to me, but via USPS, so the package will be returned to him.

HP is sending a new disk and we will need to return the other disk.

I received the disk and replaced it

root@ms-be1022:~# hpssacli ctrl slot=3 ld all show status

logicaldrive 1 (186.3 GB, 0): OK
logicaldrive 2 (186.3 GB, 0): OK
logicaldrive 3 (2.7 TB, 0): OK
logicaldrive 4 (2.7 TB, 0): OK
logicaldrive 5 (2.7 TB, 0): OK
logicaldrive 6 (2.7 TB, 0): OK
logicaldrive 7 (2.7 TB, 0): OK
logicaldrive 8 (2.7 TB, 0): OK
logicaldrive 9 (2.7 TB, 0): OK
logicaldrive 10 (2.7 TB, 0): OK
logicaldrive 11 (2.7 TB, 0): OK
logicaldrive 12 (2.7 TB, 0): OK
logicaldrive 13 (2.7 TB, 0): OK
logicaldrive 14 (2.7 TB, 0): OK

@fgiunchedi please verify all is well

Closed this by mistake; I was supposed to close the ms-be1021 task.

New case opened w/HP

Your case was successfully submitted. Please note your Case ID: 5310702226 for future reference.

I replaced the disk on ms-be1022 but it shows up as failed and does not appear to rebuild on its own.

logicaldrive 1 (186.3 GB, 0): Failed

logicaldrive 2 (186.3 GB, 0): OK
logicaldrive 3 (2.7 TB, 0): OK
logicaldrive 4 (2.7 TB, 0): OK
logicaldrive 5 (2.7 TB, 0): OK
logicaldrive 6 (2.7 TB, 0): OK
logicaldrive 7 (2.7 TB, 0): OK
logicaldrive 8 (2.7 TB, 0): OK
logicaldrive 9 (2.7 TB, 0): OK
logicaldrive 10 (2.7 TB, 0): OK
logicaldrive 11 (2.7 TB, 0): OK
logicaldrive 12 (2.7 TB, 0): OK
logicaldrive 13 (2.7 TB, 0): OK
logicaldrive 14 (2.7 TB, 0): OK
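A status listing like the one above is easy to miss a failure in by eye. The following is a rough sketch, not an existing tool, that picks out logical drives whose status is anything other than OK. The function name non_ok_logical_drives is hypothetical, and the line format is assumed to be exactly what hpssacli printed in this task.

```python
import re

# Matches lines such as "logicaldrive 1 (186.3 GB, 0): Failed"
LD_LINE = re.compile(r"^\s*logicaldrive (\d+) \(([^,]+), [^)]*\): (\S.*?)\s*$")


def non_ok_logical_drives(output):
    """Return (number, size, status) for every logical drive that is not OK."""
    problems = []
    for line in output.splitlines():
        match = LD_LINE.match(line)
        if match and match.group(3) != "OK":
            problems.append((int(match.group(1)), match.group(2), match.group(3)))
    return problems


# First lines of the listing above
sample = """
logicaldrive 1 (186.3 GB, 0): Failed

logicaldrive 2 (186.3 GB, 0): OK
logicaldrive 3 (2.7 TB, 0): OK
"""

print(non_ok_logical_drives(sample))
```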

Mentioned in SAL [2016-08-03T13:16:19Z] <godog> reboot ms-be1022 - T140597

OK, I've re-enabled the logical drive with controller slot=3 ld 1 modify reenable, and also had to juggle the boot order since this was sda.

Anyway, I'm still seeing the iuCRC errors:

[   24.290524] sd 0:1:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   24.290540] sd 0:1:0:0: [sda] tag#0 Sense Key : Aborted Command [current] 
[   24.290553] sd 0:1:0:0: [sda] tag#0 Add. Sense: Information unit iuCRC error detected
[   24.290559] sd 0:1:0:0: [sda] tag#0 CDB: Read(10) 28 00 02 fd d8 e0 00 01 00 00
[   24.290575] blk_update_request: I/O error, dev sda, sector 50190560
[   24.293699] md/raid1:md0: sda1: rescheduling sector 50122976
[   24.297400] md/raid1:md0: redirecting sector 50122976 to other mirror: sda1

@Cmjohnson looks like this might also be cables/power/backplane. Can we try reseating the disk and cables? ms-be1022 isn't in service and can be powered off.

@Cmjohnson reseated the cables and disk but there was no change whatsoever at reboot; still seeing Information unit iuCRC error detected for sda

[    5.550161] sd 0:1:0:0: [sda] 390651840 512-byte logical blocks: (200 GB/186 GiB)
[    9.674810] sd 0:1:0:0: [sda] 4096-byte physical blocks
[    9.698456] sd 0:1:0:0: [sda] Write Protect is off
[    9.720266] sd 0:1:0:0: [sda] Mode Sense: 73 00 00 08
[    9.720323] sd 0:1:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[    9.762017]  sda: sda1 sda2 sda3 sda4
[    9.779049] sd 0:1:0:0: [sda] Attached SCSI disk
[   15.618279] md: bind<sda1>
[   15.751669] sd 0:1:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   15.790447] sd 0:1:0:0: [sda] tag#0 Sense Key : Aborted Command [current] 
[   15.822514] sd 0:1:0:0: [sda] tag#0 Add. Sense: Information unit iuCRC error detected
[   15.859487] sd 0:1:0:0: [sda] tag#0 CDB: Read(10) 28 00 00 00 00 08 00 00 08 00
[   15.893559] blk_update_request: I/O error, dev sda, sector 8
[   15.953274] md: bind<sda2>
[   16.063310] md: bind<sda4>
[   16.231209] sd 0:1:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   16.269969] sd 0:1:0:0: [sda] tag#0 Sense Key : Aborted Command [current] 
[   16.302253] sd 0:1:0:0: [sda] tag#0 Add. Sense: Information unit iuCRC error detected
[   16.338649] sd 0:1:0:0: [sda] tag#0 CDB: Read(10) 28 00 00 01 09 10 00 00 f0 00
[   16.372943] blk_update_request: I/O error, dev sda, sector 67856
[   16.400992] md/raid1:md0: sda1: rescheduling sector 272
[   16.425375] md/raid1:md0: Disk failure on sda1, disabling device.
[   16.510977]  disk 0, wo:1, o:0, dev:sda1

next step proposed by @Cmjohnson is to swap ssds and see if the error follows the disk, at this point it is unlikely it is the disk itself. More likely cable/controller/psu(?)

The ssds were swapped, the server needs a re-install.

looks like the error followed the swap, now sdb is reported with failed commands

[   24.549309] sd 0:1:0:1: [sdb] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   24.549320] sd 0:1:0:1: [sdb] tag#1 Sense Key : Aborted Command [current] 
[   24.549336] sd 0:1:0:1: [sdb] tag#1 Add. Sense: Information unit iuCRC error detected
[   24.549340] sd 0:1:0:1: [sdb] tag#1 CDB: Read(10) 28 00 00 95 94 80 00 00 80 00
[   24.549342] blk_update_request: I/O error, dev sdb, sector 9802880
[   24.577346] md/raid1:md0: sdb1: rescheduling sector 9735296
[   24.604835] md/raid1:md0: redirecting sector 9735296 to other mirror: sda1

okay, odd that it would be another disk issue but that makes the most sense. I will request a new one.

Ticket created to replace SSD

Thank you for contacting Hewlett Packard Enterprise for your service request. This email confirms your request for service and the details are below.

Your request is being worked on under reference number 5311440162
STATUS: CASE IS GENERATED AND IN PROGRESS

Product description: HP ProLiant DL380 Gen9 12LFF Configure-to-order Server
Product number: 719061-B21
Serial number: MXQ62108H7
Subject: SCM_HW:Failed SSD

Yours sincerely,
Hewlett Packard Enterprise

New disk has been sent

We would like to inform you that your order for case ID# 5311440162 has shipped. The estimated time of arrival is Thursday, September 01, 2016.

@fgiunchedi the new disk showed up and I replaced the one that was producing errors, which was /dev/sdb afaik. Please check and lmk how it goes.

thanks @Cmjohnson ! I've reimaged the machine and it seems fine so far, I'll run some tests and put it in service if no further errors surface

the error is still there at boot for sdb, I tried stressing the disk by writing / reading files to the raid array but no further errors are reported and the disk doesn't get kicked out of the array. Still odd error though, I don't remember seeing this on other machines from the same batch

[   25.087588] sd 0:1:0:1: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   25.087606] sd 0:1:0:1: [sdb] tag#0 Sense Key : Aborted Command [current] 
[   25.087614] sd 0:1:0:1: [sdb] tag#0 Add. Sense: Information unit iuCRC error detected
[   25.087620] sd 0:1:0:1: [sdb] tag#0 CDB: Read(10) 28 00 01 5c 71 00 00 01 00 00
[   25.087624] blk_update_request: I/O error, dev sdb, sector 22835456
[   25.116074] md/raid1:md0: sdb1: rescheduling sector 22767872
[   25.143431] md/raid1:md0: redirecting sector 22767872 to other mirror: sda1

@godog: I want to swap the ssd slots again. I am doing that now...can you reinstall and let me know what the msg logs state. Thanks

Mentioned in SAL [2016-09-09T09:02:00Z] <godog> reimage ms-be1022 - T140597

@Cmjohnson still seeing the same error on sdb, though I noticed it happens only after a reboot. If the server is powered down and then powered back on I don't see the message.

The other thing I noticed: could it be that when the disks swap bays, the controller keeps the original ordering? This is what hpssacli shows

=> controller slot=3 pd all show
Smart Array P840 in Slot 3
   array A
      physicaldrive 2I:4:2 (port 2I:box 4:bay 2, Solid State SATA, 200 GB, OK)
   array B
      physicaldrive 2I:4:1 (port 2I:box 4:bay 1, Solid State SATA, 200 GB, OK)
   array C
      physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SATA, 3 TB, OK)
   array D
      physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SATA, 3 TB, OK)
   array E
      physicaldrive 1I:1:7 (port 1I:box 1:bay 7, SATA, 3 TB, OK)
   array F
      physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SATA, 3 TB, OK)
   array G
      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 3 TB, OK)
   array H
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA, 3 TB, OK)
   array I
      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SATA, 3 TB, OK)
   array J
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SATA, 3 TB, OK)
   array K
      physicaldrive 2I:2:1 (port 2I:box 2:bay 1, SATA, 3 TB, OK)
   array L
      physicaldrive 2I:2:2 (port 2I:box 2:bay 2, SATA, 3 TB, OK)
   array M
      physicaldrive 2I:2:3 (port 2I:box 2:bay 3, SATA, 3 TB, OK)
   array N
      physicaldrive 2I:2:4 (port 2I:box 2:bay 4, SATA, 3 TB, OK)
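The ordering question can be checked by comparing the array-to-drive mapping in the two pd all show listings in this task. The sketch below uses hypothetical helper names and assumes each array holds exactly one physical drive, which is true of the output here but not of controllers in general.

```python
import re


def array_to_drive(output):
    """Map array letter -> physical drive location (port:box:bay)."""
    mapping = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        array = re.match(r"array (\w+)$", line)
        if array:
            current = array.group(1)
            continue
        drive = re.match(r"physicaldrive (\S+)", line)
        if drive and current is not None:
            mapping[current] = drive.group(1)  # assumes one drive per array
    return mapping


def changed_arrays(before, after):
    """Return {array: (old_location, new_location)} for arrays that moved."""
    return {a: (before[a], after.get(a)) for a in before if after.get(a) != before[a]}


# SSD entries from the listing in the description and from the listing above
original = """
   array A
      physicaldrive 2I:4:1 (port 2I:box 4:bay 1, Solid State SATA, 200 GB, OK)
   array B
      physicaldrive 2I:4:2 (port 2I:box 4:bay 2, Solid State SATA, 200 GB, OK)
"""
swapped = """
   array A
      physicaldrive 2I:4:2 (port 2I:box 4:bay 2, Solid State SATA, 200 GB, OK)
   array B
      physicaldrive 2I:4:1 (port 2I:box 4:bay 1, Solid State SATA, 200 GB, OK)
"""

print(changed_arrays(array_to_drive(original), array_to_drive(swapped)))
```

In these two listings array A moves from bay 1 to bay 2 and array B from bay 2 to bay 1, so the arrays, and with them the logical drive numbering, appear to follow the disks rather than the bays.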

@fgiunchedi I will push HP for a new system board but I am beginning to think that this is not h/w related. I will update once I speak with their tech support

an HP tech came yesterday but once here realized that HP sent him a new backplane for the front disks and not the ssds. He did add a new ssd into slot 13. Once the new part arrives he'll reschedule.

HP Tech came today and replaced the backplane.

@fgiunchedi please check this when you get a chance. Thanks!

@Cmjohnson I'm not seeing the errors above after reimage, taking this and putting the machine in service

following up on T136631
