
Degraded RAID on ms-be2029
Closed, Resolved (Public)

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (hpssacli) was detected on host ms-be2029. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

[6137776.924080] blk_update_request: critical target error, dev sdf, sector 0
[6137776.962785] Buffer I/O error on dev sdf, logical block 0, async page read
[6137777.001323] sd 0:1:0:5: [sdf] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[6137777.001328] sd 0:1:0:5: [sdf] tag#0 Sense Key : Hardware Error [current] 
[6137777.001333] sd 0:1:0:5: [sdf] tag#0 Add. Sense: Logical unit failure
[6137777.001337] sd 0:1:0:5: [sdf] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
[6137777.001339] blk_update_request: critical target error, dev sdf, sector 0
[6137777.039931] Buffer I/O error on dev sdf, logical block 0, async page read
[6137777.079842] sd 0:1:0:5: [sdf] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[6137777.080035] sd 0:1:0:5: [sdf] tag#0 Sense Key : Hardware Error [current] 
[6137777.080424] sd 0:1:0:5: [sdf] tag#0 Add. Sense: Logical unit failure
[6137777.080428] sd 0:1:0:5: [sdf] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
[6137777.080623] blk_update_request: critical target error, dev sdf, sector 0
[6137777.118630] Buffer I/O error on dev sdf, logical block 0, async page read
[6137777.156209] ldm_validate_partition_table(): Disk read failed.
[6137777.156273] sd 0:1:0:5: [sdf] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[6137777.156276] sd 0:1:0:5: [sdf] tag#0 Sense Key : Hardware Error [current] 
[6137777.156281] sd 0:1:0:5: [sdf] tag#0 Add. Sense: Logical unit failure
[6137777.156284] sd 0:1:0:5: [sdf] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
[6137777.156286] blk_update_request: critical target error, dev sdf, sector 0
[6137777.193484] Buffer I/O error on dev sdf, logical block 0, async page read
[6137777.234444] sd 0:1:0:5: [sdf] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[6137777.234991] sd 0:1:0:5: [sdf] tag#0 Sense Key : Hardware Error [current] 
[6137777.235539] sd 0:1:0:5: [sdf] tag#0 Add. Sense: Logical unit failure
[6137777.235901] sd 0:1:0:5: [sdf] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
[6137777.236080] blk_update_request: critical target error, dev sdf, sector 0
[6137777.274732] Buffer I/O error on dev sdf, logical block 0, async page read
[6137777.313264] sd 0:1:0:5: [sdf] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[6137777.313270] sd 0:1:0:5: [sdf] tag#0 Sense Key : Hardware Error [current] 
[6137777.313274] sd 0:1:0:5: [sdf] tag#0 Add. Sense: Logical unit failure
[6137777.313278] sd 0:1:0:5: [sdf] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
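
A log like the one above can be summarized by device and sense key before deciding whether a drive needs replacing. The following is a minimal sketch, not part of the RAID event handler: the function name `count_sense_keys` and the regular expression are assumptions written only against the line format shown in this task.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log in this task:
# [6137777.001328] sd 0:1:0:5: [sdf] tag#0 Sense Key : Hardware Error [current]
SENSE_KEY_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+sd\s+[\d:]+:\s+\[(?P<dev>\w+)\].*?"
    r"Sense Key\s*:\s*(?P<key>.+?)\s*\[current\]"
)


def count_sense_keys(lines):
    """Count (device, sense key) pairs found in kernel log lines.

    Hypothetical helper: lines that do not look like an sd
    "Sense Key" message are ignored.
    """
    counts = Counter()
    for line in lines:
        match = SENSE_KEY_RE.match(line.strip())
        if match:
            counts[(match.group("dev"), match.group("key"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Example: dmesg | python3 this_script.py
    for (dev, key), n in count_sense_keys(sys.stdin).items():
        print(f"{dev}: {key} x{n}")
```

Repeated `Hardware Error` sense keys for a single device, as seen here for sdf, suggest a failed disk rather than a transient I/O problem, but the controller status should still be checked.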

Event Timeline

fgiunchedi added subscribers: Papaul, fgiunchedi.

@Papaul please replace, thanks!
Also note that this drive has very few power-on hours, so it was almost DOA (dead on arrival).

=> pd 1I:1:8 show
array f show

Smart Array P840 in Slot 3

   array F

      physicaldrive 1I:1:8
         Port: 1I
         Box: 1
         Bay: 8
         Status: Failed
         Last Failure Reason: Unknown
         Drive Type: Data Drive
         Interface Type: SATA
         Size: 4000.7 GB
         Drive exposed to OS: False
         Native Block Size: 512
         Rotational Speed: 7200
         Firmware Revision: HPG2
         Serial Number: XXXX
         Model: ATA     MB4000GFEMK
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Maximum Temperature (C): 37
         PHY Count: 1
         PHY Transfer Rate: Unknown
         Drive Authentication Status: Not Applicable
         Sanitize Erase Supported: False


=> array f show

Smart Array P840 in Slot 3

   Array: F
      Interface Type: SATA
      Unused Space: 0  MB (0.0%)
      Used Space: 3.6 TB (100.0%)
      Status: Failed Physical Drive
      MultiDomain Status: OK
      Array Type: Data       HPE SSD Smart Path: disable

      Warning: One of the drives on this array have failed or has been removed.
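
For scripted checks, the `Key: Value` lines in a physical drive snapshot like the one above can be read into a dictionary. This is a rough sketch under the assumption that the output keeps the layout pasted in this task; `parse_physicaldrive` and `is_failed` are hypothetical helper names, not part of hpssacli.

```python
def parse_physicaldrive(text):
    """Parse a pasted physical drive snapshot into a dict of fields.

    Assumes one 'Key: Value' pair per line, as in the output above.
    Lines without ': ' (prompts, headers, blanks) are skipped.
    """
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("physicaldrive "):
            # e.g. "physicaldrive 1I:1:8" -> "1I:1:8"
            fields["physicaldrive"] = line.split(None, 1)[1]
            continue
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value.strip()
    return fields


def is_failed(fields):
    """Return True when the parsed snapshot reports Status: Failed."""
    return fields.get("Status") == "Failed"
```

Splitting on the first `": "` keeps values that themselves contain colons intact, and the drive address (which has colons but no following space) is handled separately.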

Papaul triaged this task as Medium priority. May 24 2017, 12:20 AM

Dear Mr Papaul Tshibamba,

Thank you for contacting Hewlett Packard Enterprise for your service request. This email confirms your request for service and the details are below.

Your request is being worked on under reference number 5319943746
Status: Case is generated and in Progress

Product description: HPE ProLiant DL380 Gen9 12LFF Configure-to-order Server
Product number: 719061-B21
Serial number: MXQ70601RX
Subject: DL380 Gen9 : Failed Physical Drive

Yours sincerely,
Hewlett Packard Enterprise

Disk replacement complete