
ms-be1021.eqiad.wmnet: slot=1I:1:2 dev=sdh failed
Closed, Resolved, Public

Description

slot=1I:1:2 dev=sdh has been reported as failed; please replace it.

hpssacli

=> set target controller slot=3 pd 1I:1:2

   "controller slot=3 physicaldrive 1i:1:2"

=> show   

Smart Array P840 in Slot 3

   array H

      physicaldrive 1I:1:2
         Port: 1I
         Box: 1
         Bay: 2
         Status: Failed
         Last Failure Reason: Hardware error
         Drive Type: Data Drive
         Interface Type: SATA
         Size: 4000.7 GB
         Native Block Size: 512
         Rotational Speed: 7200
         Firmware Revision: HPG3
         Model: ATA     MB4000GDUPB
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Maximum Temperature (C): 42
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: Not Applicable

=> set target controller slot=3 array h

   "controller slot=3 array H"

=> delete

Warning: Deleting an array can cause other array letters to become renamed.
         E.g. Deleting array A from arrays A,B,C will result in two remaining
         arrays A,B ... not B,C


Warning: Deleting the specified device(s) will result in data being lost.
         Continue? (y/n) y
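Before deleting anything, it can help to confirm which drives the controller actually reports as unhealthy. Below is a minimal Python sketch that scans `show` output like the session above. It assumes the captured text contains only `physicaldrive` blocks, so every "Key: Value" line after a drive header is attributed to that drive. `parse_physical_drives` and `failed_drives` are hypothetical helpers, not part of hpssacli.

```python
import re

# Matches a drive header line such as "physicaldrive 1I:1:2".
DRIVE_RE = re.compile(r"^\s*physicaldrive\s+(\S+)\s*$")
# Matches an indented "Key: Value" field such as "Status: Failed".
FIELD_RE = re.compile(r"^\s*([^:]+?):\s*(.*)$")


def parse_physical_drives(text: str) -> dict[str, dict[str, str]]:
    """Group the 'Key: Value' fields under each physicaldrive header.

    Assumption: the text holds only physicaldrive blocks, so any field
    after a header belongs to that drive.
    """
    drives: dict[str, dict[str, str]] = {}
    current = None
    for line in text.splitlines():
        header = DRIVE_RE.match(line)
        if header:
            # Start (or continue) the field dict for this drive ID.
            current = drives.setdefault(header.group(1), {})
            continue
        field = FIELD_RE.match(line)
        if current is not None and field:
            current[field.group(1)] = field.group(2).strip()
    return drives


def failed_drives(text: str) -> list[str]:
    """Return the IDs of drives whose Status field is anything but OK."""
    return [drive for drive, fields in parse_physical_drives(text).items()
            if fields.get("Status", "OK") != "OK"]
```

Treating any status other than `OK` as worth a look is a deliberately broad choice. It also surfaces states such as a predictive failure, not only outright failures.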

Event Timeline

fgiunchedi renamed this task from ms-be1012.eqiad.wmnet: slot=1I:1:2 dev=sdh to ms-be1012.eqiad.wmnet: slot=1I:1:2 dev=sdh failed. Jul 8 2016, 4:40 PM
fgiunchedi renamed this task from ms-be1012.eqiad.wmnet: slot=1I:1:2 dev=sdh failed to ms-be1021.eqiad.wmnet: slot=1I:1:2 dev=sdh failed. Jul 14 2016, 4:13 PM

Case opened with HP:

Case ID: 5310374258
Case title: Failed Hard Drive
Severity 2-Critical Degraded
Customer tracking number: be1021

Product serial number: MXQ54101my
Product number: 719061-B21
Submitted: 7/14/2016 12:23:04 PM
Last updated: 7/14/2016 12:23:04 PM
Source: Web
Case status: Received by HP

fgiunchedi triaged this task as Medium priority. Jul 15 2016, 2:42 PM
Cmjohnson claimed this task.

Received the disk and swapped out the old one. @fgiunchedi

fgiunchedi claimed this task.

Thanks @Cmjohnson! Reopening and assigning to myself.
I made a mistake by deleting the logical drive (LD): the remaining LDs will now get renumbered, and so will the disks from Linux's perspective.
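The renumbering problem can be illustrated with a toy model. The sketch assumes the kernel hands out sdX names strictly in the order the controller presents the logical drives. Real device naming depends on probe order and is not guaranteed stable, so this is a simplification. The LD labels `ld1`..`ld14` (mapping to sda..sdn) are hypothetical, chosen only so that the failed drive lands on sdh as in this task.

```python
import string


def assign_names(logical_drives: list[str]) -> dict[str, str]:
    """Map each LD to sda, sdb, ... in enumeration order.

    Simplified model: names follow presentation order; handles <= 26 disks.
    """
    return {ld: "sd" + letter
            for ld, letter in zip(logical_drives, string.ascii_lowercase)}


def renamed_after_removal(logical_drives: list[str],
                          removed: str) -> dict[str, tuple[str, str]]:
    """Return {ld: (old_name, new_name)} for every LD whose name changes."""
    before = assign_names(logical_drives)
    after = assign_names([ld for ld in logical_drives if ld != removed])
    return {ld: (before[ld], after[ld])
            for ld in after if before[ld] != after[ld]}


# Hypothetical layout: 14 LDs named ld1..ld14, so ld8 is sdh.
lds = [f"ld{i}" for i in range(1, 15)]
for ld, (old, new) in renamed_after_removal(lds, "ld8").items():
    print(f"{ld}: {old} -> {new}")
```

In this model, removing the LD behind sdh shifts every later drive down one letter (sdi becomes sdh, ..., sdn becomes sdm). This is why the plan below drains sdh through sdn rather than only sdh.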

So action plan:

  • set sdh / sdi / sdj / sdk / sdl / sdm / sdn weight to 0 in swift
  • wait for the rebalance to complete
  • remove the sdi / sdj / sdk / sdl / sdm / sdn LDs too and recreate all LDs
  • restore the weight for all of sdh / sdi / sdj / sdk / sdl / sdm / sdn
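The weight changes in this plan map onto the `swift-ring-builder <builder> set_weight <search-value> <weight>` and `swift-ring-builder <builder> rebalance` subcommands. Below is a minimal Python sketch that only generates those command lines for the sd[h-n] range. The builder file name `object.builder`, the `10.0.0.1:6000` IP:port, and the bare `sdX` device names are placeholders, not the real ms-be1021 ring entries. Check the actual device names in the ring before using anything like this.

```python
import shlex

# Placeholders (hypothetical): not the actual eqiad-prod ring values.
BUILDER = "object.builder"
HOST = "10.0.0.1:6000"


def expand_devices(first: str, last: str) -> list[str]:
    """Expand a shell-style range like sd[h-n] into ['sdh', ..., 'sdn']."""
    return ["sd" + chr(c) for c in range(ord(first), ord(last) + 1)]


def set_weight_commands(devices: list[str], weight: int,
                        builder: str = BUILDER,
                        host: str = HOST) -> list[str]:
    """Build one set_weight call per device, followed by a single rebalance.

    The search value uses the <ip>:<port>/<device_name> form.
    """
    cmds = [shlex.join(["swift-ring-builder", builder, "set_weight",
                        f"{host}/{dev}", str(weight)])
            for dev in devices]
    cmds.append(shlex.join(["swift-ring-builder", builder, "rebalance"]))
    return cmds


devs = expand_devices("h", "n")
drain = set_weight_commands(devs, 0)        # step 1: drain the disks
restore = set_weight_commands(devs, 3000)   # step 4: put them back
print("\n".join(drain))
```

Generating the commands instead of running them keeps the sketch side-effect free. The rebuilt ring still has to be distributed to the cluster, and data movement only finishes after replication catches up, which is the "wait for the rebalance" step.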

Picture of the return-part UPS tracking numbers attached: {F4312382}

Mentioned in SAL [2016-08-31T11:39:38Z] <godog> swift eqiad-prod: set weight for ms-be1021 sd[h-n] to 0 - T139767

Mentioned in SAL (#wikimedia-operations) [2016-11-08T00:50:59Z] <godog> swift eqiad-prod: set weight for ms-be1021 sd[h-n] to 3000 - T139767

Ring rebalanced and all disks are back in service.