
hw troubleshooting: hardware RAID predictive failure for bellatrix.frack.codfw.wmnet
Closed, Declined | Public | Request

Description

  • Provide FQDN of system: bellatrix.frack.codfw.wmnet
  • If other than a hard drive issue, please depool the machine (and confirm that it’s been depooled) for us to work on it. If not, please provide a time frame for us to take the machine down.
  • Put system into a failed state in Netbox.
  • Provide urgency of request, along with justification (redundancy, dependencies, etc.): "moderate"
  • Describe issue and/or attach hardware failure log. (Refer to https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook if you need help.)
  • Assign correct project tag and appropriate owner (based on above). Also, please ensure the service owners of the host(s) are added as subscribers to provide any additional input.

Event Timeline

bellatrix:~$ sudo ssacli controller slot=0 physicaldrive 1I:1:1 show detail
[sudo] password for jgreen:

Smart Array P420i in Slot 0 (Embedded)

Array A

   physicaldrive 1I:1:1
      Port: 1I
      Box: 1
      Bay: 1
      Status: Predictive Failure
      Drive Type: Data Drive
      Interface Type: SAS
      Size: 2 TB
      Drive exposed to OS: False
      Logical/Physical Block Size: 512/512
      Rotational Speed: 7200
      Firmware Revision: HPD5
      Serial Number: Z1X3JJKP0000W513G4C5
      WWID: 5000C50062B937F1
      Model: HP      MB2000FCWDF
      Current Temperature (C): 38
      Maximum Temperature (C): 41
      PHY Count: 2
      PHY Transfer Rate: 6.0Gbps, Unknown
      Drive Authentication Status: OK
      Carrier Application Version: 11
      Carrier Bootloader Version: 6
      Sanitize Erase Supported: True
      Unrestricted Sanitize Supported: False
      Shingled Magnetic Recording Support: None

Please note this server is still in service.
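
For anyone who wants to check this kind of output in a script rather than by eye, here is a minimal sketch in Python. It only parses text shaped like the `show detail` output pasted above; the function name `parse_drive_detail` and the trimmed `SAMPLE` string are illustrative assumptions, not part of ssacli or any existing tooling.

```python
def parse_drive_detail(text):
    """Collect the 'Key: Value' lines of a drive detail listing into a dict.

    Hypothetical helper: it assumes each field sits on its own line in the
    form 'Key: Value', as in the output pasted in this task.
    """
    fields = {}
    for line in text.splitlines():
        # Split on the first ': ' only; header lines such as
        # 'physicaldrive 1I:1:1' contain no ': ' and are skipped.
        key, sep, value = line.strip().partition(": ")
        if sep:
            fields[key] = value.strip()
    return fields


# Trimmed copy of the output above, used here as sample input.
SAMPLE = """\
   physicaldrive 1I:1:1
      Port: 1I
      Box: 1
      Bay: 1
      Status: Predictive Failure
      Serial Number: Z1X3JJKP0000W513G4C5
      Model: HP      MB2000FCWDF
"""

drive = parse_drive_detail(SAMPLE)
# Flag the drive when the controller reports the status seen in this task.
flagged = drive.get("Status") == "Predictive Failure"
print(drive["Serial Number"], drive["Status"], flagged)
```

The sketch deliberately works on captured text instead of invoking ssacli itself, so it can be tried without root access on the host.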

@Jgreen this server has been out of warranty since 2017, and we already have a replacement server on site that was ordered in T237440. Please let me know how you want to proceed.

Thanks

No need for this task, since we have a replacement server set up in T237440.