
Degraded RAID on wasat
Closed, Resolved (Public)

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (md) was detected on host wasat. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

CRITICAL: State: degraded, Active: 5, Working: 5, Failed: 1, Spare: 0

$ sudo /usr/local/lib/nagios/plugins/get_raid_status_md
Personalities : [raid1] 
md2 : active raid1 sda3[0] sdb3[1]
      438449152 blocks super 1.2 [2/2] [UU]
      bitmap: 0/4 pages [0KB], 65536KB chunk

md1 : active (auto-read-only) raid1 sda2[0] sdb2[1]
      976320 blocks super 1.2 [2/2] [UU]
      	resync=PENDING
      
md0 : active raid1 sda1[0](F) sdb1[1]
      48794624 blocks super 1.2 [2/1] [_U]
      
unused devices: <none>
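
For readers unfamiliar with the mdstat format, here is a minimal Python sketch of how the snapshot above can be read: the `[2/1] [_U]` pair on md0 means two members are expected but only one is active, and the `(F)` suffix marks the faulty member. The function name `parse_mdstat` and the regular expressions are illustrative assumptions; this is not the code of the `get_raid_status_md` plugin, whose internals are not shown in this task.

```python
import re

# Snapshot copied from the task description above.
MDSTAT = """\
Personalities : [raid1]
md2 : active raid1 sda3[0] sdb3[1]
      438449152 blocks super 1.2 [2/2] [UU]
      bitmap: 0/4 pages [0KB], 65536KB chunk

md1 : active (auto-read-only) raid1 sda2[0] sdb2[1]
      976320 blocks super 1.2 [2/2] [UU]
      resync=PENDING

md0 : active raid1 sda1[0](F) sdb1[1]
      48794624 blocks super 1.2 [2/1] [_U]

unused devices: <none>
"""


def parse_mdstat(text):
    """Return {array: {"failed", "expected", "active", "degraded"}}.

    Hypothetical helper for illustration only.
    """
    arrays = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(md\d+) : (.*)$", line)
        if header:
            current = header.group(1)
            # Members carrying an "(F)" suffix were marked faulty by the kernel.
            failed = re.findall(r"(\w+)\[\d+\]\(F\)", header.group(2))
            arrays[current] = {"failed": failed}
            continue
        # "[2/1] [_U]" reads as: 2 members expected, 1 active, first slot missing.
        status = re.search(r"\[(\d+)/(\d+)\] \[([U_]+)\]", line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            arrays[current].update(
                expected=expected, active=active, degraded=active < expected
            )
    return arrays


arrays = parse_mdstat(MDSTAT)
degraded = sorted(name for name, info in arrays.items() if info["degraded"])
total_active = sum(info["active"] for info in arrays.values())
total_failed = sum(len(info["failed"]) for info in arrays.values())
print(degraded, total_active, total_failed)
```

Summed over the three arrays, this gives 5 active members and 1 failed member, which agrees with the `Active: 5` and `Failed: 1` figures in the CRITICAL line. Whether the plugin derives its counts the same way is an assumption.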

Event Timeline

RobH triaged this task as High priority. May 1 2018, 2:37 PM
RobH subscribed.

This host is under warranty until 2019-03-14, and is an HP DL360.

RobH moved this task from Backlog to Up Next on the ops-codfw board.
RobH added a subscriber: Papaul.

@Papaul: Please go ahead and process a warranty replacement for this disk with HP. If it is hot swap (it should be), we can replace it without downtime.

Dear Mr Papaul Tshibamba,

Thank you for contacting Hewlett Packard Enterprise for your service request. This email confirms your request for service and the details are below.

Your request is being worked on under reference number 5329190939
Status: Case is generated and in Progress

Product description: HPE ProLiant DL360 Gen9 8SFF Configure-to-order Server
Product number: 755258-B21
Serial number: MXQ61007G8
Subject: DL360 Gen9 - Failed hard drive

Yours sincerely,
Hewlett Packard Enterprise

Dear Papaul Tshibamba,

We are contacting you in regards to your case ID# 5329190939.

Please be aware that a functional equivalent part (656108-001) (SPS-DRV HD 1TB 6G SATA 7.2K 2.5 MDL SC) has shipped as an alternate material solution for Original 656107-001.

This replacement part is expected to perform the same functions as the original part, and its functionality has been tested and approved by our support specialists.

IMPORTANT: This is an automatically generated email, please do not reply. Should you need further assistance, please address your email to easitool.follow-up@hpe.com or call 1-800-548-73

@RobH disk replacement complete

The RAID still shows as degraded -- @RobH -or someone else- could you have a look? Thanks!

Volans subscribed.

I just discovered that this host is planned for reimage in the next few days. I'm not going to bother fixing the md array, as the host is not seeing the replaced disk and might need a reboot anyway; going directly for the reimage at this point.

@Volans: Are you handling the reimage? This host is still email spamming about the defunct disk.

There is also a related/duplicate task, T197562.

@RobH No, it's not on my plate. I was told it was about to be reimaged; I think that @MoritzMuehlenhoff and @elukey might have more info about this.

So the new disk is showing up as /dev/sdc now; presumably a reimage would straighten everything out.
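
For reference, the manual alternative to a reimage would be roughly: copy the partition layout from the surviving disk to the new one, then add the new partitions back into each array. The Python sketch below only builds the command strings and runs nothing. The helper name `md_repair_plan` is hypothetical, and the disk names and array-to-partition mapping are assumptions taken from the snapshot in the description (sdb as the surviving disk, sdc as the replacement). It was not the path taken on this host.

```python
def md_repair_plan(healthy_disk, new_disk, arrays):
    """Build (but do not run) shell commands for re-adding a replaced disk.

    `arrays` maps an md device name to the partition number it uses,
    e.g. {"md0": 1}. Hypothetical helper; all arguments are assumptions.
    """
    plan = [
        # Copy the partition layout from the surviving disk to the new disk.
        f"sfdisk -d /dev/{healthy_disk} | sfdisk /dev/{new_disk}",
    ]
    for md, part in sorted(arrays.items()):
        # Drop any member still marked failed, then add the new partition.
        plan.append(f"mdadm --manage /dev/{md} --remove failed")
        plan.append(f"mdadm --manage /dev/{md} --add /dev/{new_disk}{part}")
    return plan


plan = md_repair_plan("sdb", "sdc", {"md0": 1, "md1": 2, "md2": 3})
for cmd in plan:
    print(cmd)
```

A real repair would need review first: an old member that was physically pulled may show up as detached or already removed rather than failed, and the boot loader on the new disk is not covered here.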

MoritzMuehlenhoff claimed this task.

This host got reimaged with stretch (and renamed to mwmaint2001), so this is resolved.