Maniphest T193394

Degraded RAID on wasat
Closed, Resolved, Public

Assigned To

Authored By

	ops-monitoring-bot
	Apr 30 2018, 1:02 PM

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (md) was detected on host wasat. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

CRITICAL: State: degraded, Active: 5, Working: 5, Failed: 1, Spare: 0

$ sudo /usr/local/lib/nagios/plugins/get_raid_status_md
Personalities : [raid1] 
md2 : active raid1 sda3[0] sdb3[1]
      438449152 blocks super 1.2 [2/2] [UU]
      bitmap: 0/4 pages [0KB], 65536KB chunk

md1 : active (auto-read-only) raid1 sda2[0] sdb2[1]
      976320 blocks super 1.2 [2/2] [UU]
      	resync=PENDING
      
md0 : active raid1 sda1[0](F) sdb1[1]
      48794624 blocks super 1.2 [2/1] [_U]
      
unused devices: <none>
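
For readers unfamiliar with the snapshot format, the following is a minimal sketch of how the degraded array and the failed member can be picked out of the output above. It is not the code of the `get_raid_status_md` plugin; the function name `parse_mdstat`, the regular expressions, and the result layout are assumptions made for illustration, and the sample text is copied from the snapshot.

```python
import re

# Sample copied from the snapshot in this task.
MDSTAT = """\
Personalities : [raid1] 
md2 : active raid1 sda3[0] sdb3[1]
      438449152 blocks super 1.2 [2/2] [UU]
      bitmap: 0/4 pages [0KB], 65536KB chunk

md1 : active (auto-read-only) raid1 sda2[0] sdb2[1]
      976320 blocks super 1.2 [2/2] [UU]
      \tresync=PENDING
      
md0 : active raid1 sda1[0](F) sdb1[1]
      48794624 blocks super 1.2 [2/1] [_U]
      
unused devices: <none>
"""

# Hypothetical patterns, written for this sample only.
ARRAY_RE = re.compile(r"^(md\d+) : (.*)$")            # "md0 : active raid1 ..."
MEMBER_RE = re.compile(r"(\w+)\[\d+\]((?:\(\w\))*)")  # "sda1[0](F)" -> ("sda1", "(F)")
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\] \[[U_]+\]")  # "[2/1] [_U]" -> ("2", "1")


def parse_mdstat(text):
    """Return {array: {"failed": [...], "expected": int, "active": int, "degraded": bool}}."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            members = MEMBER_RE.findall(header.group(2))
            arrays[current] = {
                # A member flagged "(F)" has been marked faulty by md.
                "failed": [dev for dev, flags in members if "(F)" in flags],
                "expected": None,
                "active": None,
                "degraded": None,
            }
            continue
        status = STATUS_RE.search(line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            # "[2/1]" means 2 members expected, 1 currently active.
            arrays[current].update(
                expected=expected, active=active, degraded=active < expected
            )
    return arrays


if __name__ == "__main__":
    for name, info in parse_mdstat(MDSTAT).items():
        if info["degraded"]:
            print(name, "is degraded; failed members:", info["failed"])
```

On this sample the sketch flags only md0, with sda1 as the failed member. Summing the active members across the three arrays gives 5 and the failed members 1, which happens to match the counts in the CRITICAL line, although how the plugin actually derives those counts is not shown in this task.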

Related Objects

Mentioned In: T197562: Replace disk on wasat
Mentioned Here: T197562: Replace disk on wasat

Event Timeline

ops-monitoring-bot added projects: ops-codfw, SRE. Apr 30 2018, 1:02 PM

ops-monitoring-bot subscribed.

Restricted Application added a subscriber: Aklapper. Apr 30 2018, 1:02 PM

Volans added subscribers: elukey, Joe. Apr 30 2018, 1:21 PM

This host is under warranty until 2019-03-14, and is an HP DL360.

MoritzMuehlenhoff subscribed. May 2 2018, 8:07 AM

@Papaul: Please go ahead and process a warranty replacement for this disk with HP. If it is hot-swappable (it should be), we can replace it without downtime.

Dear Mr Papaul Tshibamba,

Thank you for contacting Hewlett Packard Enterprise for your service request. This email confirms your request for service and the details are below.

Your request is being worked on under reference number 5329190939
Status: Case is generated and in Progress

Product description: HPE ProLiant DL360 Gen9 8SFF Configure-to-order Server
Product number: 755258-B21
Serial number: MXQ61007G8
Subject: DL360 Gen9 - Failed hard drive

Yours sincerely,
Hewlett Packard Enterprise

Dear Papaul Tshibamba,

We are contacting you in regards to your case ID# 5329190939.

Please be aware that a functional equivalent part (656108-001) (SPS-DRV HD 1TB 6G SATA 7.2K 2.5 MDL SC) has shipped as an alternate material solution for Original 656107-001.

This replacement part is expected to perform the same functions as the original part, and its functionality has been tested and approved by our support specialists.

IMPORTANT: This is an automatically generated email, please do not reply. Should you need further assistance, please address your email to easitool.follow-up@hpe.com or call 1-800-548-73

@RobH disk replacement complete

The RAID still shows as degraded. @RobH (or someone else), could you have a look? Thanks!

Volans merged a task: T195339: Degraded RAID on wasat. May 24 2018, 5:03 PM

Volans claimed this task. May 24 2018, 5:06 PM

I just discovered that this host is planned for a reimage in the next few days. I'm not going to bother fixing the md array, as the host is not seeing the replaced disk and might need a reboot anyway; going directly for the reimage at this point.

• Vvjjkkii renamed this task from Degraded RAID on wasat to 9ydaaaaaaa. Jul 1 2018, 1:12 AM

• Vvjjkkii added projects: CheckUser, Connected-Open-Heritage-Batch-uploads (RAÄ-KMB_1_2017-02), Tamil-Sites, Gamepress, Hashtags, Jade, KartoEditor, Language-2018-Apr-June, New-Editor-Experiences, Mail, TCB-Team (now WMDE-TechWish).

• Vvjjkkii updated the task description.

• Vvjjkkii removed a subscriber: Aklapper.

elukey renamed this task from 9ydaaaaaaa to Degraded RAID on wasat. Jul 2 2018, 6:28 AM

elukey removed projects: TCB-Team (now WMDE-TechWish), Mail, New-Editor-Experiences, Language-2018-Apr-June, KartoEditor, Jade, Hashtags, Gamepress, Tamil-Sites, Connected-Open-Heritage-Batch-uploads (RAÄ-KMB_1_2017-02), CheckUser.

elukey updated the task description.

Papaul mentioned this in T197562: Replace disk on wasat. Jul 9 2018, 2:50 PM

@Volans: Are you handling the reimage? This host is still sending spam emails about the defunct disk.

There is also a related/duplicate task, T197562.

@RobH No, it's not on my plate. I was told it was about to be reimaged; I think that @MoritzMuehlenhoff and @elukey might have more info about this.

So the new disk is showing up as /dev/sdc now; presumably a reimage would straighten everything out.

This host got reimaged with stretch (and renamed to mwmaint2001), so this is resolved.
