
helium array: slot 3 disk has failed
Closed, Resolved, Public

Description

After T224794 was resolved, the event handler on the RAID status of helium was never re-enabled, so I am creating this ticket manually. Yesterday I noticed that the server had no access to the shelf at all (thankfully the fs went into read-only mode well before that, ruling out any fs corruption). I power-cycled the system, only to discover that the slot 3 disk had completely failed.

kernel logged on boot:

megaraid_sas 0000:04:00.0: 543495 (621450878s/0x0004/CRIT) - Enclosure PD 0f(c 00/p0) phy bad for slot 3

megacli does not see the drive at all: /usr/sbin/megacli -PDList -a0 returns the drives in slot 2 and slot 4 (and all other slots before and after) but not slot 3. This is a RAID6, so we are still fine, but if we have a spare we should replace that drive ASAP.
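For anyone wanting to automate the "which slot is missing" check described above, here is a minimal sketch. It assumes the `-PDList` output contains one `Slot Number: N` line per visible drive (the field name is the one quoted later in this ticket), that all drives sit in a single enclosure, and that slots are numbered contiguously. The function name `missing_slots` and the sample text are hypothetical, not real output from helium.

```python
import re
from typing import List


def missing_slots(pdlist_output: str) -> List[int]:
    """Return slot numbers absent between the lowest and highest slot seen.

    Assumption: a single enclosure with contiguous slot numbering. A missing
    first or last slot cannot be detected this way.
    """
    found = {
        int(n)
        for n in re.findall(r"^\s*Slot Number:\s*(\d+)", pdlist_output, re.MULTILINE)
    }
    if not found:
        return []
    return [s for s in range(min(found), max(found) + 1) if s not in found]


# Hypothetical sample modeled on the fields quoted in this ticket.
sample = """\
Enclosure Device ID: 15
Slot Number: 2
Enclosure Device ID: 15
Slot Number: 4
"""

print(missing_slots(sample))
```

In practice the text could come from something like `subprocess.run(["/usr/sbin/megacli", "-PDList", "-a0"], capture_output=True, text=True).stdout`, but that is untested here.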

Event Timeline

jbond triaged this task as Normal priority. Sep 11 2019, 12:10 PM
jbond updated the task description.
jbond assigned this task to Cmjohnson. Sep 11 2019, 12:15 PM
jbond raised the priority of this task from Normal to High.

swapped drive slot 3 @Cmjohnson

@Jclark-ctr thanks. I can confirm that the array is being rebuilt!

Enclosure Device ID: 15
Slot Number: 3
[snip]
Firmware state: Rebuild
[snip]
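
A rough sketch of how the rebuild state could be pulled out of that output programmatically, assuming each drive block lists `Slot Number:` before `Firmware state:` as in the excerpt above. The helper name `firmware_states` is hypothetical, and the excerpt in the code only repeats the lines quoted here.

```python
import re


def firmware_states(pdlist_output):
    """Map slot number -> firmware state string.

    Assumption: within each drive block, 'Slot Number:' appears before
    'Firmware state:'.
    """
    states = {}
    slot = None
    for line in pdlist_output.splitlines():
        m = re.match(r"\s*Slot Number:\s*(\d+)", line)
        if m:
            slot = int(m.group(1))
            continue
        m = re.match(r"\s*Firmware state:\s*(.+?)\s*$", line)
        if m and slot is not None:
            states[slot] = m.group(1)
    return states


# Lines quoted in this ticket; the [snip] markers stand for omitted fields.
excerpt = """\
Enclosure Device ID: 15
Slot Number: 3
[snip]
Firmware state: Rebuild
[snip]
"""

print(firmware_states(excerpt))
```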
wiki_willy closed this task as Resolved. Sep 11 2019, 4:31 PM
wiki_willy reassigned this task from Cmjohnson to Jclark-ctr.