Degraded RAID on elastic1039
Closed, Resolved · Public

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (md) was detected on host elastic1039. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

CRITICAL: State: degraded, Active: 3, Working: 3, Failed: 1, Spare: 0

$ sudo /usr/local/lib/nagios/plugins/get-raid-status-md
Personalities : [raid0] [raid1] [linear] [multipath] [raid6] [raid5] [raid4] [raid10] 
md1 : active raid0 sda2[0] sdb2[1]
      1503967232 blocks super 1.2 512k chunks
      
md0 : active raid1 sda1[0](F) sdb1[1]
      29279232 blocks super 1.2 [2/1] [_U]
      
unused devices: <none>
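
For readers unfamiliar with the mdstat format: in the snapshot above, the `(F)` after `sda1[0]` marks that member as faulty, and `[2/1] [_U]` on md0 means only one of the two expected raid1 members is up. The following is a minimal sketch of how such output could be parsed. It is not the actual `get-raid-status-md` plugin, whose implementation is not shown in this task. The names `parse_mdstat`, `ARRAY_RE`, `MEMBER_RE` and `STATUS_RE` are hypothetical, and the regular expressions assume the simple `mdN : active <level> <members>` layout seen here.

```python
import re

# Sample text copied from the snapshot in this task.
MDSTAT = """\
Personalities : [raid0] [raid1] [linear] [multipath] [raid6] [raid5] [raid4] [raid10]
md1 : active raid0 sda2[0] sdb2[1]
      1503967232 blocks super 1.2 512k chunks

md0 : active raid1 sda1[0](F) sdb1[1]
      29279232 blocks super 1.2 [2/1] [_U]

unused devices: <none>
"""

# Assumption: array lines look like "md0 : active raid1 sda1[0](F) sdb1[1]".
ARRAY_RE = re.compile(r"^(md\d+) : (\w+) (\w+) (.+)$")
# A member such as "sda1[0](F)": device name, slot number, optional flags.
MEMBER_RE = re.compile(r"^(\w+)\[\d+\]((?:\(\w\))*)$")
# Redundant arrays report "[expected/active] [U_ map]", e.g. "[2/1] [_U]".
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")


def parse_mdstat(text):
    """Return a dict of array name -> summary parsed from mdstat-style text."""
    arrays = {}
    current = None
    for line in text.splitlines():
        match = ARRAY_RE.match(line)
        if match:
            name, state, level, members = match.groups()
            working, failed = [], []
            for token in members.split():
                member = MEMBER_RE.match(token)
                if member is None:
                    continue  # skip anything that is not a member token
                device, flags = member.groups()
                (failed if "(F)" in flags else working).append(device)
            current = {
                "state": state,
                "level": level,
                "working": working,
                "failed": failed,
                "degraded": bool(failed),
            }
            arrays[name] = current
            continue
        status = STATUS_RE.search(line) if current is not None else None
        if status:
            expected, active = int(status.group(1)), int(status.group(2))
            # Fewer active members than expected also means degraded.
            current["degraded"] = current["degraded"] or active < expected
    return arrays


if __name__ == "__main__":
    for name, info in sorted(parse_mdstat(MDSTAT).items()):
        print(name, info["level"], "degraded" if info["degraded"] else "ok",
              "failed:", info["failed"])
```

Counting members this way across both arrays gives 3 working and 1 failed, which matches the figures in the CRITICAL line, although how the plugin derives its own counters is not shown here. Note that md1 is raid0 across the same two disks and has no redundancy, so the sketch only reports it as not degraded because none of its members carries an `(F)` flag.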

Event Timeline

Mentioned in SAL (#wikimedia-operations) [2021-06-29T02:34:31Z] <ryankemper> T285643 Banned elastic1039 from all 3 elasticsearch clusters and set elastic1039.eqiad.wmnet to failed in netbox

Mentioned in SAL (#wikimedia-operations) [2021-07-20T13:14:32Z] <gehel> set/pooled=inactive on elastic1039 - disk failure - T285643