
Degraded RAID on ms-be1038
Closed, Resolved (Public)

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (md) was detected on host ms-be1038. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

CRITICAL: State: degraded, Active: 2, Working: 2, Failed: 2, Spare: 0

$ sudo /usr/local/lib/nagios/plugins/get-raid-status-md
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] 
md0 : active raid1 sdb1[1](F) sda1[0]
      58559488 blocks super 1.2 [2/1] [U_]
      
md1 : active raid1 sdb2[1] sda2[0](F)
      976320 blocks super 1.2 [2/1] [_U]
      
unused devices: <none>
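
For reference, a snapshot like the one above can be checked programmatically. The following is a minimal sketch, not the actual `get-raid-status-md` plugin: `parse_mdstat` and the regular expressions are hypothetical names introduced here, and the parser assumes the simple `/proc/mdstat` layout shown in this task (array header line followed by a `[n/m] [U_]` status line).

```python
import re

# Array header, e.g. "md0 : active raid1 sdb1[1](F) sda1[0]"
ARRAY_RE = re.compile(r"^(md\d+) : (\S+) (\S+) (.+)$")
# Member device with optional flag, e.g. "sdb1[1](F)"
MEMBER_RE = re.compile(r"(\w+)\[\d+\](?:\((\w)\))?")
# Slot summary, e.g. "[2/1] [U_]" (configured/active slots, per-slot state)
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")


def parse_mdstat(text):
    """Return {array: {"level", "failed", "configured", "active", "degraded"}}."""
    arrays = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = ARRAY_RE.match(line)
        if header:
            name, _state, level, members = header.groups()
            # Collect members flagged (F), i.e. marked faulty by md.
            failed = [m.group(1) for m in MEMBER_RE.finditer(members)
                      if m.group(2) == "F"]
            current = {"level": level, "failed": failed,
                       "configured": None, "active": None, "degraded": False}
            arrays[name] = current
            continue
        status = STATUS_RE.search(line)
        if status and current is not None:
            current["configured"] = int(status.group(1))
            current["active"] = int(status.group(2))
            # An underscore in the slot map means a missing or failed member.
            current["degraded"] = "_" in status.group(3)
    return arrays


# Snapshot copied from this task.
SAMPLE = """\
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdb1[1](F) sda1[0]
      58559488 blocks super 1.2 [2/1] [U_]

md1 : active raid1 sdb2[1] sda2[0](F)
      976320 blocks super 1.2 [2/1] [_U]

unused devices: <none>
"""

if __name__ == "__main__":
    for name, info in parse_mdstat(SAMPLE).items():
        state = "degraded" if info["degraded"] else "ok"
        print(name, state, "failed:", info["failed"])
```

On this snapshot the sketch reports both arrays as degraded, with the failed members on two different disks (`sdb1` in md0, `sda2` in md1). That pattern can hint at a problem upstream of the disks themselves rather than two independent disk failures.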

Event Timeline

Looks like the host is busted; even basic commands fail with I/O errors at login. I'll try a reboot.

Debian GNU/Linux 9 auto-installed on Thu Jul 13 14:37:19 UTC 2017.
-bash: /usr/bin/lesspipe: Input/output error
-bash: /usr/bin/tput: Input/output error
-bash: /usr/bin/tput: Input/output error
-bash: /usr/bin/tput: Input/output error
-bash: /usr/bin/tput: Input/output error
Connection to ms-be1038.eqiad.wmnet closed.

Controller message at boot:

Slot 3 Port 1 : Smart Array P840 Controller - (4096 MB, V4.52) 14 Logical
Drive(s) - Operation Failed
 - 1719-Slot 3 Drive Array - A controller failure event occurred prior
   to this power-up.  (Previous lock up code = 0x13) Action: Install the
   latest controller firmware. If the problem persists, replace the
   controller.
fgiunchedi claimed this task.

RAID controller firmware upgraded and host rebooted twice; the host is back up.