Degraded RAID on ms-be1045
Closed, Invalid, Public

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (md) was detected on host ms-be1045. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

CHECK_NRPE: Error - Could not connect to 10.64.0.139: Connection reset by peer

$ sudo /usr/local/lib/nagios/plugins/get_raid_status_md
Failed to execute '['/usr/lib/nagios/plugins/check_nrpe', '-4', '-H', 'ms-be1045', '-c', 'get_raid_status_md']': RETCODE: 2
STDOUT:
CHECK_NRPE: Error - Could not connect to 10.64.0.139: Connection reset by peer

STDERR:
None

Event Timeline

Dzahn added a subscriber: Dzahn. Dec 13 2018, 10:24 PM

T209618#4819650 shows a stress test was done on these. Looks like it got too stressed.

I am not seeing anything wrong with the disks.

Exit Code: 0x00
cmjohnson@ms-be1045:~$ sudo megacli -PDList -aALL |grep "Firmware state"
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
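The manual check above (grepping for "Firmware state" and reading the lines by eye) could be automated roughly as follows. This is a minimal sketch, not part of the existing tooling: the function names `firmware_state_counts` and `all_drives_online` are hypothetical, and the code assumes the text of `megacli -PDList -aALL` has already been captured into a string.

```python
import re
from collections import Counter


def firmware_state_counts(pdlist_output: str) -> Counter:
    """Count physical drives per firmware state in captured PDList output."""
    # Each drive reports one line such as "Firmware state: Online, Spun Up".
    states = re.findall(
        r"^Firmware state:[ \t]*(.+?)[ \t]*$", pdlist_output, flags=re.MULTILINE
    )
    return Counter(states)


def all_drives_online(pdlist_output: str) -> bool:
    """True only if at least one drive is listed and every state starts with 'Online'."""
    counts = firmware_state_counts(pdlist_output)
    return bool(counts) and all(state.startswith("Online") for state in counts)
```

For the output pasted above, `firmware_state_counts` would report 14 drives in the "Online, Spun Up" state. Empty input is deliberately treated as not healthy, so that a failed command is not mistaken for a clean result.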

Dzahn closed this task as Resolved. Dec 13 2018, 10:41 PM
Dzahn claimed this task.

Oh, false alert. The ticket was auto-created because the check itself failed.

The actual cause was "CHECK_NRPE: Error - Could not connect to 10.64.0.139: Connection reset by peer".

Both RAID checks are showing as normal now:

MD RAID

OK 2018-12-13 22:38:52 0d 6h 14m 11s 1/3 OK: Active: 4, Working: 4, Failed: 0, Spare: 0

MegaRAID

OK 2018-12-13 22:36:43 1d 5h 28m 38s 1/3 OK: optimal, 14 logical, 14 physical
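The false alert in this task came from a failed NRPE connection being handled the same way as a degraded array. A minimal sketch of how a handler could tell the two cases apart is shown below. The function name `classify_md_check` and the four labels are assumptions made for illustration, not the behaviour of the real event handler, and the "degraded" rule (any failed device) is a simplification.

```python
import re

# Matches the summary seen in the MD RAID check output above, e.g.
# "OK: Active: 4, Working: 4, Failed: 0, Spare: 0".
MD_STATUS = re.compile(r"Active: (\d+), Working: (\d+), Failed: (\d+), Spare: (\d+)")


def classify_md_check(output: str) -> str:
    """Return 'unreachable', 'degraded', 'ok', or 'unknown' for a check's text output."""
    # A transport error says nothing about the array, so report it separately.
    if "CHECK_NRPE: Error" in output:
        return "unreachable"
    match = MD_STATUS.search(output)
    if match is None:
        return "unknown"
    failed = int(match.group(3))
    return "degraded" if failed > 0 else "ok"
```

With a split like this, only a "degraded" result would need to open a hardware task, while "unreachable" could be retried or routed to a connectivity alert instead.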

Dzahn changed the task status from Resolved to Invalid. Dec 13 2018, 10:41 PM