Degraded RAID on cloudcephosd1018
Closed, Resolved (Public)

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (md) was detected on host cloudcephosd1018. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

CHECK_NRPE: Error - Could not connect to 10.64.20.15. Check system logs on 10.64.20.15

$ sudo /usr/local/lib/nagios/plugins/get-raid-status-md
Failed to execute '['/usr/lib/nagios/plugins/check_nrpe', '-4', '-H', 'cloudcephosd1018', '-c', 'get_raid_status_md']': RETCODE: 2
STDOUT:
b'CHECK_NRPE: Error - Could not connect to 10.64.20.15: Connection reset by peer\n'
STDERR:
None

Event Timeline

Added a relation to the other task to keep track of it, but feel free to change it to whatever workflow you prefer (for example, just commenting, inverting the parent-child relationship, or opening another task).

@Cmjohnson, any updates on this?

As far as I can see, everything is OK on that machine:

root@cloudcephosd1018:~# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sda2[0] sdb2[1]
      234005504 blocks super 1.2 [2/2] [UU]
      bitmap: 1/2 pages [4KB], 65536KB chunk

unused devices: <none>

The Icinga check shows green too.
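For reference, in the `/proc/mdstat` output above, `[2/2]` means the array expects two member devices and two are active, and `[UU]` shows each member as up; a failed or missing member shows as `_`, e.g. `[2/1] [U_]`. The following is a minimal Python sketch of that rule, assuming the standard Linux mdstat layout. It is not the `get-raid-status-md` plugin (whose implementation is not shown in this task); the function name `degraded_arrays` and the regexes are hypothetical.

```python
import re

# Hypothetical parser: assumes the standard Linux /proc/mdstat layout,
# where an array header line ("md0 : active raid1 ...") is followed by
# a line containing "[expected/active] [U_...]".
ARRAY_RE = re.compile(r"^(md\d+)\s*:")
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")

# The mdstat snapshot posted in this task.
MDSTAT_SAMPLE = """\
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sda2[0] sdb2[1]
      234005504 blocks super 1.2 [2/2] [UU]
      bitmap: 1/2 pages [4KB], 65536KB chunk

unused devices: <none>
"""


def degraded_arrays(mdstat_text: str) -> list[str]:
    """Return the names of md arrays with fewer active members than expected."""
    degraded = []
    current = None  # name of the array whose status line we expect next
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current is not None:
            expected, active = int(status.group(1)), int(status.group(2))
            flags = status.group(3)
            # "_" in the flags marks a failed or missing member device.
            if active < expected or "_" in flags:
                degraded.append(current)
            current = None
    return degraded


print(degraded_arrays(MDSTAT_SAMPLE))  # → []
```

On a live host, one could pass the contents of `/proc/mdstat` instead of the sample string. Matching on the `[n/m]` counts and the `U`/`_` flags, rather than on device names, keeps the check independent of how many disks the array has.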

Let me know if anything still needs to be done, or if we can go ahead and use the server.

Thanks!

@dcaro, sorry for the late response; I was out all month. No, there isn't anything left to do; it appears to be working fine now. If it breaks again, please reopen the task.