icinga: can't monitor some instances
Closed, Resolved | Public

Description

Some instances of the deployment-prep project are not monitored by Icinga:

http://icinga.wmflabs.org/cgi-bin/icinga/status.cgi?hostgroup=deployment-prep&style=detail

The NRPE daemon is listening on port 5666.

The project has a default security rule to allow 5666 from 10.4.0.0/21.


Version: unspecified
Severity: normal

Details

Reference
bz46026

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:39 AM
bzimport added a project: VPS-Projects.
bzimport set Reference to bz46026.

damian wrote:

It seems nrpe does not restart as expected: the old process never quits, so the service is never actually restarted.

The IP of the monitoring host changed and the config was updated accordingly, but the running service is still using the old IP.

To resolve this, run "killall nrpe; /etc/init.d/nagios-nrpe-server start" on the affected instances. I'm trying to get Ryan to run this labs-wide via salt to clean up the ones that are currently alerting.
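
For anyone who wants something a bit more defensive than the one-liner, here is a rough sketch of the same kill-then-start idea that first confirms the old process is really gone before starting the service again. The function names (wait_for_exit, restart_nrpe), the timeouts, and the fallback to SIGKILL are my own assumptions for illustration, not something the packaged init script does; only the process name and the init script path come from this task.

```shell
#!/bin/sh
# Sketch: stop nrpe, confirm it has really exited, then start it again.
# The timeouts and the SIGKILL fallback are illustrative assumptions.

PROC="${PROC:-nrpe}"
INIT_SCRIPT="${INIT_SCRIPT:-/etc/init.d/nagios-nrpe-server}"

# Return 0 once no process named $1 is left,
# or 1 if one is still running after roughly $2 seconds.
wait_for_exit() {
    name="$1"
    remaining="$2"
    while pgrep -x "$name" >/dev/null 2>&1; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

restart_nrpe() {
    # Ask nicely first (SIGTERM), as in the one-liner above.
    killall "$PROC" 2>/dev/null
    if ! wait_for_exit "$PROC" 10; then
        # Still alive after SIGTERM: force it, then check once more.
        killall -9 "$PROC" 2>/dev/null
        wait_for_exit "$PROC" 5 || return 1
    fi
    # Only start once the old process is gone, so the new one
    # reads the updated config.
    "$INIT_SCRIPT" start
}

# Run only when asked explicitly, so the functions can be sourced safely.
if [ "${1:-}" = "--run" ]; then
    restart_nrpe
fi
```

Saved as a script and invoked with --run as root, this should behave like the one-liner, except that it refuses to start the service while an old nrpe process is still hanging around.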

Yeah, this sounds like a problem we had in production before: nagios-nrpe-server would have issues restarting correctly. It looked to me, though, as if this had been resolved after the switch to Icinga (we cleaned up, including getting rid of an old init script for the nrpe server). In the past we attempted to fix it by adding a sleep command to the init script.

root@virt0:~# salt '*' cmd.run "killall nrpe; /etc/init.d/nagios-nrpe-server start"

Killed and restarted on all instances.

Works for me now :-] Thanks Daniel!