Investigate check_nrpe -u option to reduce critical alerts
Closed, Resolved, Public

Description

Instead of having nrpe-dependent alerts go critical during an nrpe interruption (stopped nrpe daemon, etc.), we should consider using the unknown status for this condition via the check_nrpe -u flag.

-u         = Make socket timeouts return an UNKNOWN state instead of CRITICAL

This, in conjunction with a service check for nrpe itself, could significantly reduce false critical alerts.
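
For illustration, a minimal sketch of what a check_nrpe command definition with the flag might look like in Nagios/Icinga object syntax. This is an assumption about the general shape, not the actual definition in operations/puppet: the command name and the use of $ARG1$ for the remote command are hypothetical, and $USER1$ is assumed to point at the plugin directory as in a stock install.

```
# Hypothetical command definition; names and argument layout are assumed.
define command {
    command_name    check_nrpe
    # -u: socket timeouts return UNKNOWN instead of CRITICAL
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -u
}
```

With a definition along these lines, a socket timeout would surface as UNKNOWN on each dependent check, while the separate service check for nrpe itself would be the one to go critical.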

Event Timeline

faidon moved this task from Inbox to Up next on the observability board.
faidon subscribed.

Sounds good to me, feel free to go ahead :)

Change 374368 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] icinga: add -u option to check_nrpe commands

https://gerrit.wikimedia.org/r/374368

Change 374368 merged by Herron:
[operations/puppet@production] icinga: add -u option to check_nrpe commands

https://gerrit.wikimedia.org/r/374368

This is looking good so far. Going to keep an eye on it for the rest of the day before resolving.

herron removed a project: Patch-For-Review.