
icinga (neon) is out of CPU headroom
Closed, Duplicate, Public

Description

Neon routinely shows 0% idle CPU in real-time statistics. Even after I manually turned down some of the most CPU-expensive checks, the remaining load still routinely drove the machine to 0% idle, so in its normal configuration it is clearly well past its CPU capacity.

Also, the main icinga process itself is single-threaded and routinely saturates a single CPU core. It effectively runs out of processing power to keep up with its own work, even while other cores sit idle.
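To illustrate why aggregate and per-core headroom differ, here is a minimal Python sketch, assuming a Linux host with the standard `/proc/stat` format. It is not a tool that was run on neon, and the function names `parse_proc_stat` and `idle_percent` are hypothetical. It computes per-CPU idle percentages from two snapshots, counting `iowait` as idle. A single-threaded process like the main icinga daemon would appear as one `cpuN` entry near 0% idle, even when the aggregate `cpu` line still shows headroom.

```python
from typing import Dict, Tuple


def parse_proc_stat(text: str) -> Dict[str, Tuple[int, int]]:
    """Return {cpu_name: (idle_jiffies, total_jiffies)} for every 'cpu*' line."""
    result: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("cpu"):
            continue  # skip intr, ctxt, btime, etc.
        # Columns: user nice system idle iowait irq softirq steal [guest guest_nice].
        # guest/guest_nice are already included in user/nice, so only take the first 8.
        values = [int(v) for v in fields[1:9]]
        values += [0] * (8 - len(values))  # older kernels report fewer columns
        idle = values[3] + values[4]  # idle + iowait (assumption: treat iowait as idle)
        total = sum(values)
        result[fields[0]] = (idle, total)
    return result


def idle_percent(before: str, after: str) -> Dict[str, float]:
    """Percentage of time each CPU spent idle between two /proc/stat snapshots."""
    a, b = parse_proc_stat(before), parse_proc_stat(after)
    out: Dict[str, float] = {}
    for cpu, (idle_after, total_after) in b.items():
        if cpu not in a:
            continue
        idle_before, total_before = a[cpu]
        d_total = total_after - total_before
        out[cpu] = 100.0 * (idle_after - idle_before) / d_total if d_total else 0.0
    return out
```

On a live host you would read `/proc/stat` twice, about a second apart, and pass both strings to `idle_percent`. Comparing the per-core entries against the aggregate `cpu` entry is what separates "the whole box is busy" from "one thread is pegged".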

I noticed this while debugging the intermittent ipv6 monitor failures. I really don't know whether this is causal or even related, but solving this basic issue seems prudent...

Event Timeline

BBlack raised the priority of this task to Needs Triage.
BBlack updated the task description.
BBlack added a project: acl*sre-team.
BBlack subscribed.
jcrespo triaged this task as Medium priority. Sep 7 2015, 2:04 PM
jcrespo subscribed.

@BBlack, in your opinion, should this be handled in hardware or in software, taking cost-effectiveness into account? Or have you not investigated enough to have a view?

This should probably be "high", but I am setting it to normal because this has been true for a year already and no immediate bad effects have been observed yet.

We already discussed this in the Ops meeting: the immediate solution would be in hardware, or perhaps splitting it across 2 servers.