
Monitoring: create an alert for daemonized puppet
Closed, Resolved (Public)

Description

We've seen in T166203#3294061 that a simple mistake can leave puppet daemonized on hosts, where it keeps running. To avoid this, we should add an alert that ensures no puppet process has been running for more than, say, 2 hours. The check can run infrequently, about once per hour, IMHO.

Having a daemonized puppet on a host conflicts with our cron-based puppet runs and causes all sorts of flapping issues that are hard to track down.

Some examples of running puppet processes were:

/usr/bin/ruby /usr/bin/puppet agent -tv
/usr/bin/ruby /usr/bin/puppet agent -t
/usr/bin/ruby /usr/bin/puppet agent -d
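
The check described above could be sketched roughly as follows. This is only an illustration, not the monitoring that was deployed: the function names are hypothetical, the 2-hour threshold is the one suggested in the description, matching on the string "puppet agent" is an assumption based on the example processes, and `main()` assumes a procps-ng `ps` that supports the `etimes` (elapsed seconds) column.

```python
import subprocess

# Threshold suggested in the task description: about 2 hours.
MAX_AGE_SECONDS = 2 * 60 * 60


def find_stale_puppet_agents(ps_output, max_age=MAX_AGE_SECONDS):
    """Return (pid, age_seconds, command) for puppet agents older than max_age.

    ps_output is expected to hold one process per line in the form
    "<pid> <elapsed seconds> <full command line>".
    """
    stale = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3 or not fields[0].isdigit() or not fields[1].isdigit():
            continue  # skip empty or malformed lines
        pid, age, command = int(fields[0]), int(fields[1]), fields[2]
        # Assumption: every agent shows up with "puppet agent" in its command line.
        if "puppet agent" in command and age > max_age:
            stale.append((pid, age, command))
    return stale


def main():
    """Run the check once and return a Nagios/Icinga-style exit code."""
    # Assumption: procps-ng ps; the trailing "=" suppresses the header row.
    ps_output = subprocess.run(
        ["ps", "-eo", "pid=,etimes=,args="],
        check=True, capture_output=True, text=True,
    ).stdout
    stale = find_stale_puppet_agents(ps_output)
    if stale:
        details = "; ".join(
            f"pid {pid} running for {age}s: {cmd}" for pid, age, cmd in stale
        )
        print(f"CRITICAL: long-running puppet agent(s): {details}")
        return 2
    print("OK: no long-running puppet agent")
    return 0
```

A wrapper script would call `sys.exit(main())` so that the exit code reaches the monitoring system. Keeping the parsing in `find_stale_puppet_agents` separate from the `ps` call makes the threshold logic easy to test with canned input.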

Event Timeline

Change 358501 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] base/puppet: add "daemonize = no" to agent config

https://gerrit.wikimedia.org/r/358501

Change 358501 merged by Faidon Liambotis:
[operations/puppet@production] base/puppet: add "daemonize = no" to agent config

https://gerrit.wikimedia.org/r/358501

The change above makes it impossible to have daemonized agents running as root, so I'd say this is resolved and we may be spared the need for monitoring.

Yep, I was hoping for this to be the outcome. I'll call it resolved then. :)

Change 359084 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] base/puppet: use "false" instead of "no" with "daemonize" option

https://gerrit.wikimedia.org/r/359084

Uhmm.. I think we have to use "false" instead of "no". The confusion is that the command-line parameters use "no", but the config file expects a Boolean (depending on version?).

What happened is I saw that host dumpsdata1001 had a CRIT systemd state in Icinga.

I checked which unit had failed and it was puppet, and I saw "Invalid value '"no"' for boolean parameter: daemonize" .. ugh :/

BUT: I didn't see that on other hosts before, and I did test the change; it seemed to work and prevented puppet from daemonizing.

Also, the original "no" was advice directly from Puppet people in the Puppet channel on Freenode.
On https://docs.puppet.com/puppet/latest/configuration.html#daemonize there is:

When using boolean settings on the command line, use --setting and --no-setting instead of --setting (true|false). (Using --setting false results in “Error: Could not parse application options: needless argument”.)

so .. https://gerrit.wikimedia.org/r/#/c/359084/

wanna confirm?
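
To catch this kind of mistake before it reaches a host, a small lint over the config text might look like the sketch below. It rests on several assumptions: that the config file parser accepts only the literals `true` and `false` for boolean settings (which is what the error quoted above suggests, at least for the affected version), that the file is INI-like enough for Python's `configparser` to read, and that `daemonize` is the only setting of interest. `find_invalid_booleans` and `BOOLEAN_SETTINGS` are hypothetical names.

```python
import configparser

# Assumption: only these literals are valid for boolean settings in the
# config file, as suggested by the "Invalid value" error quoted above.
VALID_BOOLEANS = {"true", "false"}

# Hypothetical list of boolean settings to check; extend as needed.
BOOLEAN_SETTINGS = ("daemonize",)


def find_invalid_booleans(conf_text):
    """Return (section, setting, value) for boolean settings not set to true/false."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    problems = []
    for section in parser.sections():
        for setting in BOOLEAN_SETTINGS:
            if parser.has_option(section, setting):
                value = parser.get(section, setting).strip()
                if value not in VALID_BOOLEANS:
                    problems.append((section, setting, value))
    return problems
```

For example, a config text containing `daemonize = no` under `[agent]` is reported as `("agent", "daemonize", "no")`, while `daemonize = false` produces an empty list.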

Change 359084 merged by Faidon Liambotis:
[operations/puppet@production] base/puppet: use "false" instead of "no" with "daemonize" option

https://gerrit.wikimedia.org/r/359084