
Reboot during puppet run causes /var/lib/puppet/state/agent_catalog_run.lock to be left and puppet to not start running again
Closed, Resolved · Public

Description

Work around https://tickets.puppetlabs.com/browse/PUP-1070 (fixed in 3.7.0; our precise/trusty hosts are on 3.4.3 - only jessie is okay) - maybe, per <ori>, by provisioning, say, a script to rm /var/lib/puppet/state/agent_catalog_run.lock on boot, or a cron job that removes it if there are no puppet processes running.
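A minimal sketch of the cron-based idea (the schedule, the /etc/cron.d path, and the pgrep pattern are assumptions; only the lock file path comes from this task):

```
# Hypothetical /etc/cron.d/puppet-stale-lock: remove the stale lock only when
# no puppet agent process is currently running. Schedule and process pattern
# are guesses, not an agreed implementation.
*/30 * * * * root pgrep -f 'puppet agent' >/dev/null || rm -f /var/lib/puppet/state/agent_catalog_run.lock
```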

Event Timeline

Automatically cleaning up such files makes me feel uneasy. In Labs, I've seen this scenario (as I understand it) only when instances froze or were rebooted during a Puppet run. Shinken and puppetalert.py will inform the project administrators about Puppet staleness, and the freeze/reboot is usually fresh in memory, so the reason is obvious when looking at the lock file. I wouldn't mind if the clean-up remained a manual task.

Could we hook into molly-guard? That already runs when a user tries to reboot the machine and makes you type the hostname to confirm you are sure. Maybe that could stop puppet before the reboot? Or at least tell the user to do that.
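For illustration, a hedged sketch of such a hook (molly-guard runs executable checks from /etc/molly-guard/run.d/ before shutdown; the file name and the idea of stopping the agent there are assumptions, not an existing hook):

```
#!/bin/sh
# Hypothetical /etc/molly-guard/run.d/20-stop-puppet-agent
# Stop the puppet agent before the reboot proceeds, so an in-flight catalog
# run is not interrupted and agent_catalog_run.lock is not left behind.
service puppet stop || true
# Exit 0 so molly-guard does not abort the shutdown because of this check.
exit 0
```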

molly-guard only works interactively, and I'm not sure if the problems are connected to a "regular" shutdown (i.e., calling /usr/sbin/shutdown) or if OpenStack just pulls the VM's power plug.

Anyhow, removing the file after (re-)boot could be done via an Upstart job. But I'm still not convinced that automatically doing that is a good idea.
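A sketch of what such an Upstart job might look like (the job name and the "start on filesystem" trigger are assumptions; the lock path is from the task description):

```
# Hypothetical /etc/init/clear-stale-puppet-lock.conf
description "Remove a stale puppet agent_catalog_run.lock left by a reboot mid-run"
start on filesystem
task
exec rm -f /var/lib/puppet/state/agent_catalog_run.lock
```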

fgiunchedi triaged this task as Medium priority. Apr 27 2016, 3:27 PM
fgiunchedi subscribed.

I don't remember experiencing this very often in production; anyway, removing the file on boot if it exists seems easy enough.

fgiunchedi claimed this task.

This was eventually resolved; we're running Puppet 3.8 even on trusty nowadays.