Work around https://tickets.puppetlabs.com/browse/PUP-1070 (fixed in Puppet 3.7.0; our precise/trusty hosts are on 3.4.3, so only jessie is okay). Per <ori>, this could be done by provisioning, say, a script that removes /var/lib/puppet/state/agent_catalog_run.lock on boot, or a cron job that removes it when no puppet processes are running.
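A minimal sketch of the cron-based variant, assuming the lock path from the task; the script name, the function, and the injectable process check are hypothetical:

```shell
#!/bin/sh
# Hypothetical cron cleanup script (structure and names are assumptions).
# Removes the agent lock file only when no puppet process is running,
# so an active run is never disturbed.

# clean_stale_lock LOCKFILE [CHECK_CMD]
# CHECK_CMD (default: pgrep -x puppet) must succeed iff puppet is running.
clean_stale_lock() {
    lock="$1"
    check="${2:-pgrep -x puppet}"
    if [ -e "$lock" ] && ! $check >/dev/null 2>&1; then
        rm -f "$lock"
    fi
}

clean_stale_lock /var/lib/puppet/state/agent_catalog_run.lock
```

A cron entry running it every few minutes would then pick up stale locks regardless of how the instance went down (the install path is hypothetical): `*/10 * * * * root /usr/local/sbin/clean-puppet-lock`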
Event Timeline
Automatically cleaning up such files makes me feel uneasy. In Labs, I've seen this scenario (as I understand it) only when instances froze or were rebooted during a Puppet run. Shinken and puppetalert.py will inform the project administrators about Puppet staleness, and the freeze/reboot is usually fresh in memory, so the reason is obvious when looking at the lock file. I wouldn't mind if the clean-up remained a manual task.
Could we hook into molly-guard? It already runs when a user tries to reboot the machine and makes you type the hostname to confirm you are sure. Maybe it could stop Puppet before the reboot, or at least tell the user to do so.
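molly-guard runs the executables in /etc/molly-guard/run.d/ before allowing an interactive shutdown, so a hook there could stop the agent first. A rough sketch, where the hook file name and the `service puppet stop` command are assumptions (adjust to the init system in use):

```shell
#!/bin/sh
# Hypothetical molly-guard hook, e.g. /etc/molly-guard/run.d/20-stop-puppet
# (file name is an assumption). Stops a running Puppet agent before the
# shutdown/reboot proceeds, so no catalog run is interrupted.
if pgrep -x puppet >/dev/null 2>&1; then
    echo "Stopping puppet agent before shutdown ..."
    service puppet stop
fi
```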
molly-guard only works interactively, and I'm not sure whether the problems are connected to a "regular" shutdown (i.e., calling /usr/sbin/shutdown) or whether OpenStack just pulls the VM's power plug.
Anyhow, removing the file after (re)boot could be done via an Upstart job. But I'm still not convinced that doing it automatically is a good idea.
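For reference, such an Upstart job could be as small as this; the job name is an assumption, and the lock path is the one from the task description:

```
# /etc/init/remove-stale-puppet-lock.conf  (hypothetical job name)
description "remove Puppet agent lock left over by an interrupted run"
start on startup
task
exec rm -f /var/lib/puppet/state/agent_catalog_run.lock
```

Since the job runs as a one-shot task at startup, before the agent itself comes up, it should not race against a legitimate in-progress run.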
I don't remember experiencing this very often in production. Anyway, removing the file on boot if it exists seems easy enough.