If puppet is disabled when one runs the sre.hosts.reboot-single cookbook, we end up in a loop where the cookbook constantly looks for a more recent puppet run. The cookbook should get a switch to enable puppet, and it should exit early if puppet is disabled.
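The proposed behaviour could look roughly like the sketch below. It is only an illustration and is not taken from the cookbook: `FakePuppetHost`, `PuppetDisabledError`, `precheck_puppet` and the `enable_puppet` parameter are all hypothetical names, and the real cookbook would query the host's actual puppet state instead of an in-memory object.

```python
from dataclasses import dataclass
from typing import Optional


class PuppetDisabledError(RuntimeError):
    """Raised when puppet is disabled and the caller did not ask to enable it."""


@dataclass
class FakePuppetHost:
    """Hypothetical stand-in for the puppet agent state of a single host."""

    disabled_reason: Optional[str] = None

    def is_disabled(self) -> bool:
        # A host counts as disabled whenever a disable reason is recorded.
        return self.disabled_reason is not None

    def enable(self) -> None:
        # Clearing the reason models re-enabling the agent.
        self.disabled_reason = None


def precheck_puppet(host: FakePuppetHost, enable_puppet: bool = False) -> None:
    """Run before the reboot: re-enable puppet if asked to, otherwise fail fast.

    Exiting here avoids rebooting and then waiting for a puppet run
    that can never happen while the agent is disabled.
    """
    if not host.is_disabled():
        return
    if enable_puppet:
        host.enable()
        return
    raise PuppetDisabledError(
        f"puppet is disabled ({host.disabled_reason}); "
        "re-enable it or pass enable_puppet=True"
    )
```

With this shape, a run against a disabled host stops before the reboot unless the operator explicitly opts in to re-enabling puppet.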
Event Timeline
Change 868077 had a related patch set uploaded (by Jbond; author: John Bond):
[operations/cookbooks@master] sre.hosts.reboot-single: add ability to enable host on reboot
Change 868430 had a related patch set uploaded (by Jbond; author: John Bond):
[operations/cookbooks@master] sre.hosts.reboot-single: Simplify icinga logic
Change 868430 merged by jenkins-bot:
[operations/cookbooks@master] sre.hosts.reboot-single: Simplify icinga logic
Change 868077 merged by jenkins-bot:
[operations/cookbooks@master] sre.hosts.reboot-single: add ability to enable host on reboot
Please note that we ran into this issue again today:
During sprint week, while reimaging hosts, Luca was using the sre.hardware.upgrade-firmware cookbook on kafka-main1005 and hit a problem: if the puppet agent is disabled on a host (say, because it was taken offline for firmware updates and a reimage), the script fails to reboot it.
Since this is a normal state during reimaging, this should likely be fixed. Example of the error: P45904
Current workaround:
- Start the firmware update script and watch the reboot call fail because puppet is disabled on the host.
- Manually reboot the host via the CLI in a different terminal window, keeping the script running.
- Monitor the script until the firmware update succeeds.
Adding more context: I needed to stop kafka gracefully on the node, so I disabled puppet to keep it from bringing the daemon back up. Rebooting a kafka node without a graceful shutdown of the kafka daemon is fine, but it is better to avoid it when we can. Let me know what's best :)
Change 902009 had a related patch set uploaded (by Elukey; author: Elukey):
[operations/cookbooks@master] sre.hosts.reboot-single.py: replace "pool" with "depool"
Change 902009 merged by Elukey:
[operations/cookbooks@master] sre.hosts.reboot-single.py: replace "pool" with "depool"
Change 902013 had a related patch set uploaded (by Elukey; author: Elukey):
[operations/cookbooks@master] sre.hosts.reboot-single: set self.depool in any case
Change 902013 merged by Elukey:
[operations/cookbooks@master] sre.hosts.reboot-single: set self.depool in any case
Change 902015 had a related patch set uploaded (by Elukey; author: Elukey):
[operations/cookbooks@master] sre.hosts.reboot-single: fix corner case when puppet is disabled
Change 902026 had a related patch set uploaded (by Jbond; author: Jbond):
[operations/cookbooks@master] Revert "sre.hosts.reboot-single: set self.depool in any case"
Change 902015 abandoned by Elukey:
[operations/cookbooks@master] sre.hosts.reboot-single: fix corner case when puppet is disabled
Reason:
Change 902026 merged by jenkins-bot:
[operations/cookbooks@master] Revert "sre.hosts.reboot-single: set self.depool in any case"