From alertmanager:
Icinga/Check for VMs leaked by the nova-fullstack test summary: 7 instances in the admin-monitoring project
Change 714722 had a related patch set uploaded (by David Caro; author: David Caro):
[operations/puppet@production] nova_fullstack: rephrase log message
Change 714733 had a related patch set uploaded (by David Caro; author: David Caro):
[operations/puppet@production] nova_fullstack: Add last error output when timing out puppet check
I started a new VM with the same image, and while booting (before the first puppet run), I connected to the virsh
console and was able to print the puppet config, showing that the state dir is not the one we are looking at
(/var/lib/puppet/state):
agent_catalog_run_lockfile = /var/cache/puppet/state/agent_catalog_run.lock
agent_disabled_lockfile = /var/cache/puppet/state/agent_disabled.lock
classfile = /var/cache/puppet/state/classes.txt
graphdir = /var/cache/puppet/state/graphs
lastrunfile = /var/cache/puppet/state/last_run_summary.yaml
lastrunreport = /var/cache/puppet/state/last_run_report.yaml
resourcefile = /var/cache/puppet/state/resources.txt
statedir = /var/cache/puppet/state
statefile = /var/cache/puppet/state/state.yaml
statettl = 2764800
So my current hypothesis is that the first puppet run changes the state dir, but the new path is not used to store
the state until the second run, and that second run depends on the cron getting triggered.
Sometimes that takes too long and the test just times out.
I'll adapt the script to look in both places (if the above is correct, even 'puppet config print' will
show the wrong path after the first run).
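A minimal sketch of the "look in both places" idea, not the actual nova_fullstack change. It assumes the check can read the VM's filesystem directly (the real test presumably runs remotely), and the function name, the `root` parameter, and the order of the candidate dirs are hypothetical choices made for illustration:

```python
import os

# Candidate state dirs taken from the comment above; the order is an assumption.
CANDIDATE_STATE_DIRS = ("/var/lib/puppet/state", "/var/cache/puppet/state")
SUMMARY_FILE = "last_run_summary.yaml"


def find_last_run_summary(state_dirs=CANDIDATE_STATE_DIRS, root="/"):
    """Return the first existing last_run_summary.yaml path, or None.

    `root` is a hypothetical knob so the lookup can be pointed at a
    scratch directory instead of the real filesystem.
    """
    for state_dir in state_dirs:
        path = os.path.join(root, state_dir.lstrip("/"), SUMMARY_FILE)
        if os.path.isfile(path):
            return path
    return None
```

Returning None when neither location has the file lets the caller keep polling until its own timeout, rather than failing on the first miss.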
Change 714761 had a related patch set uploaded (by David Caro; author: David Caro):
[operations/puppet@production] nova_fullstack: try to get the puppet state from a couple places
Change 714722 merged by Andrew Bogott:
[operations/puppet@production] nova_fullstack: rephrase log message
The main curse on VM creation these days is the puppet-agent. Cloud-init starts the puppet agent (non-optionally), and then the puppet-agent may or may not start a puppet sync while the firstboot script is running.
That race causes no end of headaches, so /probably/ that is still what's causing this. I'm going to look at that next.
Change 714831 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] nova vendor-data: another mild attempt to avoid races with the puppet agent
Change 714831 merged by Andrew Bogott:
[operations/puppet@production] nova vendor-data: another mild attempt to avoid races with the puppet agent
@Andrew handing it over to you, as it's not clear to me if you tried the other patches or not (the ones about checking different puppet state file paths), feel free to abandon them if they are not needed and close the task.
Change 714733 merged by Andrew Bogott:
[operations/puppet@production] nova_fullstack: Add last error output when timing out puppet check
Change 714761 merged by Andrew Bogott:
[operations/puppet@production] nova_fullstack: try to get the puppet state from a couple places
Change 715026 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] Added cloud-wide default for profile::debdeploy::client::filter_services:
Change 715026 merged by Andrew Bogott:
[operations/puppet@production] Added cloud-wide default for profile::debdeploy::client::filter_services:
Change 714858 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] Revert "nova_fullstack: try to get the puppet state from a couple places"
Change 714858 merged by Andrew Bogott:
[operations/puppet@production] Revert "nova_fullstack: try to get the puppet state from a couple places"
Change 715045 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] nova vendordata: try to have cloud-init perform the first puppet run
Change 715050 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] nova_fullstack_test.py: capture output on successful puppet check
Change 715050 merged by Andrew Bogott:
[operations/puppet@production] nova_fullstack_test.py: capture output on successful puppet check
Change 715045 merged by Andrew Bogott:
[operations/puppet@production] nova vendordata: try to have cloud-init perform the first puppet run
For an extremely long-term fix to our cloud-init race: https://github.com/canonical/cloud-init/pull/1002
This particular issue should be resolved. I'm not going to keep this task open waiting on that upstream change, since we won't be able to deploy it until after bullseye.