Hopefully this would allow us to set VMs to autostart (virsh autostart $vm), without causing more trouble when we reboot hypervisors
https://github.com/libvirt/libvirt/blob/master/tools/libvirt-guests.sysconf#L15
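The linked template shows the knobs involved. A minimal sketch of what the relevant settings might look like on Debian (file path and values are illustrative, taken from the upstream libvirt-guests.sysconf template, not from our deployed config):

```shell
# /etc/default/libvirt-guests -- illustrative values only
ON_BOOT=start        # start guests that were running when the host went down
ON_SHUTDOWN=shutdown # cleanly shut guests down instead of suspending them
SHUTDOWN_TIMEOUT=300 # seconds to wait for guests to shut down before giving up
```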
Change 493807 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/puppet@production] openstack: Automatically start/stop VMs on hypervisor boot/shutdown
I have one question: is this mechanism smart enough not to start VMs on boot that were already shut off before the reboot?
EDIT: I saw andrew asked the same question already.
Change 493807 merged by GTirloni:
[operations/puppet@production] openstack: Automatically start/stop VMs on hypervisor boot/shutdown
Change 496778 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/puppet@production] openstack: Only perform VM startup/shutdown on Stretch
Change 496778 merged by GTirloni:
[operations/puppet@production] openstack: Only perform VM startup/shutdown on Stretch
Change is applied and libvirt-guests tries to do its job as expected. However, it fails because the bridge interface fails to come up quick enough:
Mar 15 14:39:38 cloudvirt1015 systemd[1]: Starting Suspend/Resume Running libvirt Guests...
Mar 15 14:39:39 cloudvirt1015 libvirt-guests.sh[1907]: Resuming guests on default URI...
Mar 15 14:39:39 cloudvirt1015 libvirt-guests.sh[1907]: Resuming guest i-00005835: error: Failed to start domain i-00005835
Mar 15 14:39:39 cloudvirt1015 libvirt-guests.sh[1907]: error: Cannot get interface MTU on 'brq7425e328-56': No such device
Mar 15 14:39:39 cloudvirt1015 systemd[1]: libvirt-guests.service: Main process exited, code=exited, status=1/FAILURE
Mar 15 14:39:39 cloudvirt1015 systemd[1]: Failed to start Suspend/Resume Running libvirt Guests.
Mar 15 14:39:39 cloudvirt1015 systemd[1]: libvirt-guests.service: Unit entered failed state.
Mar 15 14:39:39 cloudvirt1015 systemd[1]: libvirt-guests.service: Failed with result 'exit-code'.
After a few seconds, brq7425e328-56 becomes available and the VM can be started.
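To illustrate the race, here is a hypothetical workaround sketch (not a deployed fix) that polls for an interface before proceeding; it assumes the iproute2 `ip` tool, and the function name and timeout are made up for illustration:

```shell
#!/bin/sh
# wait_for_iface: poll until a network interface exists, or time out.
# Usage: wait_for_iface <interface> [timeout_seconds]
wait_for_iface() {
    iface="$1"
    timeout="${2:-30}"
    while [ "$timeout" -gt 0 ]; do
        # "ip link show" exits non-zero if the interface does not exist yet
        if ip link show "$iface" >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
        timeout=$((timeout - 1))
    done
    return 1
}

# Example: wait up to 5 seconds for the loopback interface
wait_for_iface lo 5 && echo "interface is up"
```

A guard like this, run before starting guests, would mask the symptom but not address the underlying ordering problem.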
This situation has been reported by many people since 2009, but there's no fix in place yet. See https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/495394
So the current situation is:
There should be a way for systemd services to express this dependency, so that libvirt waits until all neutron bridges are set up. Perhaps it's as simple as a dependency on neutron-linuxbridge-agent; I don't know.
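If that dependency turned out to be sufficient, it could be expressed as a systemd drop-in. This is a hypothetical sketch only: the agent unit name is assumed, and note that the agent having *started* does not guarantee the bridges already exist, so this could still race:

```shell
# /etc/systemd/system/libvirt-guests.service.d/override.conf
# Hypothetical ordering sketch -- unit name assumed, not verified.
[Unit]
After=neutron-linuxbridge-agent.service
Wants=neutron-linuxbridge-agent.service
```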
Instead of using virsh autostart, could we let Nova resume the state of VMs after a hypervisor reboot?
This nova config option will provide that functionality
resume_guests_state_on_host_boot = False (Boolean) Whether to start guests that were running before the host rebooted
https://docs.openstack.org/mitaka/config-reference/compute/config-options.html
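Based on the documentation above, enabling it would be a one-line change in nova.conf (file path and section shown are the conventional defaults, not our puppetized config):

```shell
# /etc/nova/nova.conf -- sketch of enabling the option discussed above
[DEFAULT]
# Start guests that were running before the host rebooted (defaults to False)
resume_guests_state_on_host_boot = True
```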
Mentioned in SAL (#wikimedia-operations) [2019-07-12T15:25:05Z] <jeh> rebooting cloudvirt1018.eqiad.wmnet T216040
Mentioned in SAL (#wikimedia-operations) [2019-07-12T19:08:28Z] <jeh> rebooting cloudvirt1018.eqiad.wmnet T216040
Ran some tests with resume_guests_state_on_host_boot enabled and libvirt-guests configured not to start VMs.
After making the changes, we can see that the VMs are shut down cleanly:
Jul 12 19:28:14 cloudvirt1018 systemd[1]: Stopping Suspend/Resume Running libvirt Guests...
Jul 12 19:28:15 cloudvirt1018 libvirt-guests.sh[4926]: Running guests on default URI: i-000067da, i-00009cdb
Jul 12 19:28:15 cloudvirt1018 libvirt-guests.sh[4926]: Shutting down guests on default URI...
Jul 12 19:28:15 cloudvirt1018 libvirt-guests.sh[4926]: Starting shutdown on guest: i-000067da
Jul 12 19:28:15 cloudvirt1018 libvirt-guests.sh[4926]: Starting shutdown on guest: i-00009cdb
Jul 12 19:28:16 cloudvirt1018 libvirt-guests.sh[4926]: Waiting for 2 guests to shut down, 300 seconds left
Jul 12 19:28:19 cloudvirt1018 libvirt-guests.sh[4926]: Shutdown of guest i-000067da complete.
Jul 12 19:28:19 cloudvirt1018 libvirt-guests.sh[4926]: Shutdown of guest i-00009cdb complete.
Jul 12 19:28:19 cloudvirt1018 systemd[1]: Stopped Suspend/Resume Running libvirt Guests.
Even though the VMs are shut down and the hypervisor is offline, they still show the desired ACTIVE state in nova:
+--------------------------------------+---------------+--------+----------------------------------------+
| ID                                   | Name          | Status | Networks                               |
+--------------------------------------+---------------+--------+----------------------------------------+
| 3e2b47dd-0df2-45bf-9f48-84ca6dc4eaf6 | jeh-cv1018-01 | ACTIVE | lan-flat-cloudinstances2b=172.16.7.100 |
| 587fe161-c82e-4d4f-89b4-de5492ec4296 | canary1018-01 | ACTIVE | lan-flat-cloudinstances2b=172.16.3.114 |
+--------------------------------------+---------------+--------+----------------------------------------+
When the host comes back online, nova restarts the VMs that were previously active:
2019-07-12 19:37:31.465 2004 INFO nova.compute.manager [req-dc605927-b36d-420b-a3bd-9fbc47e843bd - - - - -] [instance: 587fe161-c82e-4d4f-89b4-de5492ec4296] Rebooting instance after nova-compute restart.
2019-07-12 19:37:31.500 2004 INFO nova.virt.libvirt.driver [-] [instance: 587fe161-c82e-4d4f-89b4-de5492ec4296] Instance destroyed successfully.
2019-07-12 19:37:32.317 2004 INFO nova.compute.manager [-] [instance: 587fe161-c82e-4d4f-89b4-de5492ec4296] VM Resumed (Lifecycle Event)
2019-07-12 19:37:32.326 2004 INFO nova.virt.libvirt.driver [-] [instance: 587fe161-c82e-4d4f-89b4-de5492ec4296] Instance rebooted successfully.
2019-07-12 19:37:32.337 2004 INFO nova.compute.manager [req-dc605927-b36d-420b-a3bd-9fbc47e843bd - - - - -] [instance: 3e2b47dd-0df2-45bf-9f48-84ca6dc4eaf6] Rebooting instance after nova-compute restart.
2019-07-12 19:37:32.370 2004 INFO nova.virt.libvirt.driver [-] [instance: 3e2b47dd-0df2-45bf-9f48-84ca6dc4eaf6] Instance destroyed successfully.
2019-07-12 19:37:32.452 2004 INFO nova.compute.manager [req-ce6d37ed-47c8-410b-b1f8-98aa3dcd2563 - - - - -] [instance: 587fe161-c82e-4d4f-89b4-de5492ec4296] VM Started (Lifecycle Event)
2019-07-12 19:37:34.547 2004 INFO nova.compute.manager [req-ce6d37ed-47c8-410b-b1f8-98aa3dcd2563 - - - - -] [instance: 3e2b47dd-0df2-45bf-9f48-84ca6dc4eaf6] VM Resumed (Lifecycle Event)
2019-07-12 19:37:34.557 2004 INFO nova.virt.libvirt.driver [-] [instance: 3e2b47dd-0df2-45bf-9f48-84ca6dc4eaf6] Instance rebooted successfully.
2019-07-12 19:37:34.689 2004 INFO nova.compute.manager [req-ce6d37ed-47c8-410b-b1f8-98aa3dcd2563 - - - - -] [instance: 3e2b47dd-0df2-45bf-9f48-84ca6dc4eaf6] VM Started (Lifecycle Event)
(Note that this is the same process you'd see with openstack server start and openstack server stop. "Destroyed" sounds scary, but it's actually just a shutdown event.)
libvirt-guests should only be used to cleanly shut down VMs that are managed by OpenStack.
Using nova-compute to manage which VMs should be running after a reboot ensures the network dependencies are in place and keeps the running VMs in sync with the desired state within OpenStack.
Change 522548 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] openstack: resume VM state on host reboot
Change 522548 merged by Andrew Bogott:
[operations/puppet@production] openstack: resume VM state on host reboot