Enable support for nested VMs
Closed, Resolved (Public)

Description

Desired outcome: VMs created within WMCS should be able to run the KVM hypervisor.

For some use cases we need the features of containers (rapid creation/destruction and isolation) without the restrictions normally applied to them (e.g., certain system calls are disabled). For example, T250808 needs to run dockerd and docker commands within a container, but dockerd requires privileges beyond those normally allowed in containers; likewise for T266081. The usual workarounds are to run the container in privileged mode or to expose the docker socket to the container. Both choices effectively give the container root access to the host, which is unacceptable. For T259586 we need to be able to run qemu-system with KVM acceleration for acceptable performance.

Event Timeline

Change 638146 had a related patch set uploaded (by Ahmon Dancy; owner: Ahmon Dancy):
[operations/puppet@production] openstack: Enable support for nested VMs

https://gerrit.wikimedia.org/r/638146

For context, we once had CI jobs running the Android Emulator, and they had access to /dev/kvm. That access disappeared at some point.

The relevant OpenStack doc seems to be: https://docs.openstack.org/nova/train/admin/configuration/hypervisor-kvm.html#nested-guest-support . If I have processed it properly:

On the host, set the kernel module option: options kvm_intel nested=1

And for OpenStack:

templates/rocky/nova/compute/nova-compute.conf.erb
  # Define custom CPU restriction to the lowest
  # common subset of features across all hypervisors.
  # Until we decom cloudvirt1001-1009, this is Ivy Bridge.
  cpu_mode=custom
  cpu_model=IvyBridge-IBRS
+ cpu_model_extra_flags = vmx,pcid
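
A quick way to confirm that the host-side kvm_intel option is live is to read the module parameter from sysfs. The following is only a sketch: the function names are made up for illustration, and it assumes an Intel hypervisor where the parameter is exposed at the usual sysfs path.

```python
from pathlib import Path
from typing import Optional

# Usual sysfs location of the kvm_intel "nested" parameter on Linux.
NESTED_PARAM = Path("/sys/module/kvm_intel/parameters/nested")


def parse_nested(value: str) -> bool:
    """Interpret the sysfs value; kernels report either Y/N or 1/0."""
    return value.strip() in ("Y", "y", "1")


def nested_enabled(path: Path = NESTED_PARAM) -> Optional[bool]:
    """Return the live setting, or None if the parameter file is absent
    (for example because kvm_intel is not loaded)."""
    try:
        return parse_nested(path.read_text())
    except FileNotFoundError:
        return None
```

Calling nested_enabled() on a hypervisor should return True only once the module has been loaded with nested=1, which fits the point that the change takes effect after a reload or reboot.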
aborrero triaged this task as Medium priority. Nov 10 2020, 4:30 PM
aborrero added a subscriber: aborrero.

I think this is a good idea in general. Something nice to support in CloudVPS.

However, will this require unloading and reloading the kvm module? If so, that would require draining the hypervisor.
That's a way bigger operation than just merging a patch, even though we have ceph in most of the fleet.

I will discuss this with the WMCS team in the next team meeting, which is scheduled for 2020-11-18, specifically:

  • strategy for testing the change. Probably, hardcode the options in the codfw1dev environment, reboot a hypervisor and see what happens (validate that everything runs fine and the VM sees the correct KVM option)
  • decide how to handle/schedule this operation.
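
For the validation step, one rough check from inside a test VM is to look for the vmx CPU flag and the /dev/kvm device node. This is a sketch under assumptions: the helper names are hypothetical, it only covers Intel (vmx), and /dev/kvm appears only once the guest has loaded its own kvm module.

```python
import os


def cpu_flags(cpuinfo_text: str) -> set:
    """Return the CPU flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def guest_can_run_kvm(cpuinfo_text: str, kvm_device: str = "/dev/kvm") -> bool:
    """True if the guest sees the vmx flag and the KVM device node exists."""
    return "vmx" in cpu_flags(cpuinfo_text) and os.path.exists(kvm_device)
```

Inside the VM this would be called as guest_can_run_kvm(open("/proc/cpuinfo").read()).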

As a worst-case workaround, if needed, you can try running qemu without hardware acceleration (really slow though :S)

I tested this approach and the performance was totally unacceptable.

Update: I'm still waiting for codfw1dev to work properly so I can test things there. Right now our puppetmasters there are messed up (probably my doing) and @jbond is looking at untangling the cert mess.

I still need to merge and implement this but it should be straightforward -- ping me if you don't hear back by the end of the week.

Change 638146 merged by Andrew Bogott:
[operations/puppet@production] openstack: Enable support for nested VMs

https://gerrit.wikimedia.org/r/638146

This will only be activated as hypervisors are rebooted. The hypervisors that have been rebooted since the patch was merged are:

  • cloudvirt1025
  • cloudvirt1026

@dancy, the easiest workaround for this is for you to create your new VM(s) and then check to see what host they land on. If it's not any of the above, just ping me and I'll move them.
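
The "check what host it landed on" step could be scripted roughly as below. This is a sketch, not an established workflow: it assumes the JSON comes from something like `openstack server show <id> -f json`, that the caller has enough rights to see the OS-EXT-SRV-ATTR:host field, and that the field holds a hostname whose first label matches the list above. The function names are hypothetical.

```python
import json

# Hypervisors rebooted since the patch was merged (from the list above).
NESTED_CAPABLE_HOSTS = {"cloudvirt1025", "cloudvirt1026"}

# Nova extended attribute naming the hypervisor; usually visible to admins only.
HOST_FIELD = "OS-EXT-SRV-ATTR:host"


def current_host(server_json: str) -> str:
    """Extract the short hypervisor name from 'server show' JSON output."""
    host = json.loads(server_json).get(HOST_FIELD)
    if not host:
        raise ValueError("host attribute missing; admin rights may be required")
    # Drop any domain suffix so FQDNs compare equal to short names.
    return host.split(".")[0]


def needs_move(server_json: str, capable_hosts=NESTED_CAPABLE_HOSTS) -> bool:
    """True if the VM sits on a hypervisor without nested support yet."""
    return current_host(server_json) not in capable_hosts
```

If needs_move() returns True, the VM still has to be moved by hand to one of the rebooted hypervisors.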

I started https://horizon.wikimedia.org/project/instances/ac4b2b46-ed0a-40fa-97d0-184330e9c82d/
It is currently on cloudvirt1023.

ok! I moved it to cloudvirt1025; you might want to reboot it before you start your testing. Please let me know how/if it works!

@Andrew It's working! This is great. Thank you very much for getting this change in!

Sounds good. For the next few months at least we'll have to schedule any new uses of this by hand to get them on hardware that supports the feature. Ping me about that as needed.