
Decide how to run a test involving docker inside WMF CI
Closed, ResolvedPublic

Description

... specifically for the use case of Fresh, but possibly also more generally which might inform how we'd run testing short-mid term for Blubber, Beaker test for operations/puppet, and anything that either needs to run Docker itself or needs its own VM for other reasons.

Some things to think about:

  • Docker-in-Docker (D-in-D): Probably not feasible. It essentially requires giving the executable access to the host, either by granting the first container root (so that the second one can be spawned inside it), or by granting the first container access to the host Docker socket so that it can spawn the second one as a sibling, directly on the host.
  • Let the Jenkins job create a VM (using Qemu/KVM), then run git-clone and the test command inside it. (The VM would resume from a snapshot with docker, git, etc. pre-installed.)

Event Timeline

(The VM would resume from a snapshot with docker, git, etc. pre-installed).

I suppose we can do without that for now and just install what we need at job run time using apt-get. That will be a bit slower, but should be fine for the MVP.

Let the Jenkins job create a VM using Qemu/KVM […]

Based on current CI worker provisioning from puppet, the qemu-system-x86_64 command is not available, but qemu-img is. It seems similar but I'm not sure. In any event, I suppose we could easily add that to the puppet role via apt if needed.

These naturally need an OS base image, so I guess the next step would be to decide which one to use and where to get it from and/or where to store it.

@hashar wrote (IRC):

Mentioned in SAL (#wikimedia-releng) [2020-04-21T14:48:11Z] <Krinkle> Creating integration-agent-qemu-1001 to experiment with VM-based CI jobs – T250808

integration-agent-qemu-1001:~$ qemu-system-x86_64 -enable-kvm
 Could not access KVM kernel module: No such file or directory
 failed to initialize KVM: No such file or directory

Per @LarsWirzenius, the kvm_intel kernel module is needed for it not to be as slow. E.g. by creating /etc/modprobe.d/kvm-nested.conf with (source):

options kvm-intel nested=1
options kvm-intel enable_shadow_vmcs=1
options kvm-intel enable_apicv=1
options kvm-intel ept=1

We'd need that to be enabled on the openstack host as well, which it seems to not be right now. For now though we can proceed without KVM, in userland.
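As a rough illustration of what "proceeding without KVM" could look like from a job script, here is a minimal sketch that assembles a qemu-system-x86_64 command line. The helper name build_qemu_command, the image path, and the memory size are assumptions made for this example and are not taken from the actual job. Without -enable-kvm, QEMU falls back to its software emulator (TCG), which is slower but needs no kernel module.

```python
from typing import List


def build_qemu_command(image_path: str, use_kvm: bool, memory_mb: int = 2048) -> List[str]:
    """Return an argv list for booting a qcow2 disk image with qemu-system-x86_64.

    Hypothetical helper: the defaults are illustrative, not the values
    used by the real Jenkins job.
    """
    command = [
        "qemu-system-x86_64",
        "-m", str(memory_mb),
        # No graphical window; the guest console goes to the terminal.
        "-nographic",
        # Write changes to a temporary file so the base image stays untouched.
        "-snapshot",
        "-drive", f"file={image_path},format=qcow2",
    ]
    if use_kvm:
        # Only valid when /dev/kvm is usable; otherwise QEMU refuses to start.
        command.append("-enable-kvm")
    return command
```

Calling build_qemu_command("base.qcow2", use_kvm=False) yields a command that runs purely in userland; passing use_kvm=True adds the flag that failed in the experiment above.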

Note that integration-agent-qemu-1001.integration.eqiad.wmflabs does not have /dev/kvm. I tried earlier to load the kvm and kvm_intel kernel modules but:

$ sudo modprobe kvm-intel
modprobe: ERROR: could not insert 'kvm_intel': Operation not supported

dmesg reports it is not available:

kvm: no hardware support

None of the other WMCS instances I have access to have /dev/kvm. I guess that is explicitly disabled on the hosts?

For the record, nested virtualization (running a VM in a VM) was definitely available at some point. We used it to run the Android Emulator.

An alternative is to have those VM-based jobs run somewhere other than on the WMCS infrastructure.

The host needs to enable nested KVM (nested=1 parameter to the kvm module), and then the guest should be able to load the kvm module as well. Or at least that's what I did on my own hardware.
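A small sketch of how one might check both halves of that from a script: whether the host's kvm_intel module reports nesting as enabled, and whether the guest exposes /dev/kvm. The function names are made up for this example. The sysfs path is the usual location of the module parameter on Intel hosts, and it is an assumption here that the relevant hosts use kvm_intel rather than kvm_amd.

```python
from pathlib import Path

# Usual sysfs location of the "nested" parameter on Intel hosts (assumed here).
NESTED_PARAM = Path("/sys/module/kvm_intel/parameters/nested")


def parse_nested_flag(text: str) -> bool:
    """Interpret the module parameter; kernels report either 'Y'/'N' or '1'/'0'."""
    return text.strip() in ("Y", "1")


def nested_kvm_enabled(param_file: Path = NESTED_PARAM) -> bool:
    """Return True if the kvm_intel module is loaded with nesting enabled."""
    try:
        return parse_nested_flag(param_file.read_text())
    except OSError:
        # Module not loaded, or not an Intel host.
        return False


def kvm_device_present(device: Path = Path("/dev/kvm")) -> bool:
    """Return True if the KVM device node exists on this machine."""
    return device.exists()
```

Run on the host, nested_kvm_enabled() says whether guests can be offered KVM at all; run inside the guest, kvm_device_present() says whether that actually worked.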

Change 593012 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[fresh@master] bin: Ignore non-zero exit for tput color codes

https://gerrit.wikimedia.org/r/593012

Change 593034 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] [WIP] Draft Jenkins job for running tests inside a sub-VM

https://gerrit.wikimedia.org/r/593034

Change 593034 merged by jenkins-bot:
[integration/config@master] Create a Jenkins job for Fresh that runs tests inside a Qemu VM

https://gerrit.wikimedia.org/r/593034

Change 595026 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] Enable Qemu VM job for Fresh

https://gerrit.wikimedia.org/r/595026

Change 595026 merged by jenkins-bot:
[integration/config@master] Enable Qemu VM job for Fresh

https://gerrit.wikimedia.org/r/595026

@hashar Looks like it worked when I triggered the builds manually, but not via Zuul. The issue seems to be that default parameter values do not work? Searching online, I see this is a common complaint. It seems this feature only applies to the input field of the "Build" web page on Jenkins. It does not apply when Zuul is triggering the build.

But when clicking "Rebuild Last" on the web interface, it works, since that uses the default input value for QEMU_TEST_COMMAND.

The idea is that fresh-test (and other jobs that might need Qemu) are the same, except for a different value for the QEMU_TEST_COMMAND parameter. Ideally the command would be defined together with the job, in jjb/misc.yaml so that it can be maintained together.

  • I tried using parameters: string: default in https://gerrit.wikimedia.org/r/593034, but this does not work per the above reason.
  • I tried adding a one-line shell before the main one, to export FOO=123 but that does not work either. It is not retained to the next build step.
  • I tried defining it in zuul/layout.yaml in the section where fresh and fresh-test are declared, however there does not seem to be a way to set job parameters there directly. There is also not a way to set any extra meta data on a per-project level.
  • I tried using the job section in zuul/layout.yaml, but that only supports setting parameter_function. It does not have a way to e.g. pass a key-value pair like "qemu command: test.sh" to the parameter function.
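The export attempt in the list above most likely fails because each shell build step runs as its own process, so a variable exported in one step is gone by the time the next one starts. The sketch below reproduces that behaviour outside Jenkins; run_step and demo are names invented for this example, and the two sh invocations only stand in for two consecutive build steps.

```python
import os
import subprocess


def run_step(script: str, env: dict) -> str:
    """Run one 'build step' in its own shell process and return its stdout."""
    result = subprocess.run(
        ["sh", "-c", script],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def demo() -> str:
    """Export FOO in one step, then try to read it in a second, separate step."""
    # Start from the current environment, minus any pre-existing FOO.
    base_env = {key: value for key, value in os.environ.items() if key != "FOO"}
    run_step("export FOO=123", base_env)
    # The second shell never saw the export, so the fallback text is printed.
    return run_step('echo "${FOO:-unset}"', base_env)
```

Here demo() returns "unset": the export only affected the first shell process, which has already exited.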

Is the only way to set a parameter (simple, static, not variable) for a Jenkins job really to add an if-statement in zuul/parameter_function and hardcode the job name there?
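For reference, the workaround being asked about could look roughly like the sketch below. It follows the (item, job, params) signature that Zuul v2 documents for parameter functions, and uses a lookup table instead of a chain of if-statements. The function name, the table, and the "./test.sh" value are hypothetical; FakeJob is only a stand-in for Zuul's job object so the example can run on its own.

```python
from collections import namedtuple

# Hypothetical table: Jenkins job name -> command to run inside the Qemu VM.
QEMU_TEST_COMMANDS = {
    "fresh-test": "./test.sh",
}


def set_qemu_test_command(item, job, params):
    """Zuul v2 style parameter function: mutate params in place before launch."""
    command = QEMU_TEST_COMMANDS.get(job.name)
    if command is not None:
        params["QEMU_TEST_COMMAND"] = command


# Stand-in for Zuul's job object, for demonstration only.
FakeJob = namedtuple("FakeJob", ["name"])


def demo_params(job_name: str) -> dict:
    """Return the parameters a job with this name would be launched with."""
    params = {}
    set_qemu_test_command(None, FakeJob(job_name), params)
    return params
```

With this shape, demo_params("fresh-test") contains QEMU_TEST_COMMAND, while any other job name leaves the parameters untouched. The job name is still hardcoded on the Zuul side, which is the drawback raised above.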

Change 595080 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] Remove use of Qemu build parameter

https://gerrit.wikimedia.org/r/595080

Change 595080 merged by jenkins-bot:
[integration/config@master] Remove use of Qemu build parameter

https://gerrit.wikimedia.org/r/595080

Krinkle claimed this task.
Krinkle added a project: Performance-Team.

Change 593012 merged by jenkins-bot:
[fresh@master] bin: Ignore non-zero exit for tput color codes

https://gerrit.wikimedia.org/r/593012