
Provide one or more Qemu agents in CI that use a newer version than 2.x
Open, Medium, Public

Event Timeline

Krinkle triaged this task as Medium priority.
Krinkle moved this task from Inbox to Backlog: Maintenance on the Performance-Team board.

I created this with @LarsWirzenius as part of T250808, and we documented the provisioning steps at https://www.mediawiki.org/wiki/Continuous_integration/Qemu.

However, what we did not document is how the Debian base image for Qemu itself was made. This is something Lars made and uploaded for us, but I'm not sure what considerations and configurations went into that.

Mentioned in SAL (#wikimedia-releng) [2021-09-03T23:02:41Z] <Krinkle> Creating integration-agent-qemu-1002 (Debian 11 Bullseye, g3.cores8.ram24.disk20.ephemeral40.4xiops), ref T284774

Change 717687 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/puppet@production] ci: Add 'bulleye' to docker lsbdistcodename hack

https://gerrit.wikimedia.org/r/717687

The next hurdle:

Notice: /Stage[main]/Labs_lvm/Exec[create-volume-group]/returns: /usr/local/sbin/make-instance-vg: lvm is not active on this host; unable to create a volume.
Error: '/usr/local/sbin/make-instance-vg '/dev/sda'' returned 1 instead of one of [0]
Error: /Stage[main]/Labs_lvm/Exec[create-volume-group]/returns: change from 'notrun' to ['0'] failed: '/usr/local/sbin/make-instance-vg '/dev/sda'' returned 1 instead of one of [0]
Info: Class[Labs_lvm]: Unscheduling all events on Class[Labs_lvm]
Notice: /Stage[main]/Profile::Labs::Lvm::Srv/Labs_lvm::Volume[second-local-disk]/Exec[available-space-second-local-disk]/returns: Traceback (most recent call last):
Notice: /Stage[main]/Profile::Labs::Lvm::Srv/Labs_lvm::Volume[second-local-disk]/Exec[available-space-second-local-disk]/returns:   File "/usr/local/sbin/pv-free", line 17, in <module>
Notice: /Stage[main]/Profile::Labs::Lvm::Srv/Labs_lvm::Volume[second-local-disk]/Exec[available-space-second-local-disk]/returns:     assert pvfree.endswith("G")
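The traceback above stops at an assertion that the free-space string ends in "G". The source of pv-free is not shown here, so the following is only a sketch of how such a value could be parsed without asserting, assuming (unconfirmed) that the string is empty or carries a different unit suffix when LVM is not active. The function name `parse_free_space` and the unit table are hypothetical and are not taken from the real script.

```python
# Hypothetical helper, NOT the real /usr/local/sbin/pv-free.
# Assumption: the value is a number followed by a single unit letter.
_UNITS_IN_GIB = {"M": 1 / 1024, "G": 1.0, "T": 1024.0}


def parse_free_space(text):
    """Return the free space in GiB, or None if the text is not parseable."""
    value = text.strip()
    if not value:
        # e.g. no physical volume was reported at all
        return None
    unit = value[-1].upper()
    if unit not in _UNITS_IN_GIB:
        return None
    try:
        number = float(value[:-1])
    except ValueError:
        return None
    return number * _UNITS_IN_GIB[unit]
```

Returning None instead of asserting would let the caller report "no volume group available" rather than failing with a traceback in the Puppet run.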

Change 717732 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/puppet@production] ci: Fix profile::ci to be compatible with new empheral storage

https://gerrit.wikimedia.org/r/717732

The switch from LVM to the new "ephemeral" storage (cinder-but-not-really-cinder) didn't quite work out: for some reason the space was already mounted at /mnt by default, which isn't meant to happen on new instances. That bug is filed as T290372.

There was also a second bug: even after unmounting it, cinderutils still wasn't able to discover the information it needed from the Puppet "facts".
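To confirm on a fresh instance whether something is already mounted at /mnt, a check along these lines could be used. It is a minimal sketch that reads the standard Linux /proc/mounts format (device, mount point, filesystem type, ...); the function name `is_mounted` is hypothetical, and this is not how Puppet or cinderutils performs the check.

```python
def is_mounted(mount_point, mounts_text):
    """Return True if mount_point appears as a mount target in mounts_text.

    mounts_text is expected to be the contents of /proc/mounts, where the
    second whitespace-separated field of each line is the mount point.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mount_point:
            return True
    return False


if __name__ == "__main__":
    with open("/proc/mounts") as handle:
        print("/mnt already mounted:", is_mounted("/mnt", handle.read()))
```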

For both of these issues, @Bstorm worked her magic to make this work for the qemu-1002 instance specifically.

Signing back over to @dpifke. The integration-agent-qemu-1002 instance should be ready now with the same resources and provisioning as qemu-1001. (This does not yet include the qemu and guestfs packages, which may be installed ad hoc with sudo.)

Change 717732 abandoned by Krinkle:

[operations/puppet@production] ci: Fix profile::ci to be compatible with new empheral storage

Reason:

I've un-picked this from integration-puppet-master-02 in favour of now-merged https://gerrit.wikimedia.org/r/719376

https://gerrit.wikimedia.org/r/717732

Mentioned in SAL (#wikimedia-releng) [2021-09-17T18:08:02Z] <Krinkle> Re-recreating qemu-1002 as integration-agent-qemu-1003 (Debian 11 Bullseye, g3.cores8.ram24.disk20.ephemeral40.4xiops), ref T284774

OK. qemu-1003 is now up in the same shape as qemu-1001 and qemu-1002 were, although with a smaller ephemeral disk (40G instead of 60G). We were only using 18G of it, so that should be fine.
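As a quick sanity check of that headroom, the sizes can be read out of the flavor name. This sketch assumes the name segments (cores8, ram24, disk20, ephemeral40) encode counts and sizes in the obvious way, which is an inference from the name rather than something documented in this task; `parse_flavor` is a hypothetical helper.

```python
import re


def parse_flavor(name):
    """Extract numeric fields such as cores/ram/disk/ephemeral from a flavor name.

    Assumption: each dot-separated segment is a keyword followed by a number;
    segments that do not match (e.g. "g3", "4xiops") are ignored.
    """
    sizes = {}
    for part in name.split("."):
        match = re.fullmatch(r"(cores|ram|disk|ephemeral)(\d+)", part)
        if match:
            sizes[match.group(1)] = int(match.group(2))
    return sizes


flavor = parse_flavor("g3.cores8.ram24.disk20.ephemeral40.4xiops")
used_gb = 18  # usage figure quoted in the comment above
headroom_gb = flavor["ephemeral"] - used_gb
```

With the figures from this task, that leaves 22G free on the 40G ephemeral disk.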