Cloud VPS pre-release Debian Bullseye images
Closed, ResolvedPublic

Description

As per title, tracking task to consider providing Bullseye images ahead of official release. TTBOMK the base puppetization should work out of the box on Bullseye already (T275873)

UPDATE: To add project access:

openstack image add project f99ec8f1-f7e7-4a8d-95c8-45f49a5fea37
OS_PROJECT_ID=mediawiki-vagrant openstack image set --accept f99ec8f1-f7e7-4a8d-95c8-45f49a5fea37

Event Timeline

Restricted Application added a subscriber: Aklapper.
aborrero triaged this task as Low priority.
aborrero moved this task from Inbox to Soon! on the cloud-services-team (Kanban) board.

Whoever/whatever populates https://cdimage.debian.org/cdimage/openstack/ is not building Bullseye images; the 'testing' directory only contains stale Buster builds.

I'm hoping not to have to dive back into bootstrap-vz; with luck I'll figure out who is building those things.

^ is resolved, now I'm blocked by the fact that puppet won't run on Bullseye at all.

https://gerrit.wikimedia.org/r/c/operations/puppet/+/677496 is the immediate issue but presumably that was meant as a guard against other craziness.

Moritz thinks that this will be fixed in the daily builds at the end of this week.

Change 682751 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] wmcs-image-create: Add half-hearted support for new daily Bullseye builds

https://gerrit.wikimedia.org/r/682751

Change 682751 merged by Andrew Bogott:

[operations/puppet@production] wmcs-image-create: Add half-hearted support for new daily Bullseye builds

https://gerrit.wikimedia.org/r/682751

Today marks two weeks of "Bullseye will get a version number tomorrow".

We still can't build or puppetize any of this automatically because of this error:

Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Function Call, unsupported: facts['os']['release']['major'] (bullseye/sid) is not a number (file: /etc/puppet/modules/debian/manifests/init.pp, line: 22, column: 9)
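For readers unfamiliar with the failure mode, here is a minimal Python sketch of the kind of guard the error message implies. It is not the Puppet code from modules/debian/manifests/init.pp; the function name `parse_major_release` is hypothetical, and the only thing taken from the log is that the fact value `bullseye/sid` is rejected for not being a number.

```python
def parse_major_release(value: str) -> int:
    """Return the major release as an int.

    Hypothetical stand-in for the Puppet-side guard: pre-release images
    report a codename such as 'bullseye/sid' instead of a number, so the
    conversion fails.
    """
    if not value.isdigit():
        raise ValueError(f"unsupported: major release ({value}) is not a number")
    return int(value)


for candidate in ("10", "11", "bullseye/sid"):
    try:
        print(candidate, "->", parse_major_release(candidate))
    except ValueError as err:
        print(candidate, "->", err)
```

Until the image reports a numeric major release, any check of this shape will fail.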

Thank you for following up @Andrew. I'm wondering if we could locally hack something to unblock that specific bit and see what else needs fixing?

Base-files 11.1 is now finally in Bullseye, so that blocker is hopefully gone.

Change 691741 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Add an OpenStack package class for Bullseye VMs

https://gerrit.wikimedia.org/r/691741

Change 691741 merged by Andrew Bogott:

[operations/puppet@production] Add an OpenStack package class for Bullseye VMs

https://gerrit.wikimedia.org/r/691741

Change 691744 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] cloud-vps VMs: use ssd for Bullseye

https://gerrit.wikimedia.org/r/691744

Change 691744 merged by Andrew Bogott:

[operations/puppet@production] cloud-vps VMs: use ssd for Bullseye

https://gerrit.wikimedia.org/r/691744

Change 691746 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] cloud-vps openstack packages: don't install python2 packages on bullseye

https://gerrit.wikimedia.org/r/691746

Change 691746 merged by Andrew Bogott:

[operations/puppet@production] cloud-vps openstack packages: don't install python2 packages on bullseye

https://gerrit.wikimedia.org/r/691746

Change 691955 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Openstack VM client packages: don't install python-netaddr on Bullseye

https://gerrit.wikimedia.org/r/691955

Change 691958 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] cloud-vps: Don't install Diamond on Bullseye

https://gerrit.wikimedia.org/r/691958

Change 691956 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] ldap::client::utils: Exclude python2 libraries from Bullseye

https://gerrit.wikimedia.org/r/691956

Change 691959 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] cloud-vps: don't install ldapsupportlib on Bullseye

https://gerrit.wikimedia.org/r/691959

Change 691960 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Install systemd-timesyncd on Bullseye and later

https://gerrit.wikimedia.org/r/691960

Change 691955 merged by Andrew Bogott:

[operations/puppet@production] Openstack VM client packages: don't install python-netaddr on Bullseye

https://gerrit.wikimedia.org/r/691955

Puppet now gets far enough on Bullseye to permit ssh access. The main remaining issue is this one:

Error: Systemd start for systemd-timesyncd failed!
journalctl log for systemd-timesyncd:
-- Journal begins at Sun 2021-05-16 19:13:17 UTC, ends at Sun 2021-05-16 19:44:04 UTC. --
-- No entries --

Error: /Stage[main]/Standard::Ntp::Timesyncd/Service[systemd-timesyncd]/ensure: change from 'stopped' to 'running' failed: Systemd start for systemd-timesyncd failed!
journalctl log for systemd-timesyncd:
-- Journal begins at Sun 2021-05-16 19:13:17 UTC, ends at Sun 2021-05-16 19:44:04 UTC. --
-- No entries --

Installing systemd-timesyncd seems to help if I do it by hand but if puppet installs it there's some sort of race with the service user UID; I don't know why it doesn't just work as it does on other distros.

Change 691958 merged by Andrew Bogott:

[operations/puppet@production] cloud-vps: Don't install Diamond on Bullseye

https://gerrit.wikimedia.org/r/691958

Change 691956 merged by Andrew Bogott:

[operations/puppet@production] ldap::client::utils: Exclude python2 libraries from Bullseye

https://gerrit.wikimedia.org/r/691956

Change 691959 merged by Andrew Bogott:

[operations/puppet@production] cloud-vps: don't install ldapsupportlib on Bullseye

https://gerrit.wikimedia.org/r/691959

@fgiunchedi let me know which projects you'd like to use this in; I'm going to only publish the new image on an as-needed basis to avoid rampant use of hard-to-support daily builds.

Thank you for working on this! Please enable the image on these projects:

  • monitoring
  • swift
  • logging

I'm assuming sre-sandbox would also be a good candidate (though I don't need it for o11y purposes).

Can you add it to toolsbeta too? Given the current migration off Stretch I'd like to test if we can skip Buster with some misc services such as Redis.

  • monitoring
  • swift
  • logging
  • sre-sandbox
  • toolsbeta
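Enabling the image for the projects listed above presumably means repeating the two commands from the task description once per project. The sketch below only builds the command strings and does not run them. Passing the project as the second argument to `openstack image add project` is an assumption (the description shows only the image ID on that line), and the helper name `image_share_commands` is hypothetical.

```python
import shlex
from typing import List

IMAGE_ID = "f99ec8f1-f7e7-4a8d-95c8-45f49a5fea37"  # image ID from the task description
PROJECTS = ["monitoring", "swift", "logging", "sre-sandbox", "toolsbeta"]


def image_share_commands(image_id: str, project: str) -> List[str]:
    """Build the share and accept commands for one project (strings only)."""
    image = shlex.quote(image_id)
    proj = shlex.quote(project)
    return [
        # Assumption: the target project is given after the image ID.
        f"openstack image add project {image} {proj}",
        # Mirrors the accept step shown in the task description.
        f"OS_PROJECT_ID={proj} openstack image set --accept {image}",
    ]


for project in PROJECTS:
    for command in image_share_commands(IMAGE_ID, project):
        print(command)
```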

Change 692852 had a related patch set uploaded (by Jbond; author: John Bond):

[operations/puppet@production] O:standard::manifest::ntp::timesync install systemd-timesyncd on bullseye

https://gerrit.wikimedia.org/r/692852

@Andrew I took a quick look at this and created a bullseye image (pdev-borked.puppet-dev.eqiad1.wikimedia.cloud) which has an intentionally borked puppet config so that puppet doesn't perform the first run. From this I can see that:

  • chrony is installed
  • there is a systemd-timesync user with a uid (999) outside of the LAST_SYSTEM_UID (499)
$ apt-cache policy chrony
chrony:
  Installed: 4.0-7
  Candidate: 4.0-7
  Version table:
 *** 4.0-7 500
        500 http://deb.debian.org/debian bullseye/main amd64 Packages
        100 /var/lib/dpkg/status
$ id systemd-timesync
uid=999(systemd-timesync) gid=999(systemd-timesync) groups=999(systemd-timesync)
$ grep SYSTEM_UID /etc/adduser.conf
FIRST_SYSTEM_UID=100
LAST_SYSTEM_UID=499
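The UID check above can be expressed as a small Python sketch. It parses the two settings shown in the `grep` output and tests whether a UID falls inside that range. The helper names are hypothetical, and the sample config text is just the two lines from the transcript, not the full /etc/adduser.conf.

```python
import re
from typing import Tuple

# The two relevant lines, copied from the grep output above.
ADDUSER_CONF = """\
FIRST_SYSTEM_UID=100
LAST_SYSTEM_UID=499
"""


def system_uid_range(conf_text: str) -> Tuple[int, int]:
    """Extract (FIRST_SYSTEM_UID, LAST_SYSTEM_UID) from adduser.conf text."""
    pattern = r"^(FIRST_SYSTEM_UID|LAST_SYSTEM_UID)=(\d+)\s*$"
    values = dict(re.findall(pattern, conf_text, re.MULTILINE))
    return int(values["FIRST_SYSTEM_UID"]), int(values["LAST_SYSTEM_UID"])


def in_system_range(uid: int, conf_text: str) -> bool:
    """True if uid lies within the configured system UID range (inclusive)."""
    first, last = system_uid_range(conf_text)
    return first <= uid <= last


print(in_system_range(999, ADDUSER_CONF))  # the systemd-timesync UID seen on the image
print(in_system_range(101, ADDUSER_CONF))
```

With the values from the transcript, UID 999 falls outside the 100-499 range.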

Both of these issues should be cleaned up in the base image, e.g.:

$ sudo userdel systemd-timesync 
$ sudo apt-get install !$
sudo apt-get install systemd-timesync
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package systemd-timesync
root@pdev-borked:~# sudo apt-get install systemd-timesyncd
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following package was automatically installed and is no longer required:
  python3-debconf
Use 'sudo apt autoremove' to remove it.
The following packages will be REMOVED:
  chrony
The following NEW packages will be installed:
  systemd-timesyncd
0 upgraded, 1 newly installed, 1 to remove and 0 not upgraded.
Need to get 130 kB of archives.
After this operation, 433 kB disk space will be freed.
Do you want to continue? [Y/n] Y
Get:1 http://deb.debian.org/debian bullseye/main amd64 systemd-timesyncd amd64 247.3-5 [130 kB]
Fetched 130 kB in 0s (3849 kB/s)      
dpkg: chrony: dependency problems, but removing anyway as you requested:
 systemd depends on systemd-timesyncd | time-daemon; however:
  Package systemd-timesyncd is not installed.
  Package time-daemon is not installed.
  Package chrony which provides time-daemon is to be removed.

(Reading database ... 51386 files and directories currently installed.)
Removing chrony (4.0-7) ...
Selecting previously unselected package systemd-timesyncd.
(Reading database ... 51350 files and directories currently installed.)
Preparing to unpack .../systemd-timesyncd_247.3-5_amd64.deb ...
Unpacking systemd-timesyncd (247.3-5) ...
Setting up systemd-timesyncd (247.3-5) ...

Configuration file '/etc/systemd/timesyncd.conf'
 ==> File on system created by you or by a script.
 ==> File also in package provided by package maintainer.
 ==> Keeping old config file as default.
Created symlink /etc/systemd/system/dbus-org.freedesktop.timesync1.service → /lib/systemd/system/systemd-timesyncd.service.
Created symlink /etc/systemd/system/sysinit.target.wants/systemd-timesyncd.service → /lib/systemd/system/systemd-timesyncd.service.
Processing triggers for dbus (1.12.20-2) ...
Processing triggers for man-db (2.9.4-2) ...

Once this is applied, puppet runs without issue.

Both of these issues should be cleaned up in the base image, e.g.:

We could also just remove the systemd-timesync user; installing systemd-timesyncd via puppet should then work.

We should explicitly remove chrony, as I have now dropped chrony support:

https://gerrit.wikimedia.org/r/c/operations/puppet/+/692877

Change 692866 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] openstack: add bullseye VM clientpackages

https://gerrit.wikimedia.org/r/692866

Change 692852 abandoned by Jbond:

[operations/puppet@production] O:standard::manifest::ntp::timesync install systemd-timesyncd on bullseye

Reason:

See discussion

https://gerrit.wikimedia.org/r/692852

  • there is a systemd-timesync user with a uid (999) outside of the LAST_SYSTEM_UID (499)

This sounds like an ordering issue in the image build. The system user gets created in the package postinst with adduser, but at that point our puppetised /etc/adduser.conf (which limits the range for WMF-managed system users to 100-499) doesn't seem to be in place yet, probably because the listed packages get installed before Puppet runs for the first time. It works fine in production installs; on new bullseye installations the systemd-timesync system user gets 101:101.

Change 692877 had a related patch set uploaded (by Jbond; author: John Bond):

[operations/puppet@production] chrony: drop chrony

https://gerrit.wikimedia.org/r/692877

Change 692877 merged by Jbond:

[operations/puppet@production] chrony: drop chrony

https://gerrit.wikimedia.org/r/692877

Change 692866 abandoned by Filippo Giunchedi:

[operations/puppet@production] openstack: add bullseye VM clientpackages

Reason:

Per discussion

https://gerrit.wikimedia.org/r/692866

The bullseye build process is basically just puppet -- I start with a cloud image from https://cdimage.debian.org/cdimage/cloud/bullseye/daily/, run puppet until it stabilizes, and then snapshot and use that as the future base image.

So if there's something out of order it's likely either in puppet or the issue is already present in the upstream cloud image.
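As a rough illustration of "run puppet until it stabilizes", here is a Python sketch, not the actual wmcs-image-create logic. It assumes the runner returns the agent's exit code as produced with `--detailed-exitcodes`, where, to my understanding, 0 means no changes were applied and any other value means changes and/or failures occurred. `run_until_stable` and the fake runner are hypothetical.

```python
from typing import Callable


def run_until_stable(run_puppet: Callable[[], int], max_runs: int = 10) -> int:
    """Call run_puppet until it reports no changes; return the number of runs.

    Assumption: exit code 0 means the run changed nothing, so the system
    has converged. Any other code triggers another attempt.
    """
    for attempt in range(1, max_runs + 1):
        if run_puppet() == 0:
            return attempt
    raise RuntimeError(f"puppet did not stabilize after {max_runs} runs")


# Fake runner for illustration: failures plus changes, then changes, then clean.
fake_results = iter([6, 2, 0])
print(run_until_stable(lambda: next(fake_results)))
```

A real runner would shell out to the puppet agent on the VM being prepared; the snapshot step would follow once the loop returns.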

the issue is already present in the upstream cloud image.

looks like this is in the base image.

Change 693152 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Nova vendordata.txt: fix up new VMs that have chrony installed

https://gerrit.wikimedia.org/r/693152

Change 693152 merged by Andrew Bogott:

[operations/puppet@production] Nova vendordata.txt: fix up new VMs that have chrony installed

https://gerrit.wikimedia.org/r/693152

Change 691960 abandoned by Andrew Bogott:

[operations/puppet@production] Install systemd-timesyncd on Bullseye and later

Reason:

handled instead at VM creation time with https://gerrit.wikimedia.org/r/c/operations/puppet/+/693152

https://gerrit.wikimedia.org/r/691960

Change 693167 had a related patch set uploaded (by Jbond; author: John Bond):

[operations/puppet@production] Nova vendordata.txt: delete systemd-coredump user

https://gerrit.wikimedia.org/r/693167