Upgrade Collab hosts to Bookworm
Open, Medium, Public

Description

To be upgraded:

Upgraded:

  • etherpad2002
  • etherpad1004
  • gerrit2003
  • lists1004
  • lists2001
  • people2003
  • people1004
  • phab1005
  • planet2003
  • planet1003
  • stewards2001
  • stewards1001
  • vrts2002
  • vrts1003

Pending deprecation, won't be upgraded

Details

Related Changes in Gerrit:
Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
Bump CI image to bookworm (followup) | repos/releng/gitlab-trusted-runner!128 | dancy | main-I8148ef92cba017f8657b83dd62042e5ad7c93281 | main
Bump CI image to bookworm | repos/releng/gitlab-trusted-runner!127 | dancy | main-I921bcef9c2899e171874f63da63909aa84493821 | main
Bump CI image to bookworm | repos/releng/gitlab-trusted-runner!126 | dancy | main-I921bcef9c2899e171874f63da63909aa84493821 | main
Draft: bump CI image to bookworm | repos/releng/gitlab-trusted-runner!125 | jelto | ci-bookworm | main

Related Objects

Status | Subtype | Assigned | Task
Open | | None |
Open | | None |
Open | | None |
Resolved | | None |
Open | | None |
Open | | None |
Open | | None |
Open | | None |
Resolved | | Dzahn |
Resolved | | jcrespo |
Duplicate | | Marostegui |
Resolved | | Dzahn |
Resolved | | Marostegui |
Resolved | | Dzahn |
Resolved | | Dzahn |
Resolved | | Dzahn |
Resolved | | Arnoldokoth |
Resolved | | Dzahn |
Resolved | | Arnoldokoth |
Resolved | | Arnoldokoth |
Open | | None |
In Progress | | ABran-WMF |
Open | | None |
Resolved | | ABran-WMF |
Resolved | | ABran-WMF |
Resolved | | MatthewVernon |
Resolved | | LSobanski |
Resolved | | ABran-WMF |
Open | | ABran-WMF |
Resolved | | LSobanski |
Resolved | | hashar |
Resolved | | ABran-WMF |
Resolved | | hashar |
Open | | ABran-WMF |
Resolved | | Dzahn |
In Progress | | ABran-WMF |
Resolved | | Dzahn |
Open | | None |
Resolved | | Marostegui |
Resolved | | Dzahn |
Open | | None |
Resolved | | Jelto |
Resolved | | Jelto |
Open | | None |
Resolved | | ABran-WMF |
Resolved | | Jelto |
Resolved | | None |
Resolved | | Jelto |
Resolved | | Jelto |
Resolved | | Jelto |
Resolved | | ABran-WMF |
Resolved | BUG REPORT | Jelto |
Resolved | | Dzahn |
Resolved | | ABran-WMF |
Resolved | | ABran-WMF |
Resolved | | ABran-WMF |
Resolved | | Dzahn |
Resolved | | Jelto |

Event Timeline

Change #1136765 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] jenkins: ensure systemd service dir exists before override

https://gerrit.wikimedia.org/r/1136765

Change #1135994 abandoned by Dzahn:

[operations/puppet@production] jenkins: fix puppet error, systemd override requires systemd service

Reason:

continue at https://gerrit.wikimedia.org/r/c/operations/puppet/+/1136765

https://gerrit.wikimedia.org/r/1135994

Change #1136765 merged by Dzahn:

[operations/puppet@production] jenkins: ensure systemd service dir exists before override

https://gerrit.wikimedia.org/r/1136765

Icinga downtime and Alertmanager silence (ID=fd8d0c80-cd7d-469a-8105-f184bc41f4f2) set by aokoth@cumin1002 for 2 days, 0:00:00 on 1 host(s) and their services with reason: Bookworm Re-image

aphlict2001.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by aokoth@cumin1002 for host aphlict2001.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by aokoth@cumin1002 for host aphlict2001.codfw.wmnet with OS bookworm completed:

  • aphlict2001 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202504161211_aokoth_152128_aphlict2001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
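
For anyone who wants to find the first-run Puppet log for a given reimage, the log file names in these cookbook reports appear to follow a fixed pattern: a 12-digit timestamp, the user, a number, and the short hostname. The sketch below splits such a path into its parts. The pattern is inferred only from the paths shown in this task, and reading the number as a run or process ID is an assumption; `ReimageLog` and `parse_reimage_log` are hypothetical names, not part of spicerack.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath
from typing import NamedTuple


class ReimageLog(NamedTuple):
    started: datetime
    user: str
    run_id: int  # assumption: looks like a process ID, not confirmed by the source
    host: str


# Pattern inferred from the paths in this task:
# <YYYYMMDDHHMM>_<user>_<number>_<host>.out
LOG_NAME = re.compile(r"^(\d{12})_([^_]+)_(\d+)_(.+)\.out$")


def parse_reimage_log(path: str) -> ReimageLog:
    """Split a reimage log path into timestamp, user, number and host."""
    name = PurePosixPath(path).name
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected log file name: {name}")
    stamp, user, run_id, host = match.groups()
    return ReimageLog(datetime.strptime(stamp, "%Y%m%d%H%M"), user, int(run_id), host)
```

For example, `parse_reimage_log("/var/log/spicerack/sre/hosts/reimage/202504161211_aokoth_152128_aphlict2001.out").host` gives `"aphlict2001"`.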

miscweb2003/miscweb1003 can probably be removed from the list since we are not planning to keep running them (soonish).

LSobanski raised the priority of this task from Low to Medium. Apr 17 2025, 10:47 AM

Icinga downtime and Alertmanager silence (ID=89d0151d-5211-42af-a90d-110fc29b52da) set by aokoth@cumin1002 for 2 days, 0:00:00 on 1 host(s) and their services with reason: Bookworm Re-image

aphlict2001.codfw.wmnet

Icinga downtime and Alertmanager silence (ID=af8c7478-bbf4-41a4-acbf-2af5fb14ee0e) set by aokoth@cumin1002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: Bookworm Reboot

aphlict2001.codfw.wmnet
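
The downtime durations in these messages ("2 days, 0:00:00", "1 day, 0:00:00") match the text that Python's `str(timedelta)` produces. Whether the cookbook actually formats them that way is not confirmed here, but assuming the format holds, a minimal sketch for turning the text back into a `timedelta` (for example to work out when a silence expires) could look like this. `parse_duration` is a hypothetical helper name.

```python
import re
from datetime import timedelta

# Matches "2 days, 0:00:00", "1 day, 0:00:00" and plain "12:30:00".
DURATION = re.compile(r"^(?:(\d+) days?, )?(\d+):(\d{2}):(\d{2})$")


def parse_duration(text: str) -> timedelta:
    """Parse a duration written the way str(timedelta) prints whole seconds."""
    match = DURATION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    days, hours, minutes, seconds = match.groups()
    return timedelta(
        days=int(days or 0),
        hours=int(hours),
        minutes=int(minutes),
        seconds=int(seconds),
    )
```

Adding the result to the time the silence was set gives the expected expiry.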

Cookbook cookbooks.sre.hosts.reimage was started by aokoth@cumin1002 for host doc1004.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by aokoth@cumin1002 for host doc1004.eqiad.wmnet with OS bookworm completed:

  • doc1004 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202505201905_aokoth_3979640_doc1004.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by dzahn@cumin1002 for host zuul2002.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by dzahn@cumin1002 for host zuul2002.codfw.wmnet with OS bookworm completed:

  • zuul2002 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202505202021_dzahn_3988586_zuul2002.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by dzahn@cumin1002 for host zuul2003.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by dzahn@cumin1002 for host zuul2003.codfw.wmnet with OS bookworm completed:

  • zuul2003 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202505202141_dzahn_4003773_zuul2003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin1002 for host gitlab-runner1002.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host gitlab-runner1002.eqiad.wmnet with OS bookworm completed:

  • gitlab-runner1002 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202506030925_jelto_3346180_gitlab-runner1002.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

I reimaged gitlab-runner1002 to bookworm. So far the runner looks healthy, but I'll keep monitoring the jobs and metrics.

I'll also replace the devtools test-runners with bookworm VMs, as well as one of the VMs in the gitlab-runner WMCS project (runner-1021).

Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin1002 for host gitlab-runner1003.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host gitlab-runner1003.eqiad.wmnet with OS bookworm completed:

  • gitlab-runner1003 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202506160904_jelto_2258786_gitlab-runner1003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin1002 for host gitlab-runner1004.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host gitlab-runner1004.eqiad.wmnet with OS bookworm completed:

  • gitlab-runner1004 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202506161157_jelto_2430088_gitlab-runner1004.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin1002 for host gitlab-runner2002.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host gitlab-runner2002.codfw.wmnet with OS bookworm completed:

  • gitlab-runner2002 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202506161345_jelto_2542161_gitlab-runner2002.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin1002 for host gitlab-runner2003.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host gitlab-runner2003.codfw.wmnet with OS bookworm completed:

  • gitlab-runner2003 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202506170748_jelto_3201448_gitlab-runner2003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin1002 for host gitlab-runner2004.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host gitlab-runner2004.codfw.wmnet with OS bookworm completed:

  • gitlab-runner2004 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202506170833_jelto_3209289_gitlab-runner2004.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

All Trusted GitLab runners were upgraded to bookworm.
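
To double-check a statement like this against the cookbook reports pasted in this task, the per-host summary lines (for example "• gitlab-runner2004 (PASS)") can be pulled out of the text. This is only a sketch: it assumes the summary line is always a bullet, a single-word hostname, and a status in parentheses, which is what the reports here show. `reimage_results` is a hypothetical helper name.

```python
import re

# A summary line is a bullet, one word (the host), then a status in parentheses.
# Step lines such as "• Host up (Debian installer)" do not match because
# more than one word precedes the parenthesis.
RESULT_LINE = re.compile(r"^\s*•\s+(\S+)\s+\((\w+)\)\s*$")


def reimage_results(log_text: str) -> dict[str, str]:
    """Map each host found in pasted cookbook output to its reported status."""
    results: dict[str, str] = {}
    for line in log_text.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            results[match.group(1)] = match.group(2)
    return results
```

Any host whose status is not `PASS` would then be worth a closer look.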

Change #1160119 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] gitlab-runner: upgrade default image to bookworm

https://gerrit.wikimedia.org/r/1160119

Change #1160120 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] gitlab-runner: upgrade default image to bookworm on Trusted Runners

https://gerrit.wikimedia.org/r/1160120

doc2002 and doc1003 have already been replaced with doc2003 and doc1004 by @Arnoldokoth.

So they just need to be decommissioned at this point.

Change #1160119 merged by Jelto:

[operations/puppet@production] gitlab-runner: upgrade default image to bookworm

https://gerrit.wikimedia.org/r/1160119

In T377889 we have now managed to deploy Phabricator with scap and installed PHP 8.

This lets us confirm whether Phabricator works on bookworm and with PHP 8, which should unblock the upgrades of the phab hosts here.

Change #1160120 merged by Jelto:

[operations/puppet@production] gitlab-runner: upgrade default image to bookworm on Trusted Runners

https://gerrit.wikimedia.org/r/1160120

Change #1164222 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] devtools: update hiera config for new bookworm hosts

https://gerrit.wikimedia.org/r/1164222

Change #1164222 merged by Jelto:

[operations/puppet@production] devtools: update hiera config for new bookworm hosts

https://gerrit.wikimedia.org/r/1164222

All devtools runners were upgraded to bookworm. I also updated one of the WMCS shared runners, runner-1031.gitlab-runners.eqiad1.wikimedia.cloud, to bookworm. I'll convert the others next week. Meanwhile, I also updated the documentation for adding new WMCS runners.

The new WMCS runners also have a new flavor with a second 90GB disk, which should give /var/lib/docker a bit more space.
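
A quick way to see how much room a runner has left for Docker data is to ask the filesystem directly. The sketch below uses Python's standard `shutil.disk_usage`. It assumes the second disk is mounted at or below /var/lib/docker, which this comment implies but does not spell out. `free_gib` is a hypothetical helper name.

```python
import shutil


def free_gib(path: str) -> float:
    """Return the free space, in GiB, of the filesystem that contains path."""
    return shutil.disk_usage(path).free / 2**30
```

On a runner, `free_gib("/var/lib/docker")` would report the space still available to Docker.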

Dzahn updated the task description.

doc hosts checked off the list, linked to resolved ticket for them

Change #1167871 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] aptrepo: add gitlab package for bookworm

https://gerrit.wikimedia.org/r/1167871

Change #1167871 merged by Dzahn:

[operations/puppet@production] aptrepo: add gitlab package for bookworm

https://gerrit.wikimedia.org/r/1167871

gitlab-ce is available for bookworm now:

jelto@apt1002:~$ sudo -i reprepro lsbycomponent gitlab-ce
gitlab-ce | 18.0.4-ce.0 | bullseye-wikimedia | thirdparty/gitlab-bullseye | amd64
gitlab-ce | 18.0.4-ce.0 | bookworm-wikimedia | thirdparty/gitlab-bookworm | amd64
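
If this check needs repeating for later versions, the pipe-separated output above is easy to compare programmatically. The sketch below parses pasted output and reports the version per distribution. The column labels are my own names for the five fields shown, not something taken from the reprepro documentation, and `parse_lsbycomponent` and `versions_by_distribution` are hypothetical helper names.

```python
# Column labels are assumptions chosen to describe the five fields in the output.
FIELDS = ("package", "version", "distribution", "component", "architecture")


def parse_lsbycomponent(output: str) -> list[dict[str, str]]:
    """Turn pasted 'a | b | c | d | e' lines into one dict per line."""
    rows = []
    for line in output.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) == len(FIELDS):
            rows.append(dict(zip(FIELDS, parts)))
    return rows


def versions_by_distribution(output: str) -> dict[str, str]:
    """Map each distribution to the package version it carries."""
    return {row["distribution"]: row["version"] for row in parse_lsbycomponent(output)}
```

With the two lines above as input, both bullseye-wikimedia and bookworm-wikimedia map to 18.0.4-ce.0, so the hosts on either release would get the same GitLab version.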

For the next steps of the upgrade I'll add a subtask (the upgrade also involves a switchover between the GitLab hosts).

Dzahn updated the task description.