
Upgrade schema hosts to bookworm
Closed, Resolved · Public

Description

Reference ticket for when they were upgraded to buster: T255026: Upgrade schema[12]00[12] to Debian Buster

These are 4 VMs, 2 in eqiad and 2 in codfw:

  • schema1003.eqiad.wmnet
  • schema1004.eqiad.wmnet
  • schema2003.codfw.wmnet
  • schema2004.codfw.wmnet

Since these hosts were last upgraded, our reimaging cookbook has been adapted to work with VMs, which means that they can be upgraded in place.

This should make it even easier to upgrade them. There are no custom applications running on them; each host runs only nginx and a git::clone, both of which are managed by Puppet.
Tagging Event-Platform and Data-Engineering for visibility, but I would expect that Data-Platform-SRE will be carrying out the upgrades.
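
For anyone planning the work, here is a minimal sketch of a one-host-at-a-time ordering for the four VMs listed above. It assumes each host is depooled, reimaged and then pooled again before the next one is touched, so that the other host in the same site can keep serving. The function names (site_of, rolling_plan, same_site_peers) are hypothetical and exist only for illustration; the sketch prints a plan and does not call the reimage cookbook or any pooling tool.

```python
# Hypothetical planning sketch: it only prints the intended order of work.
HOSTS = [
    "schema1003.eqiad.wmnet",
    "schema1004.eqiad.wmnet",
    "schema2003.codfw.wmnet",
    "schema2004.codfw.wmnet",
]

# Assumed per-host sequence; the step names are labels, not real commands.
STEPS = ("depool", "reimage to bookworm", "pool")


def site_of(fqdn: str) -> str:
    """Return the site part of an FQDN such as schema1003.eqiad.wmnet."""
    return fqdn.split(".")[1]


def same_site_peers(hosts, fqdn):
    """Return the other hosts in the same site as fqdn."""
    return [h for h in hosts if h != fqdn and site_of(h) == site_of(fqdn)]


def rolling_plan(hosts):
    """Yield (host, step) pairs, finishing one host before starting the next."""
    for fqdn in hosts:
        for step in STEPS:
            yield fqdn, step


if __name__ == "__main__":
    for fqdn, step in rolling_plan(HOSTS):
        peers = ", ".join(same_site_peers(HOSTS, fqdn))
        print(f"{fqdn}: {step} (same-site peer: {peers})")
```

Working through one site completely before starting the other keeps the change contained, but the order of sites is a judgment call and not something this task prescribes.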

Event Timeline

Restricted Application added a subscriber: Aklapper.
Gehel triaged this task as High priority. Nov 15 2023, 9:44 AM

Mentioned in SAL (#wikimedia-analytics) [2023-11-29T14:04:40Z] <btullis> depooling schema1003 for reimage T349286

Mentioned in SAL (#wikimedia-analytics) [2023-11-29T14:10:29Z] <btullis> reimaging schema1003 to bookworm for T349286

Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1001 for host schema1003.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1001 for host schema1003.eqiad.wmnet with OS bookworm completed:

  • schema1003 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311291424_btullis_1645562_schema1003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
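
The cookbook reports in this task each point at a log file for the first Puppet run. The file names appear to follow the pattern timestamp_user_number_host.out; that reading is inferred from the paths shown in this task and is not a documented format, and the meaning of the numeric field is unknown, so the sketch below simply calls it run_id. Under those assumptions, a small hypothetical helper can split a path into its parts when collecting logs from several reimages:

```python
import re
from datetime import datetime
from typing import NamedTuple


class ReimageLog(NamedTuple):
    started: datetime  # parsed from the leading YYYYMMDDHHMM field
    user: str
    run_id: int        # numeric field of unknown meaning (assumption)
    host: str


# Assumed file name layout: <YYYYMMDDHHMM>_<user>_<number>_<host>.out
_LOG_NAME = re.compile(
    r"(?P<ts>\d{12})_(?P<user>[^_/]+)_(?P<run_id>\d+)_(?P<host>[^_/.]+)\.out$"
)


def parse_reimage_log(path: str) -> ReimageLog:
    """Split a reimage log path into its apparent components."""
    match = _LOG_NAME.search(path)
    if match is None:
        raise ValueError(f"unrecognised reimage log name: {path}")
    return ReimageLog(
        started=datetime.strptime(match["ts"], "%Y%m%d%H%M"),
        user=match["user"],
        run_id=int(match["run_id"]),
        host=match["host"],
    )


if __name__ == "__main__":
    print(parse_reimage_log(
        "/var/log/spicerack/sre/hosts/reimage/"
        "202311291424_btullis_1645562_schema1003.out"
    ))
```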

Mentioned in SAL (#wikimedia-analytics) [2023-11-29T14:43:02Z] <btullis> depooling schema1004 for reimage T349286

Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1001 for host schema1004.eqiad.wmnet with OS bookworm

Mentioned in SAL (#wikimedia-analytics) [2023-11-29T14:44:09Z] <btullis> reimaging schema1004 to bookworm for T349286

Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1001 for host schema1004.eqiad.wmnet with OS bookworm completed:

  • schema1004 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311291502_btullis_1661207_schema1004.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-analytics) [2023-11-29T15:24:51Z] <btullis> pooled schema1004 after upgrade to bookworm for T349286

Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1001 for host schema2003.codfw.wmnet with OS bookworm

BTullis renamed this task from "Upgrade schema hosts to bullseye" to "Upgrade schema hosts to bookworm". Nov 29 2023, 3:31 PM

Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1001 for host schema2003.codfw.wmnet with OS bookworm completed:

  • schema2003 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311291603_btullis_1684882_schema2003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-analytics) [2023-11-29T17:10:28Z] <btullis> depool schema2004 for reimage to bookworm for T349286

Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1001 for host schema2004.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1001 for host schema2004.codfw.wmnet with OS bookworm completed:

  • schema2004 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311291732_btullis_1750826_schema2004.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
BTullis updated the task description.
BTullis moved this task from In Progress to Done on the Data-Platform-SRE board.