Upgrade an-test-druid1001 to bullseye
Closed, Resolved, Public

Description

This is part of the large ticket for upgrading the Druid clusters to Bullseye.

This is the only Druid server that is a VM, so we cannot keep the existing data during a migration.

We therefore have to work out which option is best:

  1. Back up and restore the data
  2. Reload the data

We will also need to update the Druid loading jobs in the test cluster so that they use the new hostname.

Event Timeline

Change 902092 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/debs/druid@debian]

  * Rebuild for bullseye T332584 T332589
  * Move to Java 11
  * Remove adduser dependency for anything but druid-common, the rest don't need it
  * Remove versioned druid-common dependency, we're way past 0.10 for a while
  * Move to debhelper 13 (which absorbed dh-systemd)

https://gerrit.wikimedia.org/r/902092

Change 902092 abandoned by Muehlenhoff:

[operations/debs/druid@debian]

  * Rebuild for bullseye T332584 T332589
  * Move to Java 11
  * Remove adduser dependency for anything but druid-common, the rest don't need it
  * Remove versioned druid-common dependency, we're way past 0.10 for a while
  * Move to debhelper 13 (which absorbed dh-systemd)

Reason:

Obsolete, different patch was merged

https://gerrit.wikimedia.org/r/902092

Mentioned in SAL (#wikimedia-operations) [2023-03-22T15:53:36Z] <moritzm> uploaded druid 0.19.wmf0-2 to bullseye-wikimedia T332584 T332589

Mentioned in SAL (#wikimedia-operations) [2023-03-23T09:47:17Z] <moritzm> uploaded prometheus-druid-exporter 0.8-2 for bullseye-wikimedia T332584 T332589

Cookbook cookbooks.sre.ganeti.reimage was started by btullis@cumin1001 for host an-test-druid1001.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.ganeti.reimage started by btullis@cumin1001 for host an-test-druid1001.eqiad.wmnet with OS bullseye completed:

  • an-test-druid1001 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Set boot to disk
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202303231136_btullis_3072350_an-test-druid1001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed

BTullis claimed this task.