Migrate hue.wikimedia.org to bullseye
Closed, Declined (Public)

Description

We currently run hue.wikimedia.org on a single virtual machine.

  • an-tool1008.eqiad.wmnet

This machine needs to be upgraded to Debian bullseye.

We have previously upgraded an-test-ui1001.eqiad.wmnet to bullseye. This machine also runs Hue, so it might be useful to verify that Hue works there before upgrading the production instance.

Event Timeline

BTullis triaged this task as High priority. Nov 15 2023, 9:45 AM

Mentioned in SAL (#wikimedia-analytics) [2024-01-29T13:06:31Z] <brouberol> I'm starting the reimaging process of an-tool1009.eqiad.wmnet, which will cause unavailability of hue.wikimedia.org while it runs - T349400

Cookbook cookbooks.sre.hosts.reimage was started by brouberol@cumin1002 for host an-tool1009.eqiad.wmnet with OS bullseye

Change 993692 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] hue: rename python-snappy apt dependency

https://gerrit.wikimedia.org/r/993692

Change 993692 merged by Brouberol:

[operations/puppet@production] hue: rename python-snappy apt dependency

https://gerrit.wikimedia.org/r/993692

Cookbook cookbooks.sre.hosts.reimage started by brouberol@cumin1002 for host an-tool1009.eqiad.wmnet with OS bullseye executed with errors:

  • an-tool1009 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202401291323_brouberol_759805_an-tool1009.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • The reimage failed, see the cookbook logs for the details

We found out that the hue package hadn't been built for bullseye. We're going to revert an-tool1009 to buster until we can build a hue package for bullseye.
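A check like the following could catch this kind of gap before a reimage. It is only a minimal sketch: it parses text in the Debian `Packages` index format and reports which required packages are absent. The function names, the index excerpts, and the version strings are all hypothetical placeholders, not taken from the real apt repository.

```python
def packages_in_index(index_text):
    """Parse Debian Packages-index text into {package_name: version}."""
    packages = {}
    # Stanzas are separated by blank lines; each field is "Key: value".
    for stanza in index_text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            if line.startswith((" ", "\t")) or ":" not in line:
                continue  # skip continuation lines of multi-line fields
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if "Package" in fields:
            packages[fields["Package"]] = fields.get("Version", "")
    return packages


def missing_packages(required, index_text):
    """Return the required package names that the index does not provide."""
    available = packages_in_index(index_text)
    return [name for name in required if name not in available]


# Hypothetical index excerpts (invented versions, for illustration only).
BUSTER_INDEX = """\
Package: hue
Version: 0.0.0-placeholder
Architecture: amd64
"""

BULLSEYE_INDEX = """\
Package: some-other-package
Version: 0.0.0-placeholder
Architecture: amd64
"""

print(missing_packages(["hue"], BUSTER_INDEX))
print(missing_packages(["hue"], BULLSEYE_INDEX))
```

With these sample inputs, the first call reports nothing missing and the second reports `hue` as missing, which mirrors the situation described above.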

Cookbook cookbooks.sre.hosts.reimage was started by brouberol@cumin1002 for host an-tool1009.eqiad.wmnet with OS buster

Change 993708 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/debs/hue@master] Build hue for Debian Bullseye by default

https://gerrit.wikimedia.org/r/993708

Cookbook cookbooks.sre.hosts.reimage started by brouberol@cumin1002 for host an-tool1009.eqiad.wmnet with OS buster completed:

  • an-tool1009 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202401291534_brouberol_780742_an-tool1009.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
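Comparing this summary with the failed bullseye run shows which steps the earlier attempt never reached. The sketch below does that comparison on the final steps of both summaries, copied from this task; the helper name `steps_not_reached` is illustrative and not part of the reimage cookbook.

```python
# Final steps reported by each run, copied from the cookbook summaries in this task.
FAILED_BULLSEYE_TAIL = [
    "Rebooted",
    "The reimage failed, see the cookbook logs for the details",
]
PASSED_BUSTER_TAIL = [
    "Rebooted",
    "Automatic Puppet run was successful",
    "Forced a re-check of all Icinga services for the host",
    "Icinga status is optimal",
    "Icinga downtime removed",
    "Updated Netbox data from PuppetDB",
]


def steps_not_reached(failed_steps, passed_steps):
    """Return steps of the passing run that never appear in the failed run, in order."""
    reached = set(failed_steps)
    return [step for step in passed_steps if step not in reached]


for step in steps_not_reached(FAILED_BULLSEYE_TAIL, PASSED_BUSTER_TAIL):
    print(step)
```

The first step printed is the automatic Puppet run after the reboot, which is consistent with the missing bullseye hue package being the cause of the failure.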

Cookbook cookbooks.sre.hosts.reimage was started by brouberol@cumin1002 for host an-tool1008.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by brouberol@cumin1002 for host an-tool1008.eqiad.wmnet with OS bullseye completed:

  • an-tool1008 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202401300914_brouberol_935671_an-tool1008.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Hue has proven to be quite tricky to build for bullseye. @MoritzMuehlenhoff suggested that, since https://phabricator.wikimedia.org/T340144 could remove the need for Hue altogether, deprecating Hue might be simpler than upgrading its OS. We'll revisit once we know more.

Change 993708 abandoned by Brouberol:

[operations/debs/hue@master] Build hue for Debian Bullseye by default

Reason:

Now that we can upgrade superset to 3.x, Hue might be on the way out anyway.

https://gerrit.wikimedia.org/r/993708

Gehel added a subscriber: brouberol.
Gehel subscribed.

Moving back to our backlog until we know whether we can deprecate Hue.

Deprecating Hue is unblocked, so let's do that instead of upgrading. See T341895.