
Migrate apifeatureusage hosts to Bullseye or later
Closed, Resolved (Public)

Description

The following hosts are still on Buster and should be upgraded to Bullseye or later:

apifeatureusage[12]001

Creating this ticket to:

  • Determine which team will perform this work
  • Perform this work

Event Timeline

Gehel triaged this task as High priority. Nov 15 2023, 9:46 AM

Handover of apifeatureusage isn't happening at the moment, so Data-Platform-SRE will take care of this upgrade.

We seem to have an elastic repository for Bookworm, so I'm going to attempt a Bookworm reimage, as we only run logstash on these hosts.

brouberol@apt1001:~$ ls /srv/wikimedia/dists/*-wikimedia/thirdparty/elastic710
/srv/wikimedia/dists/bookworm-wikimedia/thirdparty/elastic710:
binary-amd64  binary-i386  source

/srv/wikimedia/dists/bullseye-wikimedia/thirdparty/elastic710:
binary-amd64  binary-i386  source

/srv/wikimedia/dists/buster-wikimedia/thirdparty/elastic710:
binary-amd64  binary-i386  source
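
The same check can be scripted. This is a minimal sketch, not part of any existing tooling: the function name `dists_with_component` is hypothetical, and it assumes the `*-wikimedia` directory layout shown in the listing above.

```python
from pathlib import Path


def dists_with_component(dists_root: str, component: str) -> list[str]:
    """Return the names of the *-wikimedia distribution directories under
    dists_root that contain the given component directory."""
    root = Path(dists_root)
    found = []
    # sorted() keeps the output stable, like the ls listing.
    for dist_dir in sorted(root.glob("*-wikimedia")):
        if (dist_dir / component).is_dir():
            found.append(dist_dir.name)
    return found


if __name__ == "__main__":
    # Paths taken from the listing above; this only gives useful output
    # on a host that actually has this tree.
    print(dists_with_component("/srv/wikimedia/dists", "thirdparty/elastic710"))
```

Note that this only confirms that the component directory exists for a distribution. It does not confirm that the packages in it install cleanly.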

Mentioned in SAL (#wikimedia-analytics) [2024-02-13T09:03:51Z] <brouberol> attempting a reimage of apifeatureusage1001 to bookworm - T346053

Cookbook cookbooks.sre.hosts.reimage was started by brouberol@cumin1002 for host apifeatureusage1001.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by brouberol@cumin1002 for host apifeatureusage1001.eqiad.wmnet with OS bookworm executed with errors:

  • apifeatureusage1001 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run failed and logged in /var/log/spicerack/sre/hosts/reimage/202402130918_brouberol_3861011_apifeatureusage1001.out, asking the operator what to do
    • First Puppet run failed and the operator aborted
    • The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" apifeatureusage1001.eqiad.wmnet to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by brouberol@cumin1002 for host apifeatureusage1001.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by brouberol@cumin1002 for host apifeatureusage1001.eqiad.wmnet with OS bookworm executed with errors:

  • apifeatureusage1001 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run failed and logged in /var/log/spicerack/sre/hosts/reimage/202402131009_brouberol_3868391_apifeatureusage1001.out, asking the operator what to do
    • First Puppet run failed and the operator aborted
    • The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" apifeatureusage1001.eqiad.wmnet to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by brouberol@cumin1002 for host apifeatureusage1001.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by brouberol@cumin1002 for host apifeatureusage1001.eqiad.wmnet with OS bookworm executed with errors:

  • apifeatureusage1001 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run failed and logged in /var/log/spicerack/sre/hosts/reimage/202402131039_brouberol_3872720_apifeatureusage1001.out, asking the operator what to do
    • First Puppet run failed and the operator aborted
    • The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" apifeatureusage1001.eqiad.wmnet to get a root shell, but depending on the failure this may not work.
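
Each failed attempt above points at a different log file. For comparing them, the paths can be pulled out of the pasted cookbook summaries. This is a minimal sketch: the helper name `puppet_run_logs` is hypothetical, and the pattern assumes the "First Puppet run ... and logged in ..." wording seen in these summaries.

```python
import re

# Matches summary lines such as:
#   "First Puppet run failed and logged in /var/log/.../host.out, asking the operator what to do"
#   "First Puppet run completed and logged in /var/log/.../host.out"
LOG_PATH_RE = re.compile(
    r"First Puppet run (failed|completed) and logged in (\S+\.out)"
)


def puppet_run_logs(summary: str) -> list[tuple[str, str]]:
    """Return (status, log_path) pairs found in a cookbook summary text."""
    return [(m.group(1), m.group(2)) for m in LOG_PATH_RE.finditer(summary)]
```

Lines like "First Puppet run failed and the operator aborted" carry no path and are skipped.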

Cookbook cookbooks.sre.hosts.reimage was started by brouberol@cumin1002 for host apifeatureusage1001.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by brouberol@cumin1002 for host apifeatureusage1001.eqiad.wmnet with OS bullseye completed:

  • apifeatureusage1001 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202402131104_brouberol_3876308_apifeatureusage1001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by brouberol@cumin1002 for host apifeatureusage2001.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by brouberol@cumin1002 for host apifeatureusage2001.codfw.wmnet with OS bullseye completed:

  • apifeatureusage2001 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202402131134_brouberol_3882231_apifeatureusage2001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB