Migrate bastions to Bookworm
Closed, Resolved (Public)

Description

Migrate the SSH bastions to Bookworm:

  • eqiad
  • codfw
  • esams
  • ulsfo
  • eqsin
  • drmrs

Event Timeline

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host bast3007.wikimedia.org with OS bookworm

Change 945767 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Add new bastions to site.pp

https://gerrit.wikimedia.org/r/945767

Change 945767 merged by Muehlenhoff:

[operations/puppet@production] Add new bastions to site.pp

https://gerrit.wikimedia.org/r/945767

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host bast3007.wikimedia.org with OS bookworm completed:

  • bast3007 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run failed and logged in /var/log/spicerack/sre/hosts/reimage/202308041151_jmm_2421825_bast3007.out, asking the operator what to do
    • First Puppet run failed and logged in /var/log/spicerack/sre/hosts/reimage/202308041156_jmm_2421825_bast3007.out, asking the operator what to do
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202308041158_jmm_2421825_bast3007.out
    • Unable to run puppet on config-master2001.codfw.wmnet,config-master1001.eqiad.wmnet,puppetmaster2001.codfw.wmnet,puppetmaster1001.eqiad.wmnet to update configmaster.wikimedia.org with the new host SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host bast4005.wikimedia.org with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host bast4005.wikimedia.org with OS bookworm completed:

  • bast4005 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202308041402_jmm_2577329_bast4005.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host bast5004.wikimedia.org with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host bast5004.wikimedia.org with OS bookworm executed with errors:

  • bast5004 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details

cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: bast5004.wikimedia.org

  • bast5004.wikimedia.org (WARN)
    • Host not found on Icinga, unable to downtime it
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster eqsin to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster eqsin to Netbox

cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: bast3007.wikimedia.org

  • bast3007.wikimedia.org (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster esams to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster esams to Netbox

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host bast5004.wikimedia.org with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host bast5004.wikimedia.org with OS bookworm executed with errors:

  • bast5004 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host bast6003.wikimedia.org with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host bast6003.wikimedia.org with OS bookworm executed with errors:

  • bast6003 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details

Change 952320 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] new bastions in ulsfo/eqsin/drmrs

https://gerrit.wikimedia.org/r/952320

Change 952320 merged by Muehlenhoff:

[operations/puppet@production] new bastions in ulsfo/eqsin/drmrs

https://gerrit.wikimedia.org/r/952320

cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: bast4004.wikimedia.org

  • bast4004.wikimedia.org (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster ulsfo to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster ulsfo to Netbox

cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: bast5003.wikimedia.org

  • bast5003.wikimedia.org (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster eqsin to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster eqsin to Netbox

cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: bast6002.wikimedia.org

  • bast6002.wikimedia.org (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster drmrs02 to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster drmrs02 to Netbox

Change 954597 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Remove bast4004/bast5003/bast6002

https://gerrit.wikimedia.org/r/954597

Change 954597 merged by Muehlenhoff:

[operations/puppet@production] Remove bast4004/bast5003/bast6002

https://gerrit.wikimedia.org/r/954597

MoritzMuehlenhoff updated the task description.

All bastions are now on Bookworm.
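
For anyone reviewing a timeline like the one above, here is a minimal sketch of how the per-host summary lines (for example "bast3007 (WARN)") could be tallied from pasted cookbook output. It is not part of the spicerack cookbooks: the function name `last_status_per_host` and the regular expression are assumptions made for illustration. It reports only the last status seen for each host and does not distinguish reimage summaries from decommission summaries.

```python
import re

# Hypothetical pattern (an assumption, not a spicerack format guarantee):
# matches summary headers such as "  • bast3007 (WARN)" or
# "  • bast5004.wikimedia.org (FAIL)"; the leading bullet is optional.
HEADER = re.compile(
    r"^\s*(?:•\s*)?(?P<host>[\w.-]+)\s+\((?P<status>PASS|WARN|FAIL)\)\s*$"
)


def last_status_per_host(log_text):
    """Return {short_hostname: last status seen} for pasted summary text."""
    result = {}
    for line in log_text.splitlines():
        match = HEADER.match(line)
        if match:
            # Normalise "bast5004.wikimedia.org" and "bast5004" to one key.
            short_name = match.group("host").split(".")[0]
            result[short_name] = match.group("status")
    return result


sample = """\
  • bast3007 (WARN)
    • Removed from Puppet and PuppetDB if present
  • bast5004 (FAIL)
    • The reimage failed, see the cookbook logs for the details
  • bast5004.wikimedia.org (WARN)
    • Host not found on Icinga, unable to downtime it
"""

print(last_status_per_host(sample))
```

Step lines such as "Removed from Puppet and PuppetDB if present" are ignored because they do not end in a parenthesised PASS, WARN, or FAIL status.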