Page MenuHomePhabricator

Migrate eqiad Ganeti cluster to Buster
Closed, ResolvedPublic

Description

  • Add component/ganeti216 to all nodes
  • Upgrade all nodes using "cumin A:ganeti-eqiad 'apt-get install -y ganeti'"
  • cumin A:ganeti-eqiad 'chown gnt-metad /var/log/ganeti/meta-daemon.log'
  • sudo gnt-cluster renew-crypto --new-cluster-certificate --new-rapi-certificate --new-spice-certificate
  • sudo gnt-cluster upgrade --to 2.16
  • sudo gnt-cluster renew-crypto --new-node-certificates (might not be needed for future updates if the hashing algo stays the same)

Following that empty/reimage/readd the following virtualisation servers:

  • ganeti1005.eqiad.wmnet
  • ganeti1006.eqiad.wmnet
  • ganeti1007.eqiad.wmnet
  • ganeti1008.eqiad.wmnet
  • ganeti1009.eqiad.wmnet
  • ganeti1010.eqiad.wmnet
  • ganeti1011.eqiad.wmnet
  • ganeti1012.eqiad.wmnet
  • ganeti1013.eqiad.wmnet
  • ganeti1014.eqiad.wmnet
  • ganeti1015.eqiad.wmnet
  • ganeti1016.eqiad.wmnet
  • ganeti1017.eqiad.wmnet
  • ganeti1018.eqiad.wmnet
  • ganeti1019.eqiad.wmnet
  • ganeti1020.eqiad.wmnet
  • ganeti1021.eqiad.wmnet
  • ganeti1022.eqiad.wmnet

The following instances were temporarily converted from plain disk storage to DRBD:

  • kubestagetcd1004 (converted back to plain)
  • kubestagetcd1005 (converted back to plain)
  • kubestagetcd1006 (converted back to plain)
  • kubetcd1004 (converted back to plain)
  • kubetcd1005 (converted back to plain)
  • kubetcd1006 (converted back to plain)
  • ml-etcd1001
  • ml-etcd1002
  • ml-etcd1003

Event Timeline

Change 742730 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/dns@master] Point back irc.wikimedia.org to irc2001

https://gerrit.wikimedia.org/r/742730

Change 742730 merged by Muehlenhoff:

[operations/dns@master] Point back irc.wikimedia.org to irc2001

https://gerrit.wikimedia.org/r/742730

herron triaged this task as Medium priority.Dec 3 2021, 7:08 PM

Change 754540 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Enable ganeti 2.16 in eqiad

https://gerrit.wikimedia.org/r/754540

Change 754540 merged by Muehlenhoff:

[operations/puppet@production] Enable ganeti 2.16 in eqiad

https://gerrit.wikimedia.org/r/754540

Mentioned in SAL (#wikimedia-operations) [2022-01-18T09:50:01Z] <moritzm> installing ganeti 2.16.0-1~bpo9+1+wmf1 on ganeti/eqiad servers T296721

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1018.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1018.eqiad.wmnet with OS buster executed with errors:

  • ganeti1018 (FAIL)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1018.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1018.eqiad.wmnet with OS buster completed:

  • ganeti1018 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201200950_jmm_3211404_ganeti1018.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1013.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1013.eqiad.wmnet with OS buster completed:

  • ganeti1013 (PASS)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201250843_jmm_4048169_ganeti1013.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1014.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1014.eqiad.wmnet with OS buster completed:

  • ganeti1014 (PASS)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201260829_jmm_21158_ganeti1014.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1005.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1005.eqiad.wmnet with OS buster executed with errors:

  • ganeti1005 (FAIL)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1005.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1005.eqiad.wmnet with OS buster completed:

  • ganeti1005 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201260932_jmm_30181_ganeti1005.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1006.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1006.eqiad.wmnet with OS buster completed:

  • ganeti1006 (PASS)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201260957_jmm_34636_ganeti1006.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1007.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1007.eqiad.wmnet with OS buster completed:

  • ganeti1007 (PASS)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201310921_jmm_855514_ganeti1007.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1015.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1015.eqiad.wmnet with OS buster completed:

  • ganeti1015 (PASS)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202201311001_jmm_860999_ganeti1015.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1010.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1010.eqiad.wmnet with OS buster completed:

  • ganeti1010 (PASS)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202010957_jmm_1040637_ganeti1010.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1019.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1019.eqiad.wmnet with OS buster completed:

  • ganeti1019 (PASS)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202020917_jmm_1219045_ganeti1019.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1008.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1008.eqiad.wmnet with OS buster completed:

  • ganeti1008 (PASS)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202020952_jmm_1224361_ganeti1008.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-traffic) [2022-02-04T10:40:43Z] <moritzm> rebalancing row A in ganeti/eqiad, all nodes of that row are now running Buster T296721

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1020.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1020.eqiad.wmnet with OS buster completed:

  • ganeti1020 (PASS)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202041114_jmm_1577275_ganeti1020.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1011.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1011.eqiad.wmnet with OS buster executed with errors:

  • ganeti1011 (FAIL)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1011.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1011.eqiad.wmnet with OS buster executed with errors:

  • ganeti1011 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1021.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1021.eqiad.wmnet with OS buster completed:

  • ganeti1021 (PASS)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202111011_jmm_2798436_ganeti1021.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1016.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1016.eqiad.wmnet with OS buster completed:

  • ganeti1016 (PASS)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202111213_jmm_2813492_ganeti1016.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1011.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1011.eqiad.wmnet with OS buster executed with errors:

  • ganeti1011 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • The reimage failed, see the cookbook logs for the details

Mentioned in SAL (#wikimedia-operations) [2022-02-16T14:17:52Z] <moritzm> failover the ganeti master to ganeti1024 T296721

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1022.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1022.eqiad.wmnet with OS buster completed:

  • ganeti1022 (PASS)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202180844_jmm_4001000_ganeti1022.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1017.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1017.eqiad.wmnet with OS buster completed:

  • ganeti1017 (PASS)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202181127_jmm_4021290_ganeti1017.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1012.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1012.eqiad.wmnet with OS buster completed:

  • ganeti1012 (PASS)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202181312_jmm_4035993_ganeti1012.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1009.eqiad.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1009.eqiad.wmnet with OS buster completed:

  • ganeti1009 (PASS)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202210849_jmm_322514_ganeti1009.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2022-02-21T13:06:56Z] <moritzm> rebalance ganeti row_C (add nodes reimaged in there) T296721

Mentioned in SAL (#wikimedia-operations) [2022-02-22T13:24:10Z] <moritzm> rebalance ganeti eqiad row_D (all nodes reimaged in there) T296721

Mentioned in SAL (#wikimedia-operations) [2022-02-22T18:30:49Z] <moritzm> rebalance ganeti eqiad row_B (all nodes reimaged in there) T296721

MoritzMuehlenhoff claimed this task.

The update is complete