Replace cloudgw2001-dev with cloudgw2003-dev
Closed, Resolved (Public)

Description

The host cloudgw2003-dev has been racked and bootstrapped, but it still needs to be put into service so that it can replace cloudgw2001-dev.

Also, make sure we use a single-NIC setup.

Event Timeline

Change 838125 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] cloudgw2003-dev: give proper role

https://gerrit.wikimedia.org/r/838125

Change 850116 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/dns@master] wikimediacloud.org: refresh cloudgw server

https://gerrit.wikimedia.org/r/850116

Change 850116 merged by Arturo Borrero Gonzalez:

[operations/dns@master] wikimediacloud.org: refresh cloudgw server

https://gerrit.wikimedia.org/r/850116

Change 838125 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] cloudgw2003-dev: give proper role and take over cloudgw2001-dev

https://gerrit.wikimedia.org/r/838125

Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin2002 for host cloudgw2001-dev.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin2002 for host cloudgw2003-dev.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin2002 for host cloudgw2001-dev.codfw.wmnet with OS bullseye completed:

  • cloudgw2001-dev (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Unable to downtime the new host on Icinga/Alertmanager, the sre.hosts.downtime cookbook returned 99
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202210271424_aborrero_3553078_cloudgw2001-dev.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 850190 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] cloudgw: don't configure anything on base dataplace interface

https://gerrit.wikimedia.org/r/850190

Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin2002 for host cloudgw2003-dev.codfw.wmnet with OS bullseye completed:

  • cloudgw2003-dev (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202210271425_aborrero_3553938_cloudgw2003-dev.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
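
Both reimage runs ended in WARN, but for different steps. A minimal sketch of how the relevant steps could be picked out of a pasted summary is shown here. The step strings are copied from the two summaries above; the `WARN_MARKERS` phrases and the `flag_steps` helper are hypothetical and are not how the reimage cookbook itself decides on a status.

```python
# Hypothetical phrases that suggest a step did not go cleanly.
# This is an illustrative assumption, not the cookbook's own logic.
WARN_MARKERS = ("unable to", "not optimal", "not removed")


def flag_steps(steps):
    """Return the steps whose text contains one of the warning markers."""
    return [s for s in steps if any(m in s.lower() for m in WARN_MARKERS)]


# Excerpts of the step lists reported for each host in this task.
cloudgw2001_steps = [
    "Found Nagios_host resource for this host in PuppetDB",
    "Unable to downtime the new host on Icinga/Alertmanager, "
    "the sre.hosts.downtime cookbook returned 99",
    "Icinga status is optimal",
    "Icinga downtime removed",
]
cloudgw2003_steps = [
    "Found Nagios_host resource for this host in PuppetDB",
    "Downtimed the new host on Icinga/Alertmanager",
    "Icinga status is not optimal, downtime not removed",
]

for host, steps in (("cloudgw2001-dev", cloudgw2001_steps),
                    ("cloudgw2003-dev", cloudgw2003_steps)):
    for step in flag_steps(steps):
        print(f"{host}: {step}")
```

With these excerpts, the sketch flags the failed downtime step for cloudgw2001-dev and the non-optimal Icinga status for cloudgw2003-dev.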

Change 850190 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] cloudgw: don't configure anything on base dataplace interface

https://gerrit.wikimedia.org/r/850190

Change 850195 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] cloudgw: codfw1dev: don't hardcode interface names

https://gerrit.wikimedia.org/r/850195

Change 850195 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] cloudgw: codfw1dev: don't hardcode interface names

https://gerrit.wikimedia.org/r/850195