Q3:rack/setup/install wdqs20[13-22]
Closed, Resolved, Public

Description

This task will track the racking, setup, and OS installation of wdqs20[13-22].

Hostname / Racking / Installation Details

Hostnames: wdqs20[13-22]
Racking Proposal: See the Racking Proposal section below
Networking Setup: 10G, single port, production VLAN
Partitioning/Raid: S/W RAID10 - partman: raid10-4dev.cfg
OS Distro: Bullseye
Sub-team Technical Contact: @RKemper and @bking primarily; @Gehel as a fallback contact

Racking Proposal

For simplicity, we prefer to rack refresh hosts in the same rows as the hosts they replace, and to distribute the remaining net-new hosts more or less randomly across the available rows.

wdqs201[3-5] will replace wdqs200[4-6]. wdqs20[16-22] will be net-new hosts.
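
The bracket notation used above can be expanded mechanically. The following is a minimal sketch, assuming a single numeric range per pattern; `expand_hosts` is a hypothetical helper written for illustration and is not part of any tooling referenced in this task.

```python
import re


def expand_hosts(pattern: str) -> list[str]:
    """Expand one bracket range, e.g. 'wdqs20[13-22]', into hostnames.

    Hypothetical helper: handles a single numeric range per pattern only.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        return [pattern]  # no range present, treat as a literal hostname
    prefix, start, end, suffix = match.groups()
    width = len(start)  # keep any zero padding used in the range
    return [
        f"{prefix}{n:0{width}d}{suffix}"
        for n in range(int(start), int(end) + 1)
    ]


# All hosts covered by this task.
new_hosts = expand_hosts("wdqs20[13-22]")

# Refresh hosts mapped to the hosts they replace, paired in order.
refresh = dict(zip(expand_hosts("wdqs201[3-5]"), expand_hosts("wdqs200[4-6]")))

# Hosts with no predecessor.
net_new = expand_hosts("wdqs20[16-22]")
```

Here `refresh` pairs wdqs2013 with wdqs2004, wdqs2014 with wdqs2005, and wdqs2015 with wdqs2006, which is the mapping the row placement of the refresh hosts is meant to follow.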

Actual racking details

wdqs2013 - Row A
wdqs2014 - Row A
wdqs2015 - Row B

The following hosts should be distributed roughly evenly across rows A-D, with the caveat that
they must land in 10G racks.

Feel free to assign the rows however best satisfies the requirements.

wdqs2016 - Row ?
wdqs2017 - Row ?
wdqs2018 - Row ?
wdqs2019 - Row ?
wdqs2020 - Row ?
wdqs2021 - Row ?
wdqs2022 - Row ?
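
One simple way to get a roughly even spread is a round-robin assignment. The sketch below is illustrative only: `spread_over_rows` is a hypothetical helper, it does not model which racks have 10G ports available, and it is not the assignment that was eventually made for these hosts.

```python
from itertools import cycle


def spread_over_rows(hosts, rows):
    """Assign hosts to rows round-robin so per-row counts differ by at most one.

    Hypothetical helper: ignores 10G rack availability, which in practice
    constrains where a host can actually be placed.
    """
    return dict(zip(hosts, cycle(rows)))


net_new = [f"wdqs20{n}" for n in range(16, 23)]  # wdqs2016 .. wdqs2022
plan = spread_over_rows(net_new, ["A", "B", "C", "D"])

for host, row in plan.items():
    print(f"{host} - Row {row}")
```

With seven hosts and four rows, three rows receive two hosts and one row receives one. Any placement that keeps the counts similarly balanced and satisfies the 10G constraint would meet the stated requirement equally well.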

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

wdqs2013: Rack A2 - U11 - Port 10
  • - receive in system on procurement task T325203 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role::insetup::search_platform.
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
wdqs2014: Rack B2 - U11 - Port 10
  • - receive in system on procurement task T325203 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role::insetup::search_platform.
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
wdqs2015: Rack D2 - U7 - Port 6
  • - receive in system on procurement task T325203 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role::insetup::search_platform.
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
wdqs2016: Rack B4 - U13 - Port 12
  • - receive in system on procurement task T325203 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role::insetup::search_platform.
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
wdqs2017: Rack C2 - U10 - Port 9
  • - receive in system on procurement task T325203 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role::insetup::search_platform.
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
wdqs2018: Rack C4 - U29 - Port 28
  • - receive in system on procurement task T325203 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role::insetup::search_platform.
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
wdqs2019: Rack C7 - U6 - Port 5
  • - receive in system on procurement task T325203 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role::insetup::search_platform.
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
wdqs2020: Rack A7 - U33 - Port 33
  • - receive in system on procurement task T325203 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role::insetup::search_platform.
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
wdqs2021: Rack D4 - U33 - Port 32
  • - receive in system on procurement task T325203 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role::insetup::search_platform.
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
wdqs2022: Rack D7 - U34 - Port 33
  • - receive in system on procurement task T325203 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role::insetup::search_platform.
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.

Event Timeline

RobH updated the task description.
RobH updated the task description.
RobH added a parent task: Unknown Object (Task).
RobH added subscribers: bking, Gehel.
RobH unsubscribed.

@Jhancock.wm the racking proposal in the description of this task says:

wdqs201[3-5] will replace wdqs200[4-6]. wdqs20[16-22] will be net-new hosts.
Actual racking details

wdqs2013 - Row A
wdqs2014 - Row A
wdqs2015 - Row B

But if wdqs201[3-5] will replace wdqs200[4-6], the racking should be

wdqs2013 - Row A
wdqs2014 - Row B
wdqs2015 - Row D

because

wdqs2004 - Row A
wdqs2005 - Row B
wdqs2006 - Row D

Let me know if you have any questions. Thanks
Jhancock.wm updated the task description.
Jhancock.wm subscribed.

Change 891651 had a related patch set uploaded (by Papaul; author: Papaul):

[operations/puppet@production] Add new wdqs node to site.pp and netboot.cfg

https://gerrit.wikimedia.org/r/891651

Change 891651 merged by Papaul:

[operations/puppet@production] Add new wdqs node to site.pp and netboot.cfg

https://gerrit.wikimedia.org/r/891651

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host wdqs2013.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host wdqs2013.codfw.wmnet with OS bullseye completed:

  • wdqs2013 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202302232142_pt1979_700597_wdqs2013.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host wdqs2014.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host wdqs2015.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host wdqs2016.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host wdqs2017.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host wdqs2016.codfw.wmnet with OS bullseye completed:

  • wdqs2016 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202302240109_pt1979_742239_wdqs2016.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host wdqs2014.codfw.wmnet with OS bullseye completed:

  • wdqs2014 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202302240103_pt1979_741497_wdqs2014.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active

@Jhancock.wm when you are back on site, can you please check wdqs2015? It looks like there is no network cable connected to it.

Thanks

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host wdqs2017.codfw.wmnet with OS bullseye completed:

  • wdqs2017 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202302240113_pt1979_742727_wdqs2017.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host wdqs2018.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host wdqs2015.codfw.wmnet with OS bullseye executed with errors:

  • wdqs2015 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host wdqs2019.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host wdqs2018.codfw.wmnet with OS bullseye executed with errors:

  • wdqs2018 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host wdqs2018.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host wdqs2020.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host wdqs2020.codfw.wmnet with OS bullseye executed with errors:

  • wdqs2020 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host wdqs2020.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host wdqs2021.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host wdqs2018.codfw.wmnet with OS bullseye completed:

  • wdqs2018 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202302240218_pt1979_759912_wdqs2018.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host wdqs2019.codfw.wmnet with OS bullseye completed:

  • wdqs2019 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202302240205_pt1979_758298_wdqs2019.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host wdqs2020.codfw.wmnet with OS bullseye executed with errors:

  • wdqs2020 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host wdqs2021.codfw.wmnet with OS bullseye completed:

  • wdqs2021 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202302240307_pt1979_772719_wdqs2021.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host wdqs2020.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host wdqs2020.codfw.wmnet with OS bullseye completed:

  • wdqs2020 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202302241431_pt1979_907116_wdqs2020.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host wdqs2022.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host wdqs2022.codfw.wmnet with OS bullseye executed with errors:

  • wdqs2022 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

@Jhancock.wm can you also check the network cable on wdqs2022?

@Papaul network cable was reseated and showing as connected now on wdqs2022.

@Jhancock.wm thank you. wdqs2015 also needs checking; see my comment on the 23rd. Thanks

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host wdqs2015.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host wdqs2022.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host wdqs2015.codfw.wmnet with OS bullseye completed:

  • wdqs2015 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202302271541_pt1979_1742800_wdqs2015.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host wdqs2022.codfw.wmnet with OS bullseye completed:

  • wdqs2022 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202302271552_pt1979_1744157_wdqs2022.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
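
The reimage results logged above can be tallied per host to see which hosts needed retries. This is a minimal sketch, assuming result lines of the form "• hostname (PASS)" or "• hostname (FAIL)" as they appear in this timeline; `summarize` and `RESULT_LINE` are hypothetical names, and the sample input lists only the result lines from this task, in the order they were posted.

```python
import re
from collections import defaultdict

# Matches cookbook result lines such as "  • wdqs2015 (FAIL)".
RESULT_LINE = re.compile(r"^\s*•\s*(\S+) \((PASS|FAIL)\)\s*$")


def summarize(lines):
    """Collect the ordered list of reimage outcomes for each host."""
    attempts = defaultdict(list)
    for line in lines:
        match = RESULT_LINE.match(line)
        if match:  # per-step lines do not match and are skipped
            attempts[match.group(1)].append(match.group(2))
    return dict(attempts)


timeline = """
  • wdqs2013 (PASS)
  • wdqs2016 (PASS)
  • wdqs2014 (PASS)
  • wdqs2017 (PASS)
  • wdqs2015 (FAIL)
  • wdqs2018 (FAIL)
  • wdqs2020 (FAIL)
  • wdqs2018 (PASS)
  • wdqs2019 (PASS)
  • wdqs2020 (FAIL)
  • wdqs2021 (PASS)
  • wdqs2020 (PASS)
  • wdqs2022 (FAIL)
  • wdqs2015 (PASS)
  • wdqs2022 (PASS)
"""

attempts = summarize(timeline.splitlines())
retried = {host: runs for host, runs in attempts.items() if "FAIL" in runs}

for host, runs in sorted(retried.items()):
    print(f"{host}: {' -> '.join(runs)}")
```

Every host ends on PASS. The hosts that needed more than one attempt are wdqs2015, wdqs2018, wdqs2020, and wdqs2022; of these, wdqs2015 and wdqs2022 are the ones for which a network cable check was requested in the comments.
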
Papaul updated the task description.

complete
@bking @Gehel all yours