
Q4:rack/setup/install cp70[01-16]
Closed, Resolved, Public

Description

This task will track the racking, setup, and OS installation of X

Racking Layout

Elevation Doc: https://docs.google.com/spreadsheets/d/1FiRfGo9wMXTvIcT5tIclQ2R1VM1X570M1kRdyKSHsms/edit?usp=sharing

cp7001:
  • Receive in system on procurement task T348480 & in Coupa
  • Rack system with proposed racking plan (see above) & update Elevation Doc for mass import of server info
  • Run the "Provision a server's network attributes" Netbox script - Note that you must run the DNS and Provision cookbooks after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
cp7002:
  • Receive in system on procurement task T348480 & in Coupa
  • Rack system with proposed racking plan (see above) & update Elevation Doc for mass import of server info
  • Run the "Provision a server's network attributes" Netbox script - Note that you must run the DNS and Provision cookbooks after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
cp7003:
  • Receive in system on procurement task T348480 & in Coupa
  • Rack system with proposed racking plan (see above) & update Elevation Doc for mass import of server info
  • Run the "Provision a server's network attributes" Netbox script - Note that you must run the DNS and Provision cookbooks after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
cp7004:
  • Receive in system on procurement task T348480 & in Coupa
  • Rack system with proposed racking plan (see above) & update Elevation Doc for mass import of server info
  • Run the "Provision a server's network attributes" Netbox script - Note that you must run the DNS and Provision cookbooks after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
cp7005:
  • Receive in system on procurement task T348480 & in Coupa
  • Rack system with proposed racking plan (see above) & update Elevation Doc for mass import of server info
  • Run the "Provision a server's network attributes" Netbox script - Note that you must run the DNS and Provision cookbooks after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
cp7006:
  • Receive in system on procurement task T348480 & in Coupa
  • Rack system with proposed racking plan (see above) & update Elevation Doc for mass import of server info
  • Run the "Provision a server's network attributes" Netbox script - Note that you must run the DNS and Provision cookbooks after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
cp7007:
  • Receive in system on procurement task T348480 & in Coupa
  • Rack system with proposed racking plan (see above) & update Elevation Doc for mass import of server info
  • Run the "Provision a server's network attributes" Netbox script - Note that you must run the DNS and Provision cookbooks after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
cp7008:
  • Receive in system on procurement task T348480 & in Coupa
  • Rack system with proposed racking plan (see above) & update Elevation Doc for mass import of server info
  • Run the "Provision a server's network attributes" Netbox script - Note that you must run the DNS and Provision cookbooks after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
cp7009:
  • Receive in system on procurement task T348480 & in Coupa
  • Rack system with proposed racking plan (see above) & update Elevation Doc for mass import of server info
  • Run the "Provision a server's network attributes" Netbox script - Note that you must run the DNS and Provision cookbooks after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
cp7010:
  • Receive in system on procurement task T348480 & in Coupa
  • Rack system with proposed racking plan (see above) & update Elevation Doc for mass import of server info
  • Run the "Provision a server's network attributes" Netbox script - Note that you must run the DNS and Provision cookbooks after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
cp7011:
  • Receive in system on procurement task T348480 & in Coupa
  • Rack system with proposed racking plan (see above) & update Elevation Doc for mass import of server info
  • Run the "Provision a server's network attributes" Netbox script - Note that you must run the DNS and Provision cookbooks after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
cp7012:
  • Receive in system on procurement task T348480 & in Coupa
  • Rack system with proposed racking plan (see above) & update Elevation Doc for mass import of server info
  • Run the "Provision a server's network attributes" Netbox script - Note that you must run the DNS and Provision cookbooks after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
cp7013:
  • Receive in system on procurement task T348480 & in Coupa
  • Rack system with proposed racking plan (see above) & update Elevation Doc for mass import of server info
  • Run the "Provision a server's network attributes" Netbox script - Note that you must run the DNS and Provision cookbooks after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
cp7014:
  • Receive in system on procurement task T348480 & in Coupa
  • Rack system with proposed racking plan (see above) & update Elevation Doc for mass import of server info
  • Run the "Provision a server's network attributes" Netbox script - Note that you must run the DNS and Provision cookbooks after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
cp7015:
  • Receive in system on procurement task T348480 & in Coupa
  • Rack system with proposed racking plan (see above) & update Elevation Doc for mass import of server info
  • Run the "Provision a server's network attributes" Netbox script - Note that you must run the DNS and Provision cookbooks after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
cp7016:
  • Receive in system on procurement task T348480 & in Coupa
  • Rack system with proposed racking plan (see above) & update Elevation Doc for mass import of server info
  • Run the "Provision a server's network attributes" Netbox script - Note that you must run the DNS and Provision cookbooks after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
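The task title's host range cp70[01-16] uses a bracketed range notation similar to the ClusterShell NodeSet syntax that cumin accepts. As a minimal sketch, the hypothetical helper expand_nodeset below expands a single numeric range and pairs each host with the software steps from the checklist, in order. It is not cumin's or ClusterShell's parser: it handles only one [start-end] group and keeps the zero-padding width of the start value.

```python
import re

# Software steps from the per-host checklist, in the order given there
# (the physical receive/rack steps are omitted).
STEPS = [
    "Provision a server's network attributes (Netbox script)",
    "sre.dns.netbox",
    "sre.hosts.provision",
    "sre.hardware.upgrade-firmware",
    "operations/puppet update (preseed.yaml, site.pp)",
    "sre.hosts.reimage",
]


def expand_nodeset(pattern: str) -> list[str]:
    """Expand one '[start-end]' numeric group, e.g. 'cp70[01-16]'.

    Hypothetical helper for illustration only; real NodeSet syntax
    supports lists, multiple groups, and more.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        return [pattern]  # no range group: a single host name
    prefix, start, end, suffix = match.groups()
    width = len(start)  # preserve zero padding such as '01'
    return [f"{prefix}{n:0{width}d}{suffix}" for n in range(int(start), int(end) + 1)]


if __name__ == "__main__":
    for host in expand_nodeset("cp70[01-16]"):
        print(host, "->", " | ".join(STEPS))
```

Generating the list from the range keeps the sixteen per-host checklists consistent with the title instead of maintaining them by hand.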

Related Objects

Status | Subtype | Assigned | Task
Open | | None |
Resolved | | RobH |

Event Timeline


Change #1025430 merged by Fabfur:

[operations/puppet@production] site: adding cp7003.magru.wmnet for test insetup role

https://gerrit.wikimedia.org/r/1025430

Cookbook cookbooks.sre.hosts.reimage was started by fabfur@cumin1002 for host cp7003.magru.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin1002 for host cp7002.magru.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by fabfur@cumin1002 for host cp7003.magru.wmnet with OS bullseye executed with errors:

  • cp7003 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" cp7003.magru.wmnet to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by fabfur@cumin1002 for host cp7003.magru.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1002 for host cp7002.magru.wmnet with OS bullseye completed:

  • cp7002 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Unable to disable Puppet, the host may have been unreachable
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404291931_sukhe_3107509_cp7002.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully
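The "First Puppet run" log paths in the cookbook output appear to follow a fixed pattern: a YYYYMMDDHHMM timestamp, the operator, a numeric ID and the short hostname, joined by underscores. The sketch below splits one such file name so logs can be sorted or grouped by host. The field names, and the reading of the number as a process ID, are assumptions inferred from the examples in this task, not from spicerack documentation.

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import NamedTuple


class ReimageLog(NamedTuple):
    started: datetime  # parsed from the leading YYYYMMDDHHMM field
    user: str          # operator who ran the cookbook
    run_id: int        # numeric field; assumed (not confirmed) to be a process ID
    host: str          # short hostname, e.g. cp7002


def parse_reimage_log(path: str) -> ReimageLog:
    """Split a reimage log file name into its apparent fields."""
    stem = PurePosixPath(path).stem  # drops the directory and the .out suffix
    stamp, rest = stem.split("_", 1)
    # rsplit keeps any underscores inside the user field intact
    user, run_id, host = rest.rsplit("_", 2)
    return ReimageLog(
        started=datetime.strptime(stamp, "%Y%m%d%H%M"),
        user=user,
        run_id=int(run_id),
        host=host,
    )


if __name__ == "__main__":
    log = parse_reimage_log(
        "/var/log/spicerack/sre/hosts/reimage/202404291931_sukhe_3107509_cp7002.out"
    )
    print(log.host, log.user, log.started.isoformat())
```

Sorting parsed entries by started gives a per-host history of reimage attempts, which helps when a host such as cp7003 was reimaged several times.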

Cookbook cookbooks.sre.hosts.reimage started by fabfur@cumin1002 for host cp7003.magru.wmnet with OS bullseye completed:

  • cp7003 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Unable to disable Puppet, the host may have been unreachable
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404292003_fabfur_3113705_cp7003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Change #1025482 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] site.pp: add cp7004

https://gerrit.wikimedia.org/r/1025482

Change #1025482 merged by Ssingh:

[operations/puppet@production] site.pp: add cp7004

https://gerrit.wikimedia.org/r/1025482

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin1002 for host cp7004.magru.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1002 for host cp7004.magru.wmnet with OS bullseye executed with errors:

  • cp7004 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" cp7004.magru.wmnet to get a root shell, but depending on the failure this may not work.

Function lookup() did not find a value for the name 'prometheus_nodes'

…in /srv/puppet_code/environments/production/modules/profile/manifests/firewall.pp, line: 21

lookup() continues to fail for this even though hieradata/magru.yaml exists.

OK, so I finally found why this is failing. For a reason I don't fully understand, the hieradata/magru/ directory needs to exist for lookup() against hieradata/magru.yaml to work. See the commit that fixes this (and note the commit message!):

https://gerrit.wikimedia.org/r/c/operations/puppet/+/1025495

With this commit:

sukhe@puppetserver1001:~$ sudo puppet lookup --compile --node "cp7004.magru.wmnet" --explain "prometheus_nodes"

returns

Hierarchy entry "expand_path site"
  Path "/srv/puppet_code/environments/production/hieradata/magru"
    Original path: "%{::site}"
    Found key: "prometheus_nodes" value: [
      "prometheus7001.magru.wmnet"
    ]

If we revert this commit:

sukhe@puppetserver1001:~$ sudo puppet lookup --compile --node "cp7004.magru.wmnet" --explain "prometheus_nodes"
Warning: Undefined variable '::_role'; 
   (file & line not available)
Warning: Scope(Class[Profile::Netbox::Host]): cp7004.magru.wmnet is unknown in Netbox
Warning: Scope(Class[Profile::Netbox::Host]): cp7004.magru.wmnet: no Netbox location found
Error: Could not run: Function lookup() did not find a value for the name 'prometheus_nodes' (file: /srv/puppet_code/environments/production/modules/profile/manifests/firewall.pp, line: 21)

The lookup() fails again.

So this directory needs to exist for the lookup to work. Why did it work for the other sites? Because all the other sites already have something under hieradata/%{::site}/, but magru did not.
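Given the behaviour observed here (lookups against magru.yaml only succeeded once hieradata/magru/ existed), a pre-flight check when adding a new site could flag any site that has hieradata/<site>.yaml but no hieradata/<site>/ directory. The sketch below is a hypothetical helper, not part of operations/puppet, and it does not model how the "expand_path site" hierarchy entry resolves paths; it only checks for the combination that failed in this task.

```python
from pathlib import Path


def sites_missing_hiera_dir(hieradata: Path, sites: list[str]) -> list[str]:
    """Return sites that have hieradata/<site>.yaml but no hieradata/<site>/ directory.

    Hypothetical check based on the failure seen for magru.
    """
    missing = []
    for site in sites:
        has_yaml = (hieradata / f"{site}.yaml").is_file()
        has_dir = (hieradata / site).is_dir()
        if has_yaml and not has_dir:
            missing.append(site)
    return missing


if __name__ == "__main__":
    # Path taken from the lookup output above; adjust for a local checkout.
    root = Path("/srv/puppet_code/environments/production/hieradata")
    print(sites_missing_hiera_dir(root, ["magru"]))
```

Running a check like this in CI for every configured site would have surfaced the missing magru/ directory before the first reimage.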

Change #1025496 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] magru: set hiera for trafficserver::backend::storage_elements

https://gerrit.wikimedia.org/r/1025496

Change #1025496 merged by Ssingh:

[operations/puppet@production] magru: set hiera for trafficserver::backend::storage_elements

https://gerrit.wikimedia.org/r/1025496

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin1002 for host cp7004.magru.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1002 for host cp7004.magru.wmnet with OS bullseye completed:

  • cp7004 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Unable to disable Puppet, the host may have been unreachable
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404300155_sukhe_3159360_cp7004.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Change #1025707 had a related patch set uploaded (by Fabfur; author: Fabfur):

[operations/puppet@production] site:magru: set definitive roles for cp hosts

https://gerrit.wikimedia.org/r/1025707

Change #1025718 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] magru: add cp nodes text: cp700[1-8] and upload: cp70(09|1[0-6])

https://gerrit.wikimedia.org/r/1025718

Change #1025707 abandoned by Fabfur:

[operations/puppet@production] site:magru: set definitive roles for cp hosts

Reason:

superseded by 1025718

https://gerrit.wikimedia.org/r/1025707

Change #1025718 merged by Ssingh:

[operations/puppet@production] magru: add cp nodes text: cp700[1-8] and upload: cp70(09|1[0-6])

https://gerrit.wikimedia.org/r/1025718
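Change 1025718 assigns cp700[1-8] to the text cluster and cp70(09|1[0-6]) to the upload cluster. A quick way to confirm that the two patterns split all sixteen hosts with no overlap and no gap is to test them against every name, as in the sketch below. The ^...$ anchoring and the .magru.wmnet suffix are assumptions about how the patterns appear in site.pp; only the host-number parts come from the commit title.

```python
import re

# Host-number patterns from the commit title; anchoring and domain are assumed.
TEXT = re.compile(r"^cp700[1-8]\.magru\.wmnet$")
UPLOAD = re.compile(r"^cp70(09|1[0-6])\.magru\.wmnet$")


def cluster_for(host: str) -> str:
    """Return 'text', 'upload' or 'unassigned'; fail loudly on overlap."""
    is_text = TEXT.match(host) is not None
    is_upload = UPLOAD.match(host) is not None
    if is_text and is_upload:
        raise ValueError(f"{host} matches both clusters")
    if is_text:
        return "text"
    if is_upload:
        return "upload"
    return "unassigned"


if __name__ == "__main__":
    for n in range(1, 17):
        host = f"cp70{n:02d}.magru.wmnet"
        print(host, cluster_for(host))
```

Checking for "unassigned" as well as overlap catches both a host that would fall through to a default node definition and one that matches two roles.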

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin1002 for host cp7009.magru.wmnet with OS bullseye

Change #1025780 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] update service.yaml for text and upload clusters

https://gerrit.wikimedia.org/r/1025780

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1002 for host cp7009.magru.wmnet with OS bullseye executed with errors:

  • cp7009 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" cp7009.magru.wmnet to get a root shell, but depending on the failure this may not work.

Change #1025780 merged by Ssingh:

[operations/puppet@production] update service.yaml for text and upload clusters

https://gerrit.wikimedia.org/r/1025780

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin1002 for host cp7009.magru.wmnet with OS bullseye

Change #1025784 had a related patch set uploaded (by Fabfur; author: Fabfur):

[operations/puppet@production] hiera:magru: adding magru dc to authorized ncredir regex

https://gerrit.wikimedia.org/r/1025784

Cookbook cookbooks.sre.hosts.reimage was started by fabfur@cumin1002 for host cp7001.magru.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin1002 for host cp7010.magru.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin1002 for host cp7011.magru.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1002 for host cp7009.magru.wmnet with OS bullseye completed:

  • cp7009 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Unable to disable Puppet, the host may have been unreachable
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404301410_sukhe_3450888_cp7009.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by fabfur@cumin1002 for host cp7001.magru.wmnet with OS bullseye completed:

  • cp7001 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Unable to disable Puppet, the host may have been unreachable
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404301416_fabfur_3454966_cp7001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status failed -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1002 for host cp7010.magru.wmnet with OS bullseye completed:

  • cp7010 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404301434_sukhe_3476755_cp7010.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1002 for host cp7011.magru.wmnet with OS bullseye completed:

  • cp7011 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404301445_sukhe_3488606_cp7011.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage was started by fabfur@cumin1002 for host cp7002.magru.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by fabfur@cumin1002 for host cp7003.magru.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin1002 for host cp7012.magru.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by fabfur@cumin1002 for host cp7005.magru.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by fabfur@cumin1002 for host cp7004.magru.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin1002 for host cp7013.magru.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by fabfur@cumin1002 for host cp7003.magru.wmnet with OS bullseye executed with errors:

  • cp7003 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404301612_fabfur_3545797_cp7003.out
    • The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" cp7003.magru.wmnet to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1002 for host cp7012.magru.wmnet with OS bullseye completed:

  • cp7012 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404301607_sukhe_3546216_cp7012.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by fabfur@cumin1002 for host cp7002.magru.wmnet with OS bullseye completed:

  • cp7002 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404301610_fabfur_3545755_cp7002.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by fabfur@cumin1002 for host cp7005.magru.wmnet with OS bullseye completed:

  • cp7005 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404301624_fabfur_3548462_cp7005.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1002 for host cp7013.magru.wmnet with OS bullseye completed:

  • cp7013 (WARN)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404301630_sukhe_3548875_cp7013.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • Failed to run the sre.puppet.sync-netbox-hiera cookbook, run it manually

Cookbook cookbooks.sre.hosts.reimage started by fabfur@cumin1002 for host cp7004.magru.wmnet with OS bullseye completed:

  • cp7004 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404301627_fabfur_3548416_cp7004.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin1002 for host cp7014.magru.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin1002 for host cp7015.magru.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1002 for host cp7015.magru.wmnet with OS bullseye executed with errors:

  • cp7015 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed; see the cookbook logs for the details. You can also try typing "install-console cp7015.magru.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1002 for host cp7014.magru.wmnet with OS bullseye executed with errors:

  • cp7014 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed; see the cookbook logs for the details. You can also try typing "install-console cp7014.magru.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin1002 for host cp7014.magru.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1002 for host cp7014.magru.wmnet with OS bullseye executed with errors:

  • cp7014 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed; see the cookbook logs for the details. You can also try typing "install-console cp7014.magru.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin1002 for host cp7003.magru.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by fabfur@cumin1002 for host cp7007.magru.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by fabfur@cumin1002 for host cp7006.magru.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by fabfur@cumin1002 for host cp7008.magru.wmnet with OS bullseye

Change #1025784 merged by BCornwall:

[operations/puppet@production] hiera:magru: adding magru dc to authorized ncredir regex

https://gerrit.wikimedia.org/r/1025784

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1002 for host cp7003.magru.wmnet with OS bullseye completed:

  • cp7003 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404301904_sukhe_3587133_cp7003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by fabfur@cumin1002 for host cp7007.magru.wmnet with OS bullseye completed:

  • cp7007 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404301914_fabfur_3588314_cp7007.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin1002 for host cp7016.magru.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by fabfur@cumin1002 for host cp7006.magru.wmnet with OS bullseye completed:

  • cp7006 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404301916_fabfur_3588303_cp7006.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by fabfur@cumin1002 for host cp7008.magru.wmnet with OS bullseye completed:

  • cp7008 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404301919_fabfur_3588680_cp7008.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin1002 for host cp7015.magru.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1002 for host cp7016.magru.wmnet with OS bullseye completed:

  • cp7016 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404302013_sukhe_3606929_cp7016.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1002 for host cp7015.magru.wmnet with OS bullseye completed:

  • cp7015 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404302031_sukhe_3610807_cp7015.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage was started by fabfur@cumin1002 for host cp7014.magru.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by fabfur@cumin1002 for host cp7013.magru.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by fabfur@cumin1002 for host cp7013.magru.wmnet with OS bullseye completed:

  • cp7013 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404302235_fabfur_3635263_cp7013.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by fabfur@cumin1002 for host cp7014.magru.wmnet with OS bullseye completed:

  • cp7014 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404302233_fabfur_3635280_cp7014.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Change #1026673 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] magru: add ncredir7001 and ncredir7002 nodes

https://gerrit.wikimedia.org/r/1026673

Change #1026674 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] lvs: Add ncredir7001/ncredir7002 (service_setup)

https://gerrit.wikimedia.org/r/1026674

Change #1026674 abandoned by BCornwall:

[operations/puppet@production] lvs: Add ncredir7001/ncredir7002 (service_setup)

Reason:

I0df1744ee10a768c32067cee1b4f20a639d7d6cb already addresses this

https://gerrit.wikimedia.org/r/1026674

Change #1026673 abandoned by BCornwall:

[operations/puppet@production] magru: add ncredir7001 and ncredir7002 nodes

Reason:

I0df1744ee10a768c32067cee1b4f20a639d7d6cb already addresses this

https://gerrit.wikimedia.org/r/1026673

RobH claimed this task.