Page MenuHomePhabricator

Q2:rack/setup/install es204[1-6]
Closed, ResolvedPublic

Description

This task will track the racking, setup, and OS installation of es204[1-6]

Hostname / Racking / Installation Details

Hostnames: es204[1-6]
Racking Proposal: Where should these systems be racked? Can they share with any existing systems or should they avoid any other systems sharing their rack or row? (Note EQIAD now has rows A-F.)
es2020 → will be replaced by es2041, server can be installed in A2
es2021 → will be replaced by es2042, server can be installed in B3
es2022 → will be replaced by es2043 server can be installed in C7
es2023 → will be replaced by es2044, server can be installed in D3
es2024 → will be replaced by es2045 server can be installed in A7
es2025 → will be replaced by es2046 server can be installed in B7
Networking Setup: # of Connections:1/2 - VLAN:Private/Public/Other(Specify) :
OS Distro: Bookworm (default unless otherwise specified)
Sub-team Technical Contact: @ABran-WMF

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

es2041
  • Receive in system on procurement task T376156 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the Provision a server's network attributes Netbox script - Note that you must run the DNS and Provision cookbook after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
es2042
  • Receive in system on procurement task T376156 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the Provision a server's network attributes Netbox script - Note that you must run the DNS and Provision cookbook after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
es2043
  • Receive in system on procurement task T376156 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the Provision a server's network attributes Netbox script - Note that you must run the DNS and Provision cookbook after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
es2044
  • Receive in system on procurement task T376156 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the Provision a server's network attributes Netbox script - Note that you must run the DNS and Provision cookbook after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
es2045
  • Receive in system on procurement task T376156 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the Provision a server's network attributes Netbox script - Note that you must run the DNS and Provision cookbook after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
es2046
  • Receive in system on procurement task T376156 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the Provision a server's network attributes Netbox script - Note that you must run the DNS and Provision cookbook after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook

Event Timeline

RobH mentioned this in Unknown Object (Task).
RobH added a parent task: Unknown Object (Task).
RobH moved this task from Backlog to Racking Tasks on the ops-codfw board.
RobH unsubscribed.

@ABran-WMF,

Please note the workflow for racking tasks has changed this fiscal year, and we now require the puppet updates from the sub-team receiving the new servers. This is due to the majority of DC Ops not having root/merge puppet rights.

Please update the site.pp file with the insetup role for your team (detailed on https://wikitech.wikimedia.org/wiki/SRE/Dc-operations) and add the new servers to preseed.yml for partition info.

If possible, please reference this task number in your patch set, so it is clear when complete. Once complete, just un-assign yourself (leaving no assignee) for this task and once the hardware arrives on-sites will claim this task for racking and setup.

Thank you!

Change #1083758 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: add 12 new es hosts

https://gerrit.wikimedia.org/r/1083758

es2022 → will be replaced by es2043, server can be installed in C6: There is already one es in C6, better to look for a different rack,
es2024 → will be replaced by es2045, server can be installed in A6: Same as above
es2025 → will be replaced by es2046, server can be installed in B8: Same as above

@Marostegui this should make it diverse. lmk if you want something different.
es2043 > C7
es2045 > A7
es2046 > B7

@ABran-WMF we've received these servers. Please update the site.pp file. trying to get these up by end of week. next week at the latest.

Change #1087866 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: add es204[1-6]

https://gerrit.wikimedia.org/r/1087866

@ABran-WMF we've received these servers. Please update the site.pp file. trying to get these up by end of week. next week at the latest.

done, but given that's my first install task, I'll wait for CR approval to come!

Change #1087866 merged by Arnaudb:

[operations/puppet@production] mariadb: add es204[1-6]

https://gerrit.wikimedia.org/r/1087866

done, but given that's my first install task, I'll wait for CR approval to come!

done!

@Marostegui this should make it diverse. lmk if you want something different.
es2043 > C7
es2045 > A7
es2046 > B7

@ABran-WMF please verify and comment if needed. Thanks.

es2043 > C7
es2045 > A7
es2046 > B7

@ABran-WMF please verify and comment if needed. Thanks.

all 3 (with link attached ↑) are looking OK, thanks @Jhancock.wm

I would suggest to update the task so this new racking proposal is clearer.

A3 was full. I racked that one in A2 and updated the task description

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host es2041.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host es2042.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host es2043.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host es2044.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host es2045.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host es2046.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host es2042.codfw.wmnet with OS bookworm executed with errors:

  • es2042 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console es2042.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host es2041.codfw.wmnet with OS bookworm executed with errors:

  • es2041 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console es2041.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host es2043.codfw.wmnet with OS bookworm executed with errors:

  • es2043 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console es2043.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host es2046.codfw.wmnet with OS bookworm executed with errors:

  • es2046 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console es2046.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host es2044.codfw.wmnet with OS bookworm executed with errors:

  • es2044 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console es2044.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host es2045.codfw.wmnet with OS bookworm executed with errors:

  • es2045 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console es2045.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host es2041.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host es2041.codfw.wmnet with OS bookworm executed with errors:

  • es2041 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console es2041.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host es2041.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host es2041.codfw.wmnet with OS bookworm executed with errors:

  • es2041 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Generated Puppet certificate
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console es2041.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host es2041.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host es2041.codfw.wmnet with OS bookworm executed with errors:

  • es2041 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Generated Puppet certificate
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console es2041.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

@Papaul this one server fails after the puppet certificate is generated. Can you take a look. I'll check out the other servers this afternoon.
es2041

@Jhancock.wm it looks like again the server did send the puppet cert request to the wrong puppet server. it supposed to send the request to puppetserver1001 but did send it to puppetmaster1001

pt1979@puppetmaster1001:~$ sudo puppet cert --list
Warning: `puppet cert` is deprecated and will be removed in a future release.
   (location: /usr/lib/ruby/vendor_ruby/puppet/application.rb:370:in `run')
  "es2041.codfw.wmnet" (SHA256) 8D:3A:16:9D:02:C7:9C:8E:CD:B5:13:EE:90:EF:5A:C4:BC:0---------------

what you can do is to re-run the reimage again with the --no-pxe flag and see if it will sens the request to the right puppetserver. I will delete the cert request on the puppetmaster

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host es2041.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host es2041.codfw.wmnet with OS bookworm executed with errors:

  • es2041 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Generated Puppet certificate
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console es2041.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

@Jhancock.wm it did send the request again to puppetmaster1001. i will ping IF tomorrow to see what's going on. why the puppet request is going to puppetmaster1001.

puppetmaster1001:~$ sudo puppet cert --list
Warning: `puppet cert` is deprecated and will be removed in a future release.
   (location: /usr/lib/ruby/vendor_ruby/puppet/application.rb:370:in `run')
  "es2041.codfw.wmnet" (SHA256) C3:70:BE:D7:25:56:3A:0F:23:3C:20:D4:D4:E4:CD:1A:B3:44:3

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host es2042.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host es2046.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host es2045.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host es2044.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host es2043.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host es2042.codfw.wmnet with OS bookworm completed:

  • es2042 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202411221835_jhancock_2995890_es2042.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host es2044.codfw.wmnet with OS bookworm completed:

  • es2044 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202411221909_jhancock_3002604_es2044.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host es2043.codfw.wmnet with OS bookworm completed:

  • es2043 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202411221913_jhancock_3002609_es2043.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host es2046.codfw.wmnet with OS bookworm completed:

  • es2046 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202411221916_jhancock_3002589_es2046.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host es2045.codfw.wmnet with OS bookworm completed:

  • es2045 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202411221919_jhancock_3002599_es2045.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Change #1099696 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: add new ES hosts

https://gerrit.wikimedia.org/r/1099696

Change #1083758 abandoned by Arnaudb:

[operations/puppet@production] mariadb: add 12 new es hosts

Reason:

replaced by: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1099696

https://gerrit.wikimedia.org/r/1083758

Change #1099696 merged by Arnaudb:

[operations/puppet@production] mariadb: add new ES hosts

https://gerrit.wikimedia.org/r/1099696

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host db2242.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host db2241.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host db2241.codfw.wmnet with OS bookworm completed:

  • db2241 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202412031438_jhancock_1363392_db2241.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host db2242.codfw.wmnet with OS bookworm completed:

  • db2242 (WARN)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202412031435_jhancock_1363387_db2242.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • Failed to run the sre.puppet.sync-netbox-hiera cookbook, run it manually