
Q2:rack/setup/install wikikube-worker refresh
Closed, Resolved, Public

Description

This task will track the racking, setup, and OS installation of the wikikube-worker refresh hosts.

Hostname / Racking / Installation Details

This section should list the racking restrictions for these hosts, such as whether they should avoid sharing a rack or row with one another or with any existing hosts. It should also list the other details below.

Hostnames: What are the hostnames, and have you updated https://wikitech.wikimedia.org/wiki/Infrastructure_naming_conventions ? wikikube-worker13[60-72]
Racking Proposal: Where should these systems be racked? Can they share with any existing systems or should they avoid any other systems sharing their rack or row? Use https://fault-tolerance.toolforge.org/map to optimize this placement.

  • Row A: 0
  • Row B: 2
  • Row C: 3
  • Row D: 6
  • Row E: 1
  • Row F: 1
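As a quick sanity check on the proposal above, the hostname range and the per-row counts should describe the same number of machines. The snippet below is a minimal sketch, not part of any Wikimedia tooling: `expand_range` and `ROW_PLAN` are hypothetical names introduced here, and the counts are copied from the list above.

```python
import re


def expand_range(pattern: str) -> list[str]:
    """Expand one bracketed numeric range, e.g. 'host13[60-72]'."""
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        return [pattern]  # no range present, treat as a single hostname
    prefix, start, end, suffix = match.groups()
    width = len(start)  # preserve zero padding such as [01-12]
    return [
        f"{prefix}{n:0{width}d}{suffix}"
        for n in range(int(start), int(end) + 1)
    ]


# Per-row counts copied from the racking proposal above.
ROW_PLAN = {"A": 0, "B": 2, "C": 3, "D": 6, "E": 1, "F": 1}

hosts = expand_range("wikikube-worker13[60-72]")
planned = sum(ROW_PLAN.values())
print(len(hosts), planned)  # 13 13
```

Both numbers come out to 13, so the plan accounts for every host in the range.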

Networking Setup: # of Connections:1/2 - Speed:1G/10G. - VLAN:Private/Public/Other(Specify) :
OS Distro: Bookworm (default unless otherwise specified)
Boot Method: Legacy BIOS or UEFI. Please note UEFI must have partman updates applied in advance of setup and is currently in pilot program: https://wikitech.wikimedia.org/wiki/UEFI_Boot
Sub-team Technical Contact: @Clement_Goubert

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

wikikube-worker1360
  • Receive in system on procurement task T405289 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the Provision a server's network attributes Netbox script - Note that you must run the DNS and Provision cookbook after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
wikikube-worker1361
  • Receive in system on procurement task T405289 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the Provision a server's network attributes Netbox script - Note that you must run the DNS and Provision cookbook after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
wikikube-worker1362
  • Receive in system on procurement task T405289 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the Provision a server's network attributes Netbox script - Note that you must run the DNS and Provision cookbook after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
wikikube-worker1363
  • Receive in system on procurement task T405289 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the Provision a server's network attributes Netbox script - Note that you must run the DNS and Provision cookbook after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
wikikube-worker1364
  • Receive in system on procurement task T405289 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the Provision a server's network attributes Netbox script - Note that you must run the DNS and Provision cookbook after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
wikikube-worker1365
  • Receive in system on procurement task T405289 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the Provision a server's network attributes Netbox script - Note that you must run the DNS and Provision cookbook after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
wikikube-worker1366
  • Receive in system on procurement task T405289 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the Provision a server's network attributes Netbox script - Note that you must run the DNS and Provision cookbook after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
wikikube-worker1367
  • Receive in system on procurement task T405289 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the Provision a server's network attributes Netbox script - Note that you must run the DNS and Provision cookbook after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
wikikube-worker1368
  • Receive in system on procurement task T405289 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the Provision a server's network attributes Netbox script - Note that you must run the DNS and Provision cookbook after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
wikikube-worker1369
  • Receive in system on procurement task T405289 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the Provision a server's network attributes Netbox script - Note that you must run the DNS and Provision cookbook after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
wikikube-worker1370
  • Receive in system on procurement task T405289 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the Provision a server's network attributes Netbox script - Note that you must run the DNS and Provision cookbook after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
wikikube-worker1371
  • Receive in system on procurement task T405289 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the Provision a server's network attributes Netbox script - Note that you must run the DNS and Provision cookbook after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
wikikube-worker1372
  • Receive in system on procurement task T405289 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the Provision a server's network attributes Netbox script - Note that you must run the DNS and Provision cookbook after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
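The thirteen checklists above are identical apart from the hostname, and the steps are meant to be done in order (the DNS and provision cookbooks must follow the Netbox script). A minimal sketch of tracking that order might look like the following. The cookbook names are taken from the checklist; the other step labels are paraphrases, and `next_step` and the `progress` values are hypothetical, for illustration only.

```python
from typing import Optional

# Shorthand for the eight checklist items above, in the order listed.
# Cookbook names are from the checklist; other labels are paraphrased.
STEPS = [
    "receive (procurement task T405289, Coupa)",
    "rack and update Netbox",
    "Netbox script: provision network attributes",
    "sre.dns.netbox",
    "sre.hosts.provision",
    "sre.hardware.upgrade-firmware",
    "operations/puppet update (preseed.yaml, site.pp)",
    "sre.hosts.reimage",
]


def next_step(completed: int) -> Optional[str]:
    """Return the next step to run, or None once every step is done."""
    if not 0 <= completed <= len(STEPS):
        raise ValueError(f"completed must be between 0 and {len(STEPS)}")
    return STEPS[completed] if completed < len(STEPS) else None


# Hypothetical progress: number of steps already ticked off per host.
progress = {"wikikube-worker1360": 3, "wikikube-worker1361": 8}
for host, done in progress.items():
    print(host, "->", next_step(done) or "done")
```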

Event Timeline

Jhancock.wm mentioned this in Unknown Object (Task).

Change #1200116 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] site.pp: Add new wikikube insetup hosts

https://gerrit.wikimedia.org/r/1200116

Change #1200116 merged by Clément Goubert:

[operations/puppet@production] site.pp: Add new wikikube insetup hosts

https://gerrit.wikimedia.org/r/1200116

We are currently out of space in row A for 8 - 10 Gig connections due to power. I have notified @Clement_Goubert to see if we might be able to move them into another row.

Thanks for the heads up @VRiley-WMF.

I've tried to plan out what the balance of servers per row would look like after the refreshes in this spreadsheet (Repartition tab).

It so happens that this row is the one that gets the most hosts unracked in the two Q2 refreshes (a total of 19), which would leave it pretty underprovisioned compared to the other rows if we don't rack anything there.

We'll try to sync up tomorrow to devise a plan of attack, but maybe we could decom some servers from that row in advance (given it's currently overprovisioned) so they can be replaced 1-1?

Updated racking plan to:

  • Row A: 0
  • Row B: 2
  • Row C: 3
  • Row D: 6
  • Row E: 1
  • Row F: 1

This would still leave row A slightly underprovisioned, but in a better place than what we have now, so I'm OK with it. We can use future refreshes to flesh it back out a little.

Hi @VRiley-WMF, just so you're aware: please try to spread these as evenly as is practical across the racks in each row.

The "row-wide" view is sort of legacy but it used to be significant which is why the team focuses on it. Realistically our failure domain is a single rack, so for instance in row D it would not be good if all 6 new hosts went into the same rack. Thanks <3

wikikube-worker1360: B2, U18
wikikube-worker1361: B4, U36
wikikube-worker1362: C3, U37
wikikube-worker1363: C4, U28
wikikube-worker1364: C5, U31
wikikube-worker1365: D1, U30
wikikube-worker1366: D2, U22
wikikube-worker1367: D3, U31
wikikube-worker1368: D4, U28
wikikube-worker1369: D6, U27
wikikube-worker1370: D7, U32
wikikube-worker1371: E8, U32
wikikube-worker1372: F8, U18
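The request to treat a single rack as the failure domain can be checked against the placement listed above. The following is a rough sketch under one assumption: that the first letter of a rack name (for example the "D" in "D6") identifies its row. `PLACEMENT` and `ROW_PLAN` are hypothetical names holding values copied from this task.

```python
from collections import Counter

# Rack assignments copied from the comment above (host -> rack).
PLACEMENT = {
    "wikikube-worker1360": "B2",
    "wikikube-worker1361": "B4",
    "wikikube-worker1362": "C3",
    "wikikube-worker1363": "C4",
    "wikikube-worker1364": "C5",
    "wikikube-worker1365": "D1",
    "wikikube-worker1366": "D2",
    "wikikube-worker1367": "D3",
    "wikikube-worker1368": "D4",
    "wikikube-worker1369": "D6",
    "wikikube-worker1370": "D7",
    "wikikube-worker1371": "E8",
    "wikikube-worker1372": "F8",
}

# Updated per-row racking plan from this task.
ROW_PLAN = {"A": 0, "B": 2, "C": 3, "D": 6, "E": 1, "F": 1}

hosts_per_rack = Counter(PLACEMENT.values())
# Assumption: the leading letter of the rack name is the row.
hosts_per_row = Counter(rack[0] for rack in PLACEMENT.values())

# Racks holding more than one of the new hosts (shared failure domain).
shared_racks = {rack: n for rack, n in hosts_per_rack.items() if n > 1}
# Rows where the actual count differs from the plan: row -> (actual, planned).
row_mismatch = {
    row: (hosts_per_row[row], want)
    for row, want in ROW_PLAN.items()
    if hosts_per_row[row] != want
}

print(shared_racks)  # {}
print(row_mismatch)  # {}
```

Both results are empty: every new host sits in its own rack, and the per-row totals match the updated plan.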

Change #1220370 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] partman: New wikikube-worker need UEFI

https://gerrit.wikimedia.org/r/1220370

Change #1220370 merged by Clément Goubert:

[operations/puppet@production] partman: New wikikube-worker need UEFI

https://gerrit.wikimedia.org/r/1220370

wikikube-worker1360: B2, U18, CableID 5003, Port 27
wikikube-worker1361: B4, U36, CableID 5369, Port 47
wikikube-worker1362: C3, U37, CableID 230304500071, Port 31
wikikube-worker1363: C4, U28, CableID 5244, Port 19
wikikube-worker1364: C5, U31, CableID 230304500074, Port 2
wikikube-worker1365: D1, U29, CableID 230304500210, Port 12
wikikube-worker1366: D2, U22, CableID 230304500129, Port 2
wikikube-worker1367: D3, U31, CableID 230304500131, Port 4
wikikube-worker1368: D4, U28, CableID 230304500163, Port 24
wikikube-worker1369: S497720X5B34986, D6, U27, CableID 230304500162, Port 4
wikikube-worker1370: D7, U32, CableID 5332, Port 1
wikikube-worker1371: E8, U32, CableID 230304500167, Port 5
wikikube-worker1372: F8, U18, CableID 230304500164, Port 23

Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1003 for host wikikube-worker1362.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1003 for host wikikube-worker1362.eqiad.wmnet with OS trixie completed:

  • wikikube-worker1362 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202512230749_vriley_4140314_wikikube-worker1362.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1003 for host wikikube-worker1363.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1003 for host wikikube-worker1363.eqiad.wmnet with OS trixie executed with errors:

  • wikikube-worker1363 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-worker1363.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1003 for host wikikube-worker1363.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1003 for host wikikube-worker1363.eqiad.wmnet with OS trixie executed with errors:

  • wikikube-worker1363 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-worker1363.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1003 for host wikikube-worker1364.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1003 for host wikikube-worker1364.eqiad.wmnet with OS trixie completed:

  • wikikube-worker1364 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202512231408_vriley_46138_wikikube-worker1364.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

wikikube-worker1371: E8, U32, CableID 230304500167, Port 5
wikikube-worker1372: F8, U18, CableID 230304500164, Port 23

These two were triggering alerts for being powered on and connected to the network without being set up, so I removed power. Also, wikikube-worker1371 is racked in U23, not U32.

Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1003 for host wikikube-worker1366.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1003 for host wikikube-worker1367.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1003 for host wikikube-worker1366.eqiad.wmnet with OS trixie completed:

  • wikikube-worker1366 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202601121909_vriley_1097509_wikikube-worker1366.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1003 for host wikikube-worker1366.eqiad.wmnet with OS trixie executed with errors:

  • wikikube-worker1366 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202601121909_vriley_1097509_wikikube-worker1366.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-worker1366.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1003 for host wikikube-worker1367.eqiad.wmnet with OS trixie completed:

  • wikikube-worker1367 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202601121929_vriley_1099119_wikikube-worker1367.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1003 for host wikikube-worker1368.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1003 for host wikikube-worker1369.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1003 for host wikikube-worker1368.eqiad.wmnet with OS trixie completed:

  • wikikube-worker1368 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202601122004_vriley_1107184_wikikube-worker1368.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1003 for host wikikube-worker1369.eqiad.wmnet with OS trixie completed:

  • wikikube-worker1369 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202601122025_vriley_1109879_wikikube-worker1369.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1003 for host wikikube-worker1363.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1003 for host wikikube-worker1371.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1003 for host wikikube-worker1363.eqiad.wmnet with OS trixie completed:

  • wikikube-worker1363 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202601132228_vriley_1363977_wikikube-worker1363.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1003 for host wikikube-worker1365.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1003 for host wikikube-worker1366.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1003 for host wikikube-worker1371.eqiad.wmnet with OS trixie completed:

  • wikikube-worker1371 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202601132244_vriley_1366793_wikikube-worker1371.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1003 for host wikikube-worker1365.eqiad.wmnet with OS trixie completed:

  • wikikube-worker1365 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202601132303_vriley_1369697_wikikube-worker1365.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1003 for host wikikube-worker1366.eqiad.wmnet with OS trixie completed:

  • wikikube-worker1366 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202601132308_vriley_1372250_wikikube-worker1366.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1003 for host wikikube-worker1372.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1003 for host wikikube-worker1370.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1003 for host wikikube-worker1372.eqiad.wmnet with OS trixie completed:

  • wikikube-worker1372 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202601132358_vriley_1381994_wikikube-worker1372.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1003 for host wikikube-worker1370.eqiad.wmnet with OS trixie completed:

  • wikikube-worker1370 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202601140024_vriley_1389108_wikikube-worker1370.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully
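With several hosts reimaged more than once, it can help to tally attempts per host from the cookbook result lines. The snippet below is a minimal sketch that parses a few lines copied from this task; `RESULT_RE` and `summarize` are hypothetical names, and the regular expression only assumes the line shape seen in the log entries above.

```python
import re

# Sample lines copied from the cookbook entries in this task.
PREFIX = "Cookbook cookbooks.sre.hosts.reimage"
LOG_LINES = [
    f"{PREFIX} was started by vriley@cumin1003 for host wikikube-worker1362.eqiad.wmnet with OS trixie",
    f"{PREFIX} started by vriley@cumin1003 for host wikikube-worker1362.eqiad.wmnet with OS trixie completed:",
    f"{PREFIX} started by vriley@cumin1003 for host wikikube-worker1363.eqiad.wmnet with OS trixie executed with errors:",
    f"{PREFIX} started by vriley@cumin1003 for host wikikube-worker1363.eqiad.wmnet with OS trixie executed with errors:",
    f"{PREFIX} started by vriley@cumin1003 for host wikikube-worker1363.eqiad.wmnet with OS trixie completed:",
]

# Matches only result lines; "was started" lines carry no outcome suffix.
RESULT_RE = re.compile(
    r"for host (?P<host>[\w-]+)\.\S+ with OS (?P<os>\w+) "
    r"(?P<result>completed|executed with errors):"
)


def summarize(lines):
    """Map each short hostname to its list of outcomes (True means PASS)."""
    attempts = {}
    for line in lines:
        match = RESULT_RE.search(line)
        if match is None:
            continue  # not a result line
        attempts.setdefault(match["host"], []).append(
            match["result"] == "completed"
        )
    return attempts


for host, outcomes in summarize(LOG_LINES).items():
    status = "PASS" if outcomes[-1] else "FAIL"
    print(f"{host}: {len(outcomes)} attempt(s), last {status}")
```

For the sample lines, wikikube-worker1363 shows three attempts with the last one passing, which matches the sequence recorded above.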

This has been completed