
wikikube-worker23[32-56] implementation tracking
Closed, Resolved, Public

Description

This task is to track the service implementation of ServiceOps new host(s) listed in the task description.

Once the linked racking task has been resolved, this task can be implemented.

This sub-task creation/update is per the request of ServiceOps new; this task is assigned at creation to the 'Sub-team Technical Contact' provided in the initial ordering task.

1.) Extend the hostname globs as appropriate in puppet/manifests/site.pp. Remove the corresponding entries from insetup.
2.) Verify and commit changes to puppet repo, review, merge etc.
3.) Run puppet on new nodes.
sudo cumin -b 25 wikikube-worker[2332-2356] run-puppet-agent
4.) Run puppet on all cluster nodes and the registry:
sudo cumin -b 25 -p 10 'A:wikikube-worker-codfw or A:docker-registry' run-puppet-agent
5.) Update Netbox (remember to run homer afterwards and !log your action on #wikimedia-operations):
./add_k8s_node.py --netbox-token $NETBOX_TOKEN --netbox-commit --task-id T417772 wikikube-worker[2332-2356].codfw.wmnet
6.) Pool the new nodes:
sudo cookbook sre.k8s.pool-depool-node --k8s-cluster wikikube-codfw -t T417772 pool wikikube-worker[2332-2356]
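The steps above can be reviewed as a single dry run before anything is executed. The sketch below only prints the commands for steps 3 to 6, with the task ID and host range as parameters; the function name print_rollout_steps is hypothetical, and the host expressions are single-quoted as a precaution so the shell never treats the brackets as a glob. Running homer and the !log after the Netbox step still have to be done by hand.

```shell
#!/usr/bin/env bash
# Dry-run sketch of steps 3-6: prints the commands in order, executes nothing.
# print_rollout_steps is a hypothetical helper name, not an existing tool.
set -euo pipefail

TASK="T417772"
HOSTS="wikikube-worker[2332-2356]"

print_rollout_steps() {
  local task="$1" hosts="$2"
  # Step 3: run puppet on the new nodes, 25 at a time.
  printf '%s\n' "sudo cumin -b 25 '${hosts}' run-puppet-agent"
  # Step 4: run puppet on all cluster nodes and the registry.
  printf '%s\n' "sudo cumin -b 25 -p 10 'A:wikikube-worker-codfw or A:docker-registry' run-puppet-agent"
  # Step 5: update Netbox (run homer and !log manually afterwards).
  printf '%s\n' "./add_k8s_node.py --netbox-token \$NETBOX_TOKEN --netbox-commit --task-id ${task} '${hosts}.codfw.wmnet'"
  # Step 6: pool the new nodes.
  printf '%s\n' "sudo cookbook sre.k8s.pool-depool-node --k8s-cluster wikikube-codfw -t ${task} pool '${hosts}'"
}

print_rollout_steps "$TASK" "$HOSTS"
```

Printing rather than executing keeps the human in the loop between steps, which matters here because step 5 requires manual follow-up before pooling.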

Event Timeline


Change #1240301 had a related patch set uploaded (by Blake; author: Blake):

[operations/puppet@production] site.pp: wikikube-worker23[32-56] as kubernetes::worker

https://gerrit.wikimedia.org/r/1240301

Change #1240301 merged by Blake:

[operations/puppet@production] site.pp: wikikube-worker23[32-56] as kubernetes::worker

https://gerrit.wikimedia.org/r/1240301

Clement_Goubert changed the task status from Open to In Progress. Feb 18 2026, 3:13 PM
Clement_Goubert moved this task from Scheduled (this Q) to In Progress on the ServiceOps new board.
Clement_Goubert updated the task description.

Change #1240276 merged by Clément Goubert:

[operations/puppet@production] kubernetes: Add wikikube-worker23[32-56]

https://gerrit.wikimedia.org/r/1240276

Mentioned in SAL (#wikimedia-operations) [2026-02-18T15:40:32Z] <bjensen> homer 'cr*codfw*' commit 'T417772'

Mentioned in SAL (#wikimedia-operations) [2026-02-18T15:44:49Z] <claime> homer 'lsw*codfw*' commit 'T417772'

Change #1240323 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] conftool-data: Remove wikikube-workers in codfw E/F

https://gerrit.wikimedia.org/r/1240323

Change #1240323 merged by Clément Goubert:

[operations/puppet@production] conftool-data: Remove wikikube-workers in codfw E/F

https://gerrit.wikimedia.org/r/1240323

Clement_Goubert changed the task status from In Progress to Stalled. Feb 19 2026, 2:42 PM
Clement_Goubert moved this task from In Progress to Needs Info / Blocked on the ServiceOps new board.

These hosts are currently blocked from going into production because they are connected to Nokia switches.
I had forgotten this, so I ran homer to try to enable BGP for them, which does not currently work.
Work on resolving that will be tracked in T417817: Test Nokia switches BGP config for k8s workers

Change #1242351 had a related patch set uploaded (by Kamila Součková; author: Kamila Součková):

[operations/deployment-charts@master] admin/common-bgp: add F4 ToR switch

https://gerrit.wikimedia.org/r/1242351

Change #1242351 merged by jenkins-bot:

[operations/deployment-charts@master] admin/common-bgp: add F4 ToR switch

https://gerrit.wikimedia.org/r/1242351

E1, E3, F1 and F3 are Juniper, so you should now be unblocked for those racks (I deployed the ToR switch patches). The Nokia switches for the other rows should be ready in roughly days; @ayounsi is working on that.

Raine changed the task status from Stalled to Open. Feb 25 2026, 2:16 PM
Clement_Goubert changed the task status from Open to In Progress. Mar 2 2026, 3:06 PM
Clement_Goubert moved this task from Needs Info / Blocked to In Progress on the ServiceOps new board.

This should now be unblocked.

Hey @Blake, I've already added 10 trixie nodes to wikikube eqiad (T418259), so if you feel like it you could reimage some or all of these to trixie as well before adding them.

Sounds good, I'll take a look at this tomorrow.

Before we run the reimage, Janis guided me through verifying network connectivity for these hosts.

First, we checked that each host has BGP connectivity to its ToR switch. We expect one IPv4 and one IPv6 session per host, and no L2 LVS adjacency.

blake@cumin1003:~$ sudo cookbook sre.k8s.pool-depool-node --k8s-cluster wikikube-codfw check wikikube-worker[2332-2356].codfw.wmnet
Using hosts query: 'wikikube-worker[2332-2356].codfw.wmnet'
wikikube-worker2332.codfw.wmnet: New vlan private1-e1-codfw, need 2 Established BGP sessions
Netbox info for wikikube-worker2332.codfw.wmnet: {'has_l2_lvs_adjacency': False, 'bgp_session_count': 2}
wikikube-worker2333.codfw.wmnet: New vlan private1-e1-codfw, need 2 Established BGP sessions
Netbox info for wikikube-worker2333.codfw.wmnet: {'has_l2_lvs_adjacency': False, 'bgp_session_count': 2}

...many similar hosts, all with 2 sessions...

wikikube-worker2355.codfw.wmnet: New vlan private1-f4-codfw, need 2 Established BGP sessions
Netbox info for wikikube-worker2355.codfw.wmnet: {'has_l2_lvs_adjacency': False, 'bgp_session_count': 2}
wikikube-worker2356.codfw.wmnet: New vlan private1-f4-codfw, need 2 Established BGP sessions
Netbox info for wikikube-worker2356.codfw.wmnet: {'has_l2_lvs_adjacency': False, 'bgp_session_count': 2}
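The expectation above (two sessions per host, one IPv4 and one IPv6, and no L2 LVS adjacency) can be checked mechanically instead of by eye across 25 hosts. A minimal sketch, assuming the cookbook keeps printing exactly the "Netbox info for ..." lines shown above: it reports any host whose Netbox info does not say bgp_session_count 2 and has_l2_lvs_adjacency False, and exits non-zero if any host is off or no hosts were found. It only reads the Netbox info lines; it does not itself confirm that sessions are Established. The function name check_bgp_expectations is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: validate the "Netbox info for ..." lines from the pool-depool-node
# "check" action. Assumes the text format shown in this task; adjust the
# patterns if the cookbook output changes. Hypothetical helper name.
set -euo pipefail

check_bgp_expectations() {
  # Reads cookbook output on stdin, prints offending hosts and a summary.
  # Exit status 1 if any host is unexpected or no hosts were seen.
  awk '
    /^Netbox info for / {
      host = $4                 # "wikikube-workerNNNN.codfw.wmnet:"
      sub(/:$/, "", host)       # drop the trailing colon
      seen++
      ok = ($0 ~ /.has_l2_lvs_adjacency.: False/) && ($0 ~ /.bgp_session_count.: 2[^0-9]/)
      if (!ok) { print "UNEXPECTED: " host; bad++ }
    }
    END {
      printf "checked %d hosts, %d unexpected\n", seen, bad
      exit (bad > 0 || seen == 0)
    }
  '
}
```

For example, the check could be piped straight into it: sudo cookbook sre.k8s.pool-depool-node --k8s-cluster wikikube-codfw check 'wikikube-worker[2332-2356].codfw.wmnet' 2>&1 | check_bgp_expectations. Whether the cookbook writes these lines to stdout or stderr is an assumption, hence the 2>&1.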

Next, we verified TCP connectivity to a pod on each host. The opentelemetry-collector pods are scheduled whether or not hosts are cordoned, so we can use them for this test.

root@deploy2002:~# kubectl -n opentelemetry-collector get po -o wide |egrep "$(nodeset -e -S\| wikikube-worker[2332-2356])" |
 awk '{print $6}' | while read ip; do nc -zv $ip 4318; done
Connection to 10.194.171.65 4318 port [tcp/*] succeeded!
Connection to 10.194.162.170 4318 port [tcp/*] succeeded!
Connection to 10.194.133.192 4318 port [tcp/*] succeeded!

...many similar successful tcp connections to these pods...

Connection to 10.194.231.152 4318 port [tcp/*] succeeded!
Connection to 10.194.221.64 4318 port [tcp/*] succeeded!
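The pipeline above takes the pod IP as the sixth whitespace-separated column of kubectl get po -o wide. That works here, but the column shifts when a pod has restarted, because kubectl then prints the RESTARTS value as something like "1 (2d ago)". A minimal sketch of a sturdier variant, assuming the same namespace and port 4318: ask kubectl for only the node name and pod IP via custom-columns, filter on the node name with a regex covering wikikube-worker2332 to 2356, and probe each IP. The helper names pod_ips_on_nodes and probe_port are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the pod connectivity check using explicit columns instead of
# positional fields from "-o wide". Hypothetical helper names.
set -euo pipefail

# Print pod IPs from "NODE IP" lines on stdin whose node name matches the
# extended regex in $1. Pods without an IP (shown as <none>) are skipped.
pod_ips_on_nodes() {
  local node_regex="$1"
  awk -v re="$node_regex" '$1 ~ re && $2 != "<none>" { print $2 }'
}

# Probe each IP read from stdin on TCP port $1 (2s timeout per attempt).
# Keeps going after a failure and returns 1 if any probe failed.
probe_port() {
  local port="$1" ip failed=0
  while read -r ip; do
    nc -zv -w 2 "$ip" "$port" || failed=$((failed + 1))
  done
  return $(( failed > 0 ))
}

# Usage on a deployment host:
#   kubectl -n opentelemetry-collector get pods --no-headers \
#     -o custom-columns=NODE:.spec.nodeName,IP:.status.podIP \
#     | pod_ips_on_nodes '^wikikube-worker23(3[2-9]|4[0-9]|5[0-6])([.]|$)' \
#     | probe_port 4318
```

The ([.]|$) suffix makes the regex match both short node names and FQDNs without also matching longer numbers such as wikikube-worker23320.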

Ah, we were going to proceed with sudo cookbook sre.k8s.roll-reimage-nodes --k8s-cluster wikikube-codfw --query 'P{wikikube-worker[2332-2356].codfw.wmnet}' --reason T417772 --os trixie, but it turns out that this won't work for now: roll-reimage-nodes expects nodes to be present in conftool, and these nodes are not.

This assumption is described here: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/cookbooks/+/refs/heads/master/cookbooks/sre/k8s/roll-reimage-nodes.py#103. It's not clear to me yet what will need to change, but I'll try to fix this as a part of this task.

Proceeding with one-off reimages here so we can get these hosts pooled on Trixie.
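Since roll-reimage-nodes cannot be used until the conftool assumption is addressed, the one-off reimages in this timeline were started a few hosts at a time. A minimal sketch for generating those commands in batches; it prints commands only and runs nothing. The batch size of 6 mirrors most of the batches in the timeline, and the --os and -t flags passed to sre.hosts.reimage, as well as passing the short hostname, are assumptions to check against the cookbook's own help output before use. The function name print_reimage_batches is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print one-off reimage commands for a numeric host range, grouped
# into batches so only a few hosts are reinstalling at once.
# ASSUMPTIONS: the "--os"/"-t" flags and short-hostname argument of
# sre.hosts.reimage; verify with the cookbook's help before running.
set -euo pipefail

print_reimage_batches() {
  local first="$1" last="$2" batch_size="$3" task="$4" os="$5"
  local n count=0 batch=1
  for ((n = first; n <= last; n++)); do
    # Start a new batch header every batch_size hosts.
    if (( count % batch_size == 0 )); then
      printf '# batch %d\n' "$batch"
      batch=$((batch + 1))
    fi
    printf 'sudo cookbook sre.hosts.reimage --os %s -t %s wikikube-worker%d\n' \
      "$os" "$task" "$n"
    count=$((count + 1))
  done
}

print_reimage_batches 2332 2356 6 T417772 trixie
```

For 2332 to 2356 this yields five batches: four of six hosts and a final one containing only wikikube-worker2356.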

Cookbook cookbooks.sre.hosts.reimage was started by blake@cumin1003 for host wikikube-worker2332.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by blake@cumin1003 for host wikikube-worker2332.codfw.wmnet with OS trixie completed:

  • wikikube-worker2332 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603061043_blake_1358318_wikikube-worker2332.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by blake@cumin1003 for host wikikube-worker2333.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by blake@cumin1003 for host wikikube-worker2335.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by blake@cumin1003 for host wikikube-worker2334.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by blake@cumin1003 for host wikikube-worker2333.codfw.wmnet with OS trixie completed:

  • wikikube-worker2333 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603061127_blake_1364559_wikikube-worker2333.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by blake@cumin1003 for host wikikube-worker2334.codfw.wmnet with OS trixie completed:

  • wikikube-worker2334 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603061130_blake_1364623_wikikube-worker2334.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by blake@cumin1003 for host wikikube-worker2335.codfw.wmnet with OS trixie completed:

  • wikikube-worker2335 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603061134_blake_1364658_wikikube-worker2335.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by blake@cumin1003 for host wikikube-worker2336.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by blake@cumin1003 for host wikikube-worker2337.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by blake@cumin1003 for host wikikube-worker2338.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by blake@cumin1003 for host wikikube-worker2339.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by blake@cumin1003 for host wikikube-worker2340.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by blake@cumin1003 for host wikikube-worker2341.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by blake@cumin1003 for host wikikube-worker2339.codfw.wmnet with OS trixie completed:

  • wikikube-worker2339 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603061218_blake_1377346_wikikube-worker2339.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by blake@cumin1003 for host wikikube-worker2336.codfw.wmnet with OS trixie completed:

  • wikikube-worker2336 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603061222_blake_1377281_wikikube-worker2336.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by blake@cumin1003 for host wikikube-worker2338.codfw.wmnet with OS trixie completed:

  • wikikube-worker2338 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603061226_blake_1377337_wikikube-worker2338.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by blake@cumin1003 for host wikikube-worker2337.codfw.wmnet with OS trixie completed:

  • wikikube-worker2337 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603061231_blake_1377328_wikikube-worker2337.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by blake@cumin1003 for host wikikube-worker2340.codfw.wmnet with OS trixie completed:

  • wikikube-worker2340 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603061234_blake_1377351_wikikube-worker2340.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by blake@cumin1003 for host wikikube-worker2341.codfw.wmnet with OS trixie completed:

  • wikikube-worker2341 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603061240_blake_1377356_wikikube-worker2341.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by blake@cumin1003 for host wikikube-worker2342.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by blake@cumin1003 for host wikikube-worker2343.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by blake@cumin1003 for host wikikube-worker2344.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by blake@cumin1003 for host wikikube-worker2345.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by blake@cumin1003 for host wikikube-worker2346.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by blake@cumin1003 for host wikikube-worker2347.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by blake@cumin1003 for host wikikube-worker2342.codfw.wmnet with OS trixie completed:

  • wikikube-worker2342 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603061324_blake_1395750_wikikube-worker2342.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by blake@cumin1003 for host wikikube-worker2344.codfw.wmnet with OS trixie completed:

  • wikikube-worker2344 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603061328_blake_1395784_wikikube-worker2344.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by blake@cumin1003 for host wikikube-worker2345.codfw.wmnet with OS trixie completed:

  • wikikube-worker2345 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603061331_blake_1395807_wikikube-worker2345.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by blake@cumin1003 for host wikikube-worker2343.codfw.wmnet with OS trixie completed:

  • wikikube-worker2343 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603061335_blake_1395768_wikikube-worker2343.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by blake@cumin1003 for host wikikube-worker2346.codfw.wmnet with OS trixie completed:

  • wikikube-worker2346 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603061338_blake_1395858_wikikube-worker2346.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by blake@cumin1003 for host wikikube-worker2347.codfw.wmnet with OS trixie completed:

  • wikikube-worker2347 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603061342_blake_1395886_wikikube-worker2347.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by blake@cumin1003 for host wikikube-worker2348.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by blake@cumin1003 for host wikikube-worker2348.codfw.wmnet with OS trixie executed with errors:

  • wikikube-worker2348 (FAIL)
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-worker2348.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by blake@cumin1003 for host wikikube-worker2349.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by blake@cumin1003 for host wikikube-worker2349.codfw.wmnet with OS trixie executed with errors:

  • wikikube-worker2349 (FAIL)
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-worker2349.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by blake@cumin1003 for host wikikube-worker2348.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by blake@cumin1003 for host wikikube-worker2349.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by blake@cumin1003 for host wikikube-worker2350.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by blake@cumin1003 for host wikikube-worker2351.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by blake@cumin1003 for host wikikube-worker2352.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by blake@cumin1003 for host wikikube-worker2353.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by blake@cumin1003 for host wikikube-worker2348.codfw.wmnet with OS trixie completed:

  • wikikube-worker2348 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603061423_blake_1412296_wikikube-worker2348.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by blake@cumin1003 for host wikikube-worker2350.codfw.wmnet with OS trixie completed:

  • wikikube-worker2350 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603061426_blake_1412377_wikikube-worker2350.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by blake@cumin1003 for host wikikube-worker2352.codfw.wmnet with OS trixie completed:

  • wikikube-worker2352 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603061430_blake_1412423_wikikube-worker2352.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by blake@cumin1003 for host wikikube-worker2351.codfw.wmnet with OS trixie completed:

  • wikikube-worker2351 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603061433_blake_1412393_wikikube-worker2351.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by blake@cumin1003 for host wikikube-worker2349.codfw.wmnet with OS trixie completed:

  • wikikube-worker2349 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603061437_blake_1412315_wikikube-worker2349.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by blake@cumin1003 for host wikikube-worker2353.codfw.wmnet with OS trixie completed:

  • wikikube-worker2353 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603061441_blake_1412447_wikikube-worker2353.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by blake@cumin1003 for host wikikube-worker2354.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by blake@cumin1003 for host wikikube-worker2355.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by blake@cumin1003 for host wikikube-worker2356.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by blake@cumin1003 for host wikikube-worker2354.codfw.wmnet with OS trixie completed:

  • wikikube-worker2354 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603061523_blake_1428398_wikikube-worker2354.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by blake@cumin1003 for host wikikube-worker2355.codfw.wmnet with OS trixie completed:

  • wikikube-worker2355 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603061526_blake_1428406_wikikube-worker2355.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by blake@cumin1003 for host wikikube-worker2356.codfw.wmnet with OS trixie completed:

  • wikikube-worker2356 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603061531_blake_1428433_wikikube-worker2356.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
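wikikube-worker2348 and wikikube-worker2349 failed on their first attempt and passed on the retry, so only the most recent result for each host counts. Below is a minimal sketch of working that out from the pasted cookbook output. It assumes the `• <host> (PASS|FAIL)` bullet format shown above. The helper names are hypothetical and are not part of spicerack or the reimage cookbook:

```python
import re

# Matches the cookbook's per-host result bullet, e.g. "  • wikikube-worker2348 (FAIL)".
# Step bullets such as "    • Rebooted" do not match.
RESULT_RE = re.compile(r"^\s*•\s+(wikikube-worker\d+)\s+\((PASS|FAIL)\)\s*$")


def latest_results(log_text: str) -> dict[str, str]:
    """Return the most recent PASS/FAIL outcome per host (hypothetical helper)."""
    results: dict[str, str] = {}
    for line in log_text.splitlines():
        match = RESULT_RE.match(line)
        if match:
            host, outcome = match.groups()
            results[host] = outcome  # a later run overwrites an earlier one
    return results


def failed_hosts(log_text: str) -> list[str]:
    """Hosts whose latest reimage attempt still reports FAIL."""
    return sorted(h for h, r in latest_results(log_text).items() if r == "FAIL")


# Excerpt mirroring the retries of 2348 and 2349 above.
excerpt = """\
  • wikikube-worker2348 (FAIL)
  • wikikube-worker2349 (FAIL)
  • wikikube-worker2348 (PASS)
  • wikikube-worker2349 (PASS)
"""
print(latest_results(excerpt))  # {'wikikube-worker2348': 'PASS', 'wikikube-worker2349': 'PASS'}
print(failed_hosts(excerpt))    # []
```

Letting later entries overwrite earlier ones means a failed attempt does not count against a host once it has been retried successfully.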

Alright, these have all been imaged with Trixie, and have been pooled.

wikikube-worker2332.codfw.wmnet k8s status: schedulable
wikikube-worker2333.codfw.wmnet k8s status: schedulable
wikikube-worker2334.codfw.wmnet k8s status: schedulable
wikikube-worker2335.codfw.wmnet k8s status: schedulable
wikikube-worker2336.codfw.wmnet k8s status: schedulable
wikikube-worker2337.codfw.wmnet k8s status: schedulable
wikikube-worker2338.codfw.wmnet k8s status: schedulable
wikikube-worker2339.codfw.wmnet k8s status: schedulable
wikikube-worker2340.codfw.wmnet k8s status: schedulable
wikikube-worker2341.codfw.wmnet k8s status: schedulable
wikikube-worker2342.codfw.wmnet k8s status: schedulable
wikikube-worker2343.codfw.wmnet k8s status: schedulable
wikikube-worker2344.codfw.wmnet k8s status: schedulable
wikikube-worker2345.codfw.wmnet k8s status: schedulable
wikikube-worker2346.codfw.wmnet k8s status: schedulable
wikikube-worker2347.codfw.wmnet k8s status: schedulable
wikikube-worker2348.codfw.wmnet k8s status: schedulable
wikikube-worker2349.codfw.wmnet k8s status: schedulable
wikikube-worker2350.codfw.wmnet k8s status: schedulable
wikikube-worker2351.codfw.wmnet k8s status: schedulable
wikikube-worker2352.codfw.wmnet k8s status: schedulable
wikikube-worker2353.codfw.wmnet k8s status: schedulable
wikikube-worker2354.codfw.wmnet k8s status: schedulable
wikikube-worker2355.codfw.wmnet k8s status: schedulable
wikikube-worker2356.codfw.wmnet k8s status: schedulable
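The status list above can be checked against the full range from the task description (wikikube-worker[2332-2356]). Below is a rough sketch, assuming the `<fqdn> k8s status: <state>` line format shown above. `expand_range` is a hypothetical, simplified stand-in that handles only a single numeric range. It is not cumin's full host-expansion syntax:

```python
import re

# One line of the status report, e.g.
# "wikikube-worker2332.codfw.wmnet k8s status: schedulable"
STATUS_RE = re.compile(r"^(\S+)\s+k8s status:\s+(\S+)$")


def expand_range(pattern: str) -> list[str]:
    """Expand a single numeric range such as 'wikikube-worker[2332-2356]' (hypothetical)."""
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if not match:
        return [pattern]
    prefix, start, end, suffix = match.groups()
    width = len(start)  # keep zero padding, e.g. [01-10]
    return [f"{prefix}{n:0{width}d}{suffix}" for n in range(int(start), int(end) + 1)]


def unschedulable(expected: list[str], status_text: str) -> list[str]:
    """Return expected hosts that are missing from the report or not schedulable."""
    status: dict[str, str] = {}
    for line in status_text.splitlines():
        match = STATUS_RE.match(line.strip())
        if match:
            status[match.group(1)] = match.group(2)
    return [h for h in expected if status.get(h) != "schedulable"]


hosts = expand_range("wikikube-worker[2332-2356].codfw.wmnet")
print(len(hosts))  # 25
report = "wikikube-worker2332.codfw.wmnet k8s status: schedulable\n"
print(unschedulable(hosts[:1], report))  # []
print(len(unschedulable(hosts, report)))  # 24 (hosts absent from this one-line report)
```

A host that is missing from the report counts as a failure. This way, a truncated paste cannot silently pass the check.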