
wikikube-worker13[35-59] implementation tracking
Closed, Resolved (Public)

Description

This task tracks the service implementation of the new ServiceOps hosts listed below.

Once the linked racking task has been resolved, this task can be implemented.

  • wikikube-worker[1335-1349].eqiad.wmnet (was blocked by T411054: Nokia SR-Linux DHCP Relay Bug)
    • trixie (14): wikikube-worker[1335-1346,1348-1349].eqiad.wmnet
    • trixie, previously bookworm (1): wikikube-worker1347.eqiad.wmnet
  • wikikube-worker[1350-1359].eqiad.wmnet (good to go)
    • trixie (2): wikikube-worker[1350-1351].eqiad.wmnet
    • trixie, previously bookworm (8): wikikube-worker[1352-1359].eqiad.wmnet
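
The bracketed host ranges above can be checked against the per-group counts with a small expansion helper. This is only a sketch: `expand_hosts` is a hypothetical name, not part of any tool mentioned in this task, and it handles just the single-bracket `prefix[a-b,c-d]suffix` form used here rather than the full range syntax supported by the cluster tooling.

```python
import re


def expand_hosts(expr: str) -> list[str]:
    """Expand 'prefix[a-b,c-d]suffix' into individual host names.

    Hypothetical helper for illustration; supports one bracket group only.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", expr)
    if match is None:
        # No bracket group: treat the expression as a single host name.
        return [expr]
    prefix, ranges, suffix = match.groups()
    hosts = []
    for part in ranges.split(","):
        start, _, end = part.partition("-")
        end = end or start  # a lone number is a range of one
        width = len(start)  # keep zero padding if the range uses it
        for number in range(int(start), int(end) + 1):
            hosts.append(f"{prefix}{number:0{width}d}{suffix}")
    return hosts


# Groups and counts as listed in the task description.
groups = {
    "wikikube-worker[1335-1346,1348-1349].eqiad.wmnet": 14,
    "wikikube-worker1347.eqiad.wmnet": 1,
    "wikikube-worker[1350-1351].eqiad.wmnet": 2,
    "wikikube-worker[1352-1359].eqiad.wmnet": 8,
}

for expr, expected in groups.items():
    hosts = expand_hosts(expr)
    print(f"{expr}: {len(hosts)} host(s), expected {expected}")
```

The four groups add up to 25 hosts, which matches the range in the task title, `wikikube-worker13[35-59]`.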

TODO: Add add_k8s_nodes.py output as checklist

Event Timeline

Clement_Goubert moved this task from Inbox to Scheduled (this Q) on the ServiceOps new board.
JMeybohm moved this task from Scheduled (this Q) to In Progress on the ServiceOps new board.
JMeybohm subscribed.

I'll take these to start with trixie in prod

Change #1247099 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] conftool-data: Fix YAML syntax

https://gerrit.wikimedia.org/r/1247099

Change #1247100 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] Add wikikube-worker[1350-1351]

https://gerrit.wikimedia.org/r/1247100

Change #1247099 merged by JMeybohm:

[operations/puppet@production] conftool-data: Fix YAML syntax

https://gerrit.wikimedia.org/r/1247099

Change #1247100 merged by JMeybohm:

[operations/puppet@production] Add wikikube-worker[1350-1351]

https://gerrit.wikimedia.org/r/1247100

Change #1247552 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] BGPPeers: Add missing lsw1-f8-eqiad

https://gerrit.wikimedia.org/r/1247552

Change #1247552 merged by jenkins-bot:

[operations/deployment-charts@master] BGPPeers: Add missing lsw1-f8-eqiad

https://gerrit.wikimedia.org/r/1247552

Change #1247561 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] BGPPeers: Add comment for eqiad E4

https://gerrit.wikimedia.org/r/1247561

Cookbook cookbooks.sre.k8s.pool-depool-node started by jayme@cumin1003 pool for host wikikube-worker[1350-1351].eqiad.wmnet completed:

  • wikikube-worker[1350-1351].eqiad.wmnet (PASS)
    • Host wikikube-worker[1350-1351].eqiad.wmnet pooled in wikikube-eqiad

Change #1247561 merged by jenkins-bot:

[operations/deployment-charts@master] BGPPeers: Add comment for eqiad E4

https://gerrit.wikimedia.org/r/1247561

Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1003 for host wikikube-worker1352.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1003 for host wikikube-worker1353.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1003 for host wikikube-worker1354.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1003 for host wikikube-worker1355.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1003 for host wikikube-worker1352.eqiad.wmnet with OS trixie completed:

  • wikikube-worker1352 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603031148_jayme_2861527_wikikube-worker1352.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
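
The reimage log paths in these reports share a regular file name layout, which makes it easy to collect them per host. The sketch below splits one such path into its parts. It assumes the fields are a start timestamp (`YYYYMMDDHHMM`), the user name, a process ID, and the short host name; that reading is inferred from the paths in this task, and `ReimageLog` and `parse_log_path` are hypothetical names.

```python
import re
from datetime import datetime
from typing import NamedTuple


class ReimageLog(NamedTuple):
    started: datetime  # timezone is not stated in the file name
    user: str
    pid: int  # assumed to be the cookbook's process ID
    host: str


# Assumed layout: <YYYYMMDDHHMM>_<user>_<pid>_<host>.out
LOG_NAME = re.compile(
    r"(?P<ts>\d{12})_(?P<user>[^_]+)_(?P<pid>\d+)_(?P<host>[^/]+)\.out"
)


def parse_log_path(path: str) -> ReimageLog:
    """Split a reimage log path into timestamp, user, PID and host."""
    name = path.rsplit("/", 1)[-1]
    match = LOG_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected log file name: {name!r}")
    return ReimageLog(
        started=datetime.strptime(match["ts"], "%Y%m%d%H%M"),
        user=match["user"],
        pid=int(match["pid"]),
        host=match["host"],
    )


log = parse_log_path(
    "/var/log/spicerack/sre/hosts/reimage/"
    "202603031148_jayme_2861527_wikikube-worker1352.out"
)
print(log.host, log.user, log.started.isoformat())
```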

Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1003 for host wikikube-worker1354.eqiad.wmnet with OS trixie completed:

  • wikikube-worker1354 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603031152_jayme_2863709_wikikube-worker1354.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1003 for host wikikube-worker1355.eqiad.wmnet with OS trixie completed:

  • wikikube-worker1355 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603031158_jayme_2863943_wikikube-worker1355.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1003 for host wikikube-worker1356.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1003 for host wikikube-worker1357.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1003 for host wikikube-worker1358.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1003 for host wikikube-worker1353.eqiad.wmnet with OS trixie completed:

  • wikikube-worker1353 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603031202_jayme_2862711_wikikube-worker1353.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1003 for host wikikube-worker1359.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1003 for host wikikube-worker1356.eqiad.wmnet with OS trixie completed:

  • wikikube-worker1356 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603031233_jayme_2939014_wikikube-worker1356.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1003 for host wikikube-worker1357.eqiad.wmnet with OS trixie completed:

  • wikikube-worker1357 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603031236_jayme_2939427_wikikube-worker1357.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1003 for host wikikube-worker1358.eqiad.wmnet with OS trixie completed:

  • wikikube-worker1358 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603031243_jayme_2940136_wikikube-worker1358.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1003 for host wikikube-worker1359.eqiad.wmnet with OS trixie completed:

  • wikikube-worker1359 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603031253_jayme_2971945_wikikube-worker1359.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1247617 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] Add wikikube-worker[1352-1359]

https://gerrit.wikimedia.org/r/1247617

Change #1247617 merged by JMeybohm:

[operations/puppet@production] Add wikikube-worker[1352-1359]

https://gerrit.wikimedia.org/r/1247617

Change #1247628 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/cookbooks@master] k8s.pool-depool-cookbook: Handle calicoctl exiting with error

https://gerrit.wikimedia.org/r/1247628

Change #1247628 merged by jenkins-bot:

[operations/cookbooks@master] k8s.pool-depool-cookbook: Handle calicoctl exiting with error

https://gerrit.wikimedia.org/r/1247628

JMeybohm changed the task status from Open to Stalled. Mar 5 2026, 11:50 AM
JMeybohm changed the task status from Stalled to Open. Mar 17 2026, 3:51 PM
JMeybohm updated the task description.
Scott_French subscribed.

@JMeybohm - I'm speculatively moving this to Scheduled based on your most recent update. If it's something that you plan to pick back up now that it's unblocked, please move back to In Progress. Thanks!

Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1003 for host wikikube-worker1347.eqiad.wmnet with OS trixie

> @JMeybohm - I'm speculatively moving this to Scheduled based on your most recent update. If it's something that you plan to pick back up now that it's unblocked, please move back to In Progress. Thanks!

Yeah, sorry. Missed moving this one.

Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1003 for host wikikube-worker1347.eqiad.wmnet with OS trixie completed:

  • wikikube-worker1347 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603181719_jayme_74183_wikikube-worker1347.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1255689 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] wikikube: Add wikikube-worker[1335-1349].eqiad.wmnet

https://gerrit.wikimedia.org/r/1255689

Change #1255689 merged by JMeybohm:

[operations/puppet@production] wikikube: Add wikikube-worker[1335-1349].eqiad.wmnet

https://gerrit.wikimedia.org/r/1255689

JMeybohm updated the task description.