Re-IP wikikube servers in codfw row A/B moving to per-rack subnets
Closed, Resolved (Public)

Description

Per T354869, we have to renumber various wikikube servers in codfw, namely in rows A/B. We'll also use the downtime as an opportunity to reimage and rename them.

Handle the renumbering and VLAN move for wikikube nodes, ideally before T370962: Southward Datacenter Switchover (September 2024).

Per-host tracking sheet:
https://docs.google.com/spreadsheets/d/1HvaScZNUH-toZrlYKNlLyHs5an6rXT5uUxQ1Nl49I5M/edit?usp=sharing

Procedure

Workers

Rename and renumber

  1. Locally, run add_k8s_node.py --puppet-dir $PUPPET_DIR --netbox-url https://netbox.wikimedia.org --netbox-token $NETBOX_TOKEN --task-id T372878 --rename-only --move-vlan $FQDNS (the script is in the serviceops kitchensink repository; make sure to pull main first)
  2. Follow the instructions it gives
  3. Make sure that the old names were correctly deactivated from puppetserver: sudo puppet node deactivate $old_fqdn
  4. From cumin, run homer 'cr*codfw*' commit 'T372878' to remove the old BGP config

Note: Before creating a relabel task, check whether one is already open and, if so, edit it instead. This cuts down on the number of small tasks for DC-Ops.
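
The rename-and-renumber sequence above can be sketched as a small Python helper that only assembles the command strings, in order, so they can be reviewed before anyone runs them. This is a minimal sketch and not part of the serviceops tooling: the function name and its arguments are hypothetical, it assumes $FQDNS expands to a space-separated list of hosts, and it leaves $NETBOX_TOKEN as a shell variable so the token is never embedded in the output. Step 2 (following the instructions) stays manual.

```python
import shlex

NETBOX_URL = "https://netbox.wikimedia.org"
TASK_ID = "T372878"


def rename_and_renumber_commands(puppet_dir, fqdns, old_fqdns, task_id=TASK_ID):
    """Return the shell commands of the procedure, in order (hypothetical helper)."""
    # Assumption: the hosts are passed to the script as separate arguments.
    hosts = " ".join(shlex.quote(fqdn) for fqdn in fqdns)
    commands = [
        # Step 1: run locally, from an up-to-date checkout.
        "add_k8s_node.py"
        f" --puppet-dir {shlex.quote(puppet_dir)}"
        f" --netbox-url {NETBOX_URL}"
        ' --netbox-token "$NETBOX_TOKEN"'
        f" --task-id {task_id} --rename-only --move-vlan {hosts}",
    ]
    # Step 3: one deactivation per old name.
    commands += [f"sudo puppet node deactivate {shlex.quote(old)}" for old in old_fqdns]
    # Step 4: run from cumin to remove the old BGP config.
    commands.append(f"homer {shlex.quote('cr*codfw*')} commit {shlex.quote(task_id)}")
    return commands


if __name__ == "__main__":
    # Illustrative values only; the path and the host are placeholders.
    for cmd in rename_and_renumber_commands(
        "/tmp/puppet", ["mw2313.codfw.wmnet"], ["mw2313.codfw.wmnet"]
    ):
        print(cmd)
```

Printing instead of executing keeps the operator in the loop, which matters here because step 2 depends on what the script asks for.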

Renumber only

From cumin, run:

  1. sudo cookbook sre.k8s.renumber-node -t T372878 $fqdn
  2. Follow the cookbook's instructions for the homer run. (Technically, homer can be run on the CR as soon as the move-vlan part of the reimage cookbook has completed.)
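
For a batch of consecutively numbered workers, the per-host invocation above can be generated rather than typed. The following is a minimal Python sketch, assuming the wikikube-workerNNNN.codfw.wmnet naming seen in this task's timeline and one host per cookbook run. Both helper names are hypothetical, and the script only prints the commands; it does not run them.

```python
import shlex


def worker_fqdns(first, last, domain="codfw.wmnet"):
    """Expand an inclusive numeric range into worker FQDNs (assumed naming scheme)."""
    return [f"wikikube-worker{n}.{domain}" for n in range(first, last + 1)]


def renumber_commands(fqdns, task_id="T372878"):
    """Build one renumber-node invocation per host, to be run from cumin."""
    return [
        f"sudo cookbook sre.k8s.renumber-node -t {shlex.quote(task_id)} {shlex.quote(fqdn)}"
        for fqdn in fqdns
    ]


if __name__ == "__main__":
    for cmd in renumber_commands(worker_fqdns(2114, 2116)):
        print(cmd)
```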

Control plane

TODO

Details

Related Changes in Gerrit:
Repo                  Branch      Lines +/-
operations/puppet     production  +4 -7
operations/puppet     production  +5 -5
operations/puppet     production  +5 -5
operations/puppet     production  +5 -5
operations/puppet     production  +9 -9
operations/puppet     production  +16 -19
operations/puppet     production  +16 -19
operations/puppet     production  +7 -7
operations/puppet     production  +5 -5
operations/puppet     production  +5 -5
operations/puppet     production  +4 -4
operations/puppet     production  +13 -9
operations/puppet     production  +11 -14
operations/puppet     production  +11 -14
operations/puppet     production  +6 -6
operations/puppet     production  +7 -7
operations/puppet     production  +7 -7
operations/puppet     production  +10 -10
operations/puppet     production  +11 -11
operations/puppet     production  +5 -9
operations/puppet     production  +5 -1
operations/puppet     production  +8 -8
operations/puppet     production  +8 -11
operations/puppet     production  +1 -1
operations/puppet     production  +11 -11
operations/cookbooks  master      +15 -14
operations/puppet     production  +10 -13
operations/puppet     production  +6 -6
operations/puppet     production  +6 -6
operations/puppet     production  +9 -9
operations/puppet     production  +9 -9
operations/puppet     production  +5 -5
operations/puppet     production  +5 -5
operations/puppet     production  +8 -11
operations/puppet     production  +8 -11
operations/puppet     production  +5 -5
operations/puppet     production  +5 -5
operations/puppet     production  +4 -4
operations/puppet     production  +5 -5
operations/puppet     production  +4 -4
operations/puppet     production  +4 -4
operations/puppet     production  +5 -5
operations/puppet     production  +4 -4
operations/puppet     production  +5 -5
operations/puppet     production  +4 -4
operations/puppet     production  +4 -4
operations/puppet     production  +5 -5

Related Objects

Status     Subtype           Assigned
Resolved   -                 jijiki
Open       -                 None
Open       -                 None
Resolved   -                 akosiaris
Resolved   -                 Jhancock.wm
Resolved   -                 None
Resolved   -                 Jhancock.wm
Duplicate  -                 None
Duplicate  -                 None
Resolved   -                 Jhancock.wm
Duplicate  -                 None
Duplicate  -                 None
Resolved   -                 MoritzMuehlenhoff
Resolved   -                 Jhancock.wm
Invalid    -                 None
Resolved   PRODUCTION ERROR  Clement_Goubert
Resolved   -                 JMeybohm
Resolved   -                 Jhancock.wm
Resolved   -                 Jhancock.wm
Resolved   -                 Jhancock.wm
Resolved   -                 Jhancock.wm
Resolved   -                 Jhancock.wm
Resolved   -                 Jhancock.wm
Resolved   -                 None

Event Timeline

There are a very large number of changes, so older changes are hidden.

Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host wikikube-worker2114.codfw.wmnet with OS bullseye executed with errors:

  • wikikube-worker2114.codfw.wmnet (PASS)
  • wikikube-worker2114 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Unable to disable Puppet, the host may have been unreachable
    • Failed to migrate host to the new VLAN, sre.hosts.move-vlan cookbook returned 94
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-worker2114.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.k8s.renumber-node started by akosiaris@cumin1002 Renumbering for host wikikube-worker2114.codfw.wmnet completed:

  • wikikube-worker2114.codfw.wmnet (FAIL)
    • Failed to reimage node wikikube-worker2114.codfw.wmnet, sre.hosts.reimage returned 99
  • wikikube-worker2114 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Unable to disable Puppet, the host may have been unreachable
    • Failed to migrate host to the new VLAN, sre.hosts.move-vlan cookbook returned 94
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-worker2114.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host wikikube-worker2115.codfw.wmnet with OS bullseye executed with errors:

  • wikikube-worker2115.codfw.wmnet (PASS)
  • wikikube-worker2115 (FAIL)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-worker2115.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.k8s.renumber-node started by akosiaris@cumin1002 Renumbering for host wikikube-worker2115.codfw.wmnet completed:

  • wikikube-worker2115.codfw.wmnet (FAIL)
    • Failed to reimage node wikikube-worker2115.codfw.wmnet, sre.hosts.reimage returned 99
  • wikikube-worker2115 (FAIL)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-worker2115.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.k8s.renumber-node was started by akosiaris@cumin1002 Renumbering for host wikikube-worker2114.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host wikikube-worker2114.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.k8s.renumber-node was started by akosiaris@cumin1002 Renumbering for host wikikube-worker2115.codfw.wmnet

Cookbook cookbooks.sre.k8s.renumber-node started by akosiaris@cumin1002 Renumbering for host wikikube-worker2115.codfw.wmnet completed:

  • wikikube-worker2115.codfw.wmnet (FAIL)
    • Failed to reimage node wikikube-worker2115.codfw.wmnet, sre.hosts.reimage returned 94

Cookbook cookbooks.sre.k8s.renumber-node was started by akosiaris@cumin1002 Renumbering for host wikikube-worker2115.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host wikikube-worker2115.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host wikikube-worker2114.codfw.wmnet with OS bullseye executed with errors:

  • wikikube-worker2114.codfw.wmnet (PASS)
  • wikikube-worker2114 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Failed to migrate host to the new VLAN, sre.hosts.move-vlan cookbook returned 94
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-worker2114.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.k8s.renumber-node started by akosiaris@cumin1002 Renumbering for host wikikube-worker2114.codfw.wmnet completed:

  • wikikube-worker2114.codfw.wmnet (FAIL)
    • Failed to reimage node wikikube-worker2114.codfw.wmnet, sre.hosts.reimage returned 99
  • wikikube-worker2114 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Failed to migrate host to the new VLAN, sre.hosts.move-vlan cookbook returned 94
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-worker2114.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host wikikube-worker2115.codfw.wmnet with OS bullseye executed with errors:

  • wikikube-worker2115.codfw.wmnet (PASS)
  • wikikube-worker2115 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Unable to disable Puppet, the host may have been unreachable
    • Failed to migrate host to the new VLAN, sre.hosts.move-vlan cookbook returned 94
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-worker2115.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.k8s.renumber-node started by akosiaris@cumin1002 Renumbering for host wikikube-worker2115.codfw.wmnet completed:

  • wikikube-worker2115.codfw.wmnet (FAIL)
    • Failed to reimage node wikikube-worker2115.codfw.wmnet, sre.hosts.reimage returned 99
  • wikikube-worker2115 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Unable to disable Puppet, the host may have been unreachable
    • Failed to migrate host to the new VLAN, sre.hosts.move-vlan cookbook returned 94
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-worker2115.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host wikikube-worker2116.codfw.wmnet with OS bullseye executed with errors:

  • wikikube-worker2116.codfw.wmnet (PASS)
  • wikikube-worker2116 (FAIL)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-worker2116.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.k8s.renumber-node started by akosiaris@cumin1002 Renumbering for host wikikube-worker2116.codfw.wmnet completed:

  • wikikube-worker2116.codfw.wmnet (FAIL)
    • Failed to reimage node wikikube-worker2116.codfw.wmnet, sre.hosts.reimage returned 99
  • wikikube-worker2116 (FAIL)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-worker2116.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host wikikube-worker2119.codfw.wmnet with OS bullseye executed with errors:

  • wikikube-worker2119.codfw.wmnet (PASS)
  • wikikube-worker2119 (FAIL)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-worker2119.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.k8s.renumber-node started by akosiaris@cumin1002 Renumbering for host wikikube-worker2119.codfw.wmnet completed:

  • wikikube-worker2119.codfw.wmnet (FAIL)
    • Failed to reimage node wikikube-worker2119.codfw.wmnet, sre.hosts.reimage returned 99
  • wikikube-worker2119 (FAIL)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-worker2119.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host wikikube-worker2118.codfw.wmnet with OS bullseye completed:

  • wikikube-worker2118.codfw.wmnet (PASS)
  • wikikube-worker2118 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202409131411_akosiaris_242015_wikikube-worker2118.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2024-09-13T14:33:17Z] <akosiaris> homer cr*codfw* commit 'T372878'

Mentioned in SAL (#wikimedia-operations) [2024-09-13T14:33:24Z] <akosiaris> homer lsw1-a6-codfw* commit 'T372878'

Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host wikikube-worker2117.codfw.wmnet with OS bullseye executed with errors:

  • wikikube-worker2117.codfw.wmnet (PASS)
  • wikikube-worker2117 (FAIL)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-worker2117.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.k8s.renumber-node started by akosiaris@cumin1002 Renumbering for host wikikube-worker2117.codfw.wmnet completed:

  • wikikube-worker2117.codfw.wmnet (FAIL)
    • Failed to reimage node wikikube-worker2117.codfw.wmnet, sre.hosts.reimage returned 99
  • wikikube-worker2117 (FAIL)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-worker2117.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.k8s.renumber-node started by akosiaris@cumin1002 Renumbering for host wikikube-worker2118.codfw.wmnet completed:

  • wikikube-worker2118.codfw.wmnet (FAIL)
    • Successfully reimaged node wikikube-worker2118.codfw.wmnet
    • Successfully set BGP to true in Netbox
    • Successfully ran puppet agent on deployment servers
    • Successfully ran puppet agent on registry servers
    • Failed to pool and uncordon node wikikube-worker2118.codfw.wmnet, sre.k8s.pool-depool-node returned 99
  • wikikube-worker2118 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202409131411_akosiaris_242015_wikikube-worker2118.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
  • wikikube-worker2118.codfw.wmnet (PASS)

Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host wikikube-worker2120.codfw.wmnet with OS bullseye completed:

  • wikikube-worker2120.codfw.wmnet (PASS)
  • wikikube-worker2120 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run failed and logged in /var/log/spicerack/sre/hosts/reimage/202409131409_akosiaris_237773_wikikube-worker2120.out, asking the operator what to do
    • First Puppet run failed and the operator skipped it
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1072762 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] kubernetes: re-name / IP mw231[345]

https://gerrit.wikimedia.org/r/1072762

Mentioned in SAL (#wikimedia-operations) [2024-09-13T15:12:43Z] <akosiaris> homer lsw1-a6-codfw* commit T372878

Cookbook cookbooks.sre.k8s.renumber-node started by akosiaris@cumin1002 Renumbering for host wikikube-worker2120.codfw.wmnet completed:

  • wikikube-worker2120.codfw.wmnet (FAIL)
    • Successfully reimaged node wikikube-worker2120.codfw.wmnet
    • Successfully set BGP to true in Netbox
    • Failed to run puppet agent on deployment servers
    • Successfully ran puppet agent on deployment servers
    • Successfully ran puppet agent on registry servers
    • Pooled and uncordoned node wikikube-worker2120.codfw.wmnet
  • wikikube-worker2120 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run failed and logged in /var/log/spicerack/sre/hosts/reimage/202409131409_akosiaris_237773_wikikube-worker2120.out, asking the operator what to do
    • First Puppet run failed and the operator skipped it
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
  • wikikube-worker2120.codfw.wmnet (PASS)
    • Host wikikube-worker2120.codfw.wmnet pooled in codfw

Change #1072762 merged by Scott French:

[operations/puppet@production] kubernetes: re-name / IP mw231[345]

https://gerrit.wikimedia.org/r/1072762

Cookbook cookbooks.sre.hosts.rename started by swfrench@cumin2002 from mw2313 to wikikube-worker2121 completed:

  • mw2313 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.hosts.rename started by swfrench@cumin2002 from mw2314 to wikikube-worker2122 completed:

  • mw2314 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.hosts.rename started by swfrench@cumin2002 from mw2315 to wikikube-worker2123 completed:

  • mw2315 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.k8s.renumber-node was started by swfrench@cumin2002 Renumbering for host wikikube-worker2121.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by swfrench@cumin2002 for host wikikube-worker2121.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.k8s.renumber-node was started by swfrench@cumin2002 Renumbering for host wikikube-worker2122.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by swfrench@cumin2002 for host wikikube-worker2122.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.k8s.renumber-node was started by swfrench@cumin2002 Renumbering for host wikikube-worker2123.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by swfrench@cumin2002 for host wikikube-worker2123.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by swfrench@cumin2002 for host wikikube-worker2121.codfw.wmnet with OS bullseye completed:

  • wikikube-worker2121.codfw.wmnet (PASS)
  • wikikube-worker2121 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202409131616_swfrench_4028128_wikikube-worker2121.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2024-09-13T16:38:54Z] <swfrench-wmf> running homer lsw1-b3-codfw* commit 'T372878'

Cookbook cookbooks.sre.hosts.reimage started by swfrench@cumin2002 for host wikikube-worker2122.codfw.wmnet with OS bullseye completed:

  • wikikube-worker2122.codfw.wmnet (PASS)
  • wikikube-worker2122 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202409131623_swfrench_4035684_wikikube-worker2122.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.k8s.renumber-node started by swfrench@cumin2002 Renumbering for host wikikube-worker2121.codfw.wmnet completed:

  • wikikube-worker2121.codfw.wmnet (PASS)
    • Successfully reimaged node wikikube-worker2121.codfw.wmnet
    • Successfully set BGP to true in Netbox
    • Successfully ran puppet agent on deployment servers
    • Successfully ran puppet agent on registry servers
    • Pooled and uncordoned node wikikube-worker2121.codfw.wmnet
  • wikikube-worker2121 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202409131616_swfrench_4028128_wikikube-worker2121.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
  • wikikube-worker2121.codfw.wmnet (PASS)
    • Host wikikube-worker2121.codfw.wmnet pooled in codfw

Cookbook cookbooks.sre.hosts.reimage started by swfrench@cumin2002 for host wikikube-worker2123.codfw.wmnet with OS bullseye completed:

  • wikikube-worker2123.codfw.wmnet (PASS)
  • wikikube-worker2123 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202409131631_swfrench_4043602_wikikube-worker2123.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.k8s.renumber-node started by swfrench@cumin2002 Renumbering for host wikikube-worker2122.codfw.wmnet completed:

  • wikikube-worker2122.codfw.wmnet (PASS)
    • Successfully reimaged node wikikube-worker2122.codfw.wmnet
    • Successfully set BGP to true in Netbox
    • Successfully ran puppet agent on deployment servers
    • Successfully ran puppet agent on registry servers
    • Pooled and uncordoned node wikikube-worker2122.codfw.wmnet
  • wikikube-worker2122 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202409131623_swfrench_4035684_wikikube-worker2122.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
  • wikikube-worker2122.codfw.wmnet (PASS)
    • Host wikikube-worker2122.codfw.wmnet pooled in codfw

Mentioned in SAL (#wikimedia-operations) [2024-09-13T16:57:49Z] <swfrench-wmf> running homer cr*codfw* commit 'T372878'

Cookbook cookbooks.sre.k8s.renumber-node started by swfrench@cumin2002 Renumbering for host wikikube-worker2123.codfw.wmnet completed:

  • wikikube-worker2123.codfw.wmnet (PASS)
    • Successfully reimaged node wikikube-worker2123.codfw.wmnet
    • Successfully set BGP to true in Netbox
    • Successfully ran puppet agent on deployment servers
    • Successfully ran puppet agent on registry servers
    • Pooled and uncordoned node wikikube-worker2123.codfw.wmnet
  • wikikube-worker2123 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202409131631_swfrench_4043602_wikikube-worker2123.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
  • wikikube-worker2123.codfw.wmnet (PASS)
    • Host wikikube-worker2123.codfw.wmnet pooled in codfw

Change #1074410 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/puppet@production] kubernetes: mw2313 -> wikikube-worker2124

https://gerrit.wikimedia.org/r/1074410

Change #1074410 merged by Effie Mouzeli:

[operations/puppet@production] kubernetes: rename mw2424 -> wikikube-worker2124

https://gerrit.wikimedia.org/r/1074410

Cookbook cookbooks.sre.hosts.rename started by jiji@cumin1002 from mw2424 to wikikube-worker2124 completed:

  • mw2424 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.k8s.renumber-node was started by jiji@cumin1002 Renumbering for host wikikube-worker2124.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1002 for host wikikube-worker2124.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1002 for host wikikube-worker2124.codfw.wmnet with OS bullseye completed:

  • wikikube-worker2124.codfw.wmnet (PASS)
  • wikikube-worker2124 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202409230927_jiji_1054100_wikikube-worker2124.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.k8s.renumber-node started by jiji@cumin1002 Renumbering for host wikikube-worker2124.codfw.wmnet completed:

  • wikikube-worker2124.codfw.wmnet (FAIL)
    • Successfully reimaged node wikikube-worker2124.codfw.wmnet
    • Successfully set BGP to true in Netbox
    • Successfully ran puppet agent on deployment servers
    • Successfully ran puppet agent on registry servers
    • Failed to pool and uncordon node wikikube-worker2124.codfw.wmnet, sre.k8s.pool-depool-node returned 99
  • wikikube-worker2124 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202409230927_jiji_1054100_wikikube-worker2124.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
  • wikikube-worker2124.codfw.wmnet (PASS)
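
The report above marks the host as FAIL even though every reimage sub-step passed, so the one failing step is easy to miss among the bullets. As a minimal sketch (not part of the cookbooks or of spicerack), the following Python helper scans a report pasted from this task and returns only the non-PASS entries with their suspicious steps. The function name `non_pass_entries` and the matching heuristics (a step that starts with "Failed" or contains " not ") are assumptions made for illustration, based only on the wording seen in these reports.

```python
import re

# Host header lines look like "  • wikikube-worker2124.codfw.wmnet (FAIL)".
HOST_LINE = re.compile(r"^\s*•\s+(\S+)\s+\((PASS|WARN|FAIL)\)\s*$")
# Any other bullet is treated as a step belonging to the last host header.
STEP_LINE = re.compile(r"^\s*•\s+(.*\S)\s*$")


def non_pass_entries(report: str) -> list[tuple[str, str, list[str]]]:
    """Return (host, status, suspicious_steps) for each non-PASS host entry.

    Hypothetical helper: the heuristics below are guesses derived from the
    report text in this task, not from the cookbook source code.
    """
    results = []
    current = None  # entry being filled, or None while inside a PASS entry
    for line in report.splitlines():
        host = HOST_LINE.match(line)
        if host:
            name, status = host.groups()
            current = (name, status, []) if status != "PASS" else None
            if current is not None:
                results.append(current)
            continue
        step = STEP_LINE.match(line)
        if step and current is not None:
            text = step.group(1)
            # Assumed markers of a problem step in these reports.
            if text.startswith("Failed") or " not " in text:
                current[2].append(text)
    return results
```

Running this over the report above would be expected to surface the single `sre.k8s.pool-depool-node returned 99` step, which is the part that needs manual follow-up.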

Mentioned in SAL (#wikimedia-operations) [2024-09-23T10:25:41Z] <effie> homer lsw1-a6-codfw* commit 'T372878'

Change #1074987 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/puppet@production] kubernetes: rename mw2425 -> wikikube-worker2125

https://gerrit.wikimedia.org/r/1074987

Change #1074987 merged by Effie Mouzeli:

[operations/puppet@production] kubernetes: rename mw2425 -> wikikube-worker2125

https://gerrit.wikimedia.org/r/1074987

Cookbook cookbooks.sre.hosts.rename started by jiji@cumin1002 from mw2425 to wikikube-worker2125 completed:

  • mw2425 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.k8s.renumber-node was started by jiji@cumin1002 Renumbering for host wikikube-worker2125.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1002 for host wikikube-worker2125.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1002 for host wikikube-worker2125.codfw.wmnet with OS bullseye completed:

  • wikikube-worker2125.codfw.wmnet (PASS)
  • wikikube-worker2125 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202409231258_jiji_1227176_wikikube-worker2125.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2024-09-23T13:21:24Z] <effie> homer lsw1-a6-codfw* commit 'T372878'

Cookbook cookbooks.sre.k8s.renumber-node started by jiji@cumin1002 Renumbering for host wikikube-worker2125.codfw.wmnet completed:

  • wikikube-worker2125.codfw.wmnet (FAIL)
    • Successfully reimaged node wikikube-worker2125.codfw.wmnet
    • Successfully set BGP to true in Netbox
    • Failed to run puppet agent on deployment servers
    • Successfully ran puppet agent on deployment servers
    • Failed to run puppet agent on registry servers
    • Pooled and uncordoned node wikikube-worker2125.codfw.wmnet
  • wikikube-worker2125 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202409231258_jiji_1227176_wikikube-worker2125.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
  • wikikube-worker2125.codfw.wmnet (PASS)
    • Host wikikube-worker2125.codfw.wmnet pooled in codfw

Change #1075149 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/puppet@production] kubernetes: rename mw2426 -> wikikube-worker2126

https://gerrit.wikimedia.org/r/1075149

Change #1075149 merged by Effie Mouzeli:

[operations/puppet@production] kubernetes: rename mw2426 -> wikikube-worker2126

https://gerrit.wikimedia.org/r/1075149

Cookbook cookbooks.sre.hosts.rename started by jiji@cumin1002 from mw2426 to wikikube-worker2126 completed:

  • mw2426 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.k8s.renumber-node was started by jiji@cumin1002 Renumbering for host wikikube-worker2126.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1002 for host wikikube-worker2126.codfw.wmnet with OS bullseye

Change #1075158 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/puppet@production] kubernetes: rename mw2427 -> wikikube-worker2127

https://gerrit.wikimedia.org/r/1075158

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1002 for host wikikube-worker2126.codfw.wmnet with OS bullseye completed:

  • wikikube-worker2126.codfw.wmnet (PASS)
  • wikikube-worker2126 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202409241034_jiji_1729962_wikikube-worker2126.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2024-09-24T11:31:31Z] <effie> homer lsw1-a6-codfw* commit 'T372878'

Cookbook cookbooks.sre.k8s.renumber-node started by jiji@cumin1002 Renumbering for host wikikube-worker2126.codfw.wmnet completed:

  • wikikube-worker2126.codfw.wmnet (PASS)
    • Successfully reimaged node wikikube-worker2126.codfw.wmnet
    • Successfully set BGP to true in Netbox
    • Successfully ran puppet agent on deployment servers
    • Successfully ran puppet agent on registry servers
    • Pooled and uncordoned node wikikube-worker2126.codfw.wmnet
  • wikikube-worker2126 (WARN)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202409241034_jiji_1729962_wikikube-worker2126.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
  • wikikube-worker2126.codfw.wmnet (PASS)
    • Host wikikube-worker2126.codfw.wmnet pooled in codfw

Change #1075158 merged by Effie Mouzeli:

[operations/puppet@production] kubernetes: rename mw2427 -> wikikube-worker2127

https://gerrit.wikimedia.org/r/1075158

Cookbook cookbooks.sre.hosts.rename started by jiji@cumin1002 from mw2427 to wikikube-worker2127 completed:

  • mw2427 (WARN)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ⚠️Rollback initiated but nothing to rollback (too soon or too late).⚠️

Cookbook cookbooks.sre.hosts.rename started by jiji@cumin1002 from mw2427 to wikikube-worker2127 completed:

  • mw2427 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.k8s.renumber-node was started by jiji@cumin1002 Renumbering for host wikikube-worker2127.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1002 for host wikikube-worker2127.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1002 for host wikikube-worker2127.codfw.wmnet with OS bullseye completed:

  • wikikube-worker2127.codfw.wmnet (PASS)
  • wikikube-worker2127 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202409241258_jiji_1805090_wikikube-worker2127.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2024-09-24T13:21:09Z] <effie> homer lsw1-a6-codfw* commit 'T372878'

Cookbook cookbooks.sre.k8s.renumber-node started by jiji@cumin1002 Renumbering for host wikikube-worker2127.codfw.wmnet completed:

  • wikikube-worker2127.codfw.wmnet (PASS)
    • Successfully reimaged node wikikube-worker2127.codfw.wmnet
    • Successfully set BGP to true in Netbox
    • Successfully ran puppet agent on deployment servers
    • Successfully ran puppet agent on registry servers
    • Pooled and uncordoned node wikikube-worker2127.codfw.wmnet
  • wikikube-worker2127 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202409241258_jiji_1805090_wikikube-worker2127.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
  • wikikube-worker2127.codfw.wmnet (PASS)
    • Host wikikube-worker2127.codfw.wmnet pooled in codfw

akosiaris claimed this task.

I'll resolve this. All hosts that could be renumbered have been renumbered. Six hosts were decommissioned instead, and there are a number of jobrunners that will hopefully soon be reimaged as wikikube-worker nodes.