wikikube-worker21[28-35] implementation tracking
Closed, Resolved, Public

Description

This task is to track the service implementation of serviceops host(s) listed in the task description.

Once the linked racking task has been resolved, this task can be implemented.

This sub-task was created/updated at the request of serviceops; it is assigned at creation to the 'Sub-team Technical Contact' named in the initial ordering task.
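
The bracket notation in the task title covers eight hosts, and the same range appears later in the timeline as wikikube-worker[2128-2135].codfw.wmnet. As a minimal sketch (the helper name expand_host_range is hypothetical and is not part of any Wikimedia tooling), a single bracketed range can be expanded into individual host names like this:

```python
import re


def expand_host_range(pattern: str) -> list[str]:
    """Expand one PREFIX[START-END]SUFFIX range into individual names.

    Hypothetical helper for illustration; it handles a single numeric
    range only and returns the input unchanged if no range is present.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        return [pattern]
    prefix, start, end, suffix = match.groups()
    width = len(start)  # keep any zero padding used in the range bounds
    return [
        f"{prefix}{number:0{width}d}{suffix}"
        for number in range(int(start), int(end) + 1)
    ]


# Both spellings used in this task expand to the same eight hosts.
title_hosts = expand_host_range("wikikube-worker21[28-35]")
fqdn_hosts = expand_host_range("wikikube-worker[2128-2135].codfw.wmnet")
```

Here title_hosts runs from wikikube-worker2128 to wikikube-worker2135, and fqdn_hosts holds the same names with the .codfw.wmnet suffix.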

Event Timeline

Change #1090829 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] wikikube: Add wikikube-worker21[28-35]

https://gerrit.wikimedia.org/r/1090829

Change #1090829 merged by Clément Goubert:

[operations/puppet@production] wikikube: Add wikikube-worker21[28-35]

https://gerrit.wikimedia.org/r/1090829

Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2128.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2129.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2130.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2131.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2128.codfw.wmnet with OS bookworm executed with errors:

  • wikikube-worker2128 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-worker2128.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2128.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2128.codfw.wmnet with OS bookworm executed with errors:

  • wikikube-worker2128 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-worker2128.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2128.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2129.codfw.wmnet with OS bookworm executed with errors:

  • wikikube-worker2129 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-worker2129.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2128.codfw.wmnet with OS bookworm executed with errors:

  • wikikube-worker2128 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-worker2128.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2128.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2130.codfw.wmnet with OS bookworm executed with errors:

  • wikikube-worker2130 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-worker2130.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2131.codfw.wmnet with OS bookworm executed with errors:

  • wikikube-worker2131 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-worker2131.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2129.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2130.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2128.codfw.wmnet with OS bookworm completed:

  • wikikube-worker2128 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202411131455_cgoubert_2054197_wikikube-worker2128.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
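
The log path reported above appears to follow a timestamp_user_number_host.out pattern. The sketch below splits such a path into its parts; the meaning of the numeric field is not stated in this task, so it is kept as an opaque string, and the names ReimageLog and parse_reimage_log_path are hypothetical.

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import NamedTuple


class ReimageLog(NamedTuple):
    stamp: datetime  # parsed from the leading YYYYmmddHHMM field (timezone unknown)
    user: str
    run_id: str      # assumption: opaque identifier, meaning not given in the task
    host: str


def parse_reimage_log_path(path: str) -> ReimageLog:
    """Split a reimage log file name into its underscore-separated fields."""
    stem = PurePosixPath(path).stem  # drops the directory and the ".out" suffix
    stamp, user, run_id, host = stem.split("_", 3)
    return ReimageLog(datetime.strptime(stamp, "%Y%m%d%H%M"), user, run_id, host)


log = parse_reimage_log_path(
    "/var/log/spicerack/sre/hosts/reimage/"
    "202411131455_cgoubert_2054197_wikikube-worker2128.out"
)
```

For the path above, log.stamp is 2024-11-13 14:55 and log.host is wikikube-worker2128.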

Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2131.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2129.codfw.wmnet with OS bookworm completed:

  • wikikube-worker2129 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202411131518_cgoubert_2056052_wikikube-worker2129.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2132.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2133.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2130.codfw.wmnet with OS bookworm completed:

  • wikikube-worker2130 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202411131534_cgoubert_2060994_wikikube-worker2130.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2131.codfw.wmnet with OS bookworm completed:

  • wikikube-worker2131 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202411131537_cgoubert_2061402_wikikube-worker2131.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2134.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2135.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2132.codfw.wmnet with OS bookworm completed:

  • wikikube-worker2132 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202411131605_cgoubert_2067437_wikikube-worker2132.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2133.codfw.wmnet with OS bookworm completed:

  • wikikube-worker2133 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202411131608_cgoubert_2067710_wikikube-worker2133.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2134.codfw.wmnet with OS bookworm completed:

  • wikikube-worker2134 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202411131629_cgoubert_2074334_wikikube-worker2134.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2135.codfw.wmnet with OS bookworm completed:

  • wikikube-worker2135 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202411131631_cgoubert_2074467_wikikube-worker2135.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
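
The reimage reports above can be summarised per host. As a rough sketch (the function name tally_results is hypothetical, and the sample text is a shortened copy of the result lines in this timeline), the following counts failed attempts and records which hosts eventually passed:

```python
import re
from collections import Counter

# Matches result lines such as "  • wikikube-worker2128 (FAIL)".
RESULT_LINE = re.compile(r"^\s*•\s+(\S+)\s+\((PASS|FAIL)\)\s*$")


def tally_results(log_text: str) -> tuple[Counter, set[str]]:
    """Return (failed attempts per host, set of hosts with a PASS)."""
    failures: Counter = Counter()
    passed: set[str] = set()
    for line in log_text.splitlines():
        match = RESULT_LINE.match(line)
        if match is None:
            continue  # step lines and cookbook headers are ignored
        host, status = match.groups()
        if status == "FAIL":
            failures[host] += 1
        else:
            passed.add(host)
    return failures, passed


# Result lines in the order they appear in the timeline (steps omitted).
sample = """
  • wikikube-worker2128 (FAIL)
  • wikikube-worker2128 (FAIL)
  • wikikube-worker2129 (FAIL)
  • wikikube-worker2128 (FAIL)
  • wikikube-worker2130 (FAIL)
  • wikikube-worker2131 (FAIL)
  • wikikube-worker2128 (PASS)
  • wikikube-worker2129 (PASS)
  • wikikube-worker2130 (PASS)
  • wikikube-worker2131 (PASS)
  • wikikube-worker2132 (PASS)
  • wikikube-worker2133 (PASS)
  • wikikube-worker2134 (PASS)
  • wikikube-worker2135 (PASS)
"""

failures, passed = tally_results(sample)
```

On this sample, wikikube-worker2128 shows three failed attempts, wikikube-worker2129 through wikikube-worker2131 show one each, and all eight hosts end with a PASS.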

Mentioned in SAL (#wikimedia-operations) [2024-11-13T16:58:22Z] <claime> homer 'lsw1-b2-codfw*' commit T377008

Mentioned in SAL (#wikimedia-operations) [2024-11-13T17:01:15Z] <claime> homer 'lsw1-b4-codfw*' commit T377008

Mentioned in SAL (#wikimedia-operations) [2024-11-13T17:17:25Z] <claime> homer 'lsw1-c4-codfw*' commit 'T377008'

Mentioned in SAL (#wikimedia-operations) [2024-11-13T17:18:08Z] <claime> homer 'lsw1-d4-codfw*' commit 'T377008'

Mentioned in SAL (#wikimedia-operations) [2024-11-13T17:18:54Z] <claime> homer 'lsw1-c2-codfw*' commit 'T377008'

Mentioned in SAL (#wikimedia-operations) [2024-11-13T17:20:18Z] <claime> homer 'lsw1-d2-codfw*' commit 'T377008'

pool host wikikube-worker[2128-2135].codfw.wmnet by cgoubert@cumin1002 with reason: New nodes

Cookbook cookbooks.sre.k8s.pool-depool-node started by cgoubert@cumin1002 pool for host wikikube-worker[2128-2135].codfw.wmnet completed:

  • wikikube-worker[2128-2135].codfw.wmnet (PASS)
    • Host wikikube-worker[2128-2135].codfw.wmnet pooled in wikikube-codfw

Clement_Goubert claimed this task.

Nodes imaged and pooled; resolving. Decommissioning of the refreshed hosts will be tracked in T379788.