wikikube-ctrl2006 implementation tracking
Open, High, Public

Description

This task tracks the service implementation of the ServiceOps new host(s) listed in the task description.

Once the linked racking task has been resolved, this task can be implemented.

This sub-task was created/updated at the request of ServiceOps new. It is assigned at creation to the 'Sub-team Technical Contact' provided in the initial ordering task.

Follow https://wikitech.wikimedia.org/wiki/Kubernetes/Clusters/Add_or_remove_control-planes#Add_stacked_control-plane to add wikikube-ctrl2006 as a stacked control-plane, then decommission wikikube-ctrl2003.

Event Timeline

Change #1249321 had a related patch set uploaded (by Jasmine; author: Jasmine):

[operations/puppet@production] wikikube: add wikikube-ctrl2006

https://gerrit.wikimedia.org/r/1249321

Change #1249423 had a related patch set uploaded (by Jasmine; author: Jasmine):

[operations/dns@master] wmnet: add wikikube-ctrl2006 to etcd-server SRV record

https://gerrit.wikimedia.org/r/1249423

Cookbook cookbooks.sre.hosts.reimage was started by jasmine@cumin2002 for host wikikube-ctrl2006.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by jasmine@cumin2002 for host wikikube-ctrl2006.codfw.wmnet with OS trixie executed with errors:

  • wikikube-ctrl2006 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wikikube-ctrl2006.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

For visibility: I intentionally aborted the reimage because of the following error [0]. I will investigate and re-attempt.

==> Unable to verify that the host is inside the Debian installer, please verify manually with: sudo install-console wikikube-ctrl2006.codfw.wmnet