
setup/install kubernetes10[59-62]
Closed, Resolved, Public

Description

kubernetes10[59-62].eqiad.wmnet have been delivered by DC-Ops and need to be set up and added to the cluster.

https://wikitech.wikimedia.org/wiki/Kubernetes/Clusters/Add_or_remove_nodes
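The host list throughout this task uses folded range notation, where `kubernetes10[59-62]` stands for four hosts. The sketch below shows how such a pattern expands. It is a simplified illustration that handles only one `[start-end]` group and keeps the zero-padding of the start value; the function name `expand_hosts` is hypothetical, and tools such as ClusterShell's NodeSet (used by cumin) support a much richer syntax.

```python
import re


def expand_hosts(pattern: str) -> list[str]:
    """Expand one bracketed numeric range, e.g. 'kubernetes10[59-62].eqiad.wmnet'.

    Hypothetical helper for illustration: only a single [start-end] group is
    supported, and the zero-padding width is taken from the start value.
    """
    match = re.search(r"\[(\d+)-(\d+)\]", pattern)
    if match is None:
        return [pattern]  # no range group: the pattern names a single host
    start, end = match.group(1), match.group(2)
    width = len(start)
    prefix, suffix = pattern[: match.start()], pattern[match.end():]
    return [f"{prefix}{n:0{width}d}{suffix}" for n in range(int(start), int(end) + 1)]


if __name__ == "__main__":
    # Prints kubernetes1059.eqiad.wmnet through kubernetes1062.eqiad.wmnet, one per line.
    for host in expand_hosts("kubernetes10[59-62].eqiad.wmnet"):
        print(host)
```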

Related Objects

Resolved (assigned: Jclark-ctr)
Resolved (assigned: Clement_Goubert)

Event Timeline

Change 982051 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/homer/public@master] kubernetes10[59-62]: add to eqiad.k8s

https://gerrit.wikimedia.org/r/982051

Change 982071 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] wikikube: put kubernetes10[59-62] in production

https://gerrit.wikimedia.org/r/982071

Change 982072 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] wikikube: add kubernetes10[59-62] to LVS

https://gerrit.wikimedia.org/r/982072

Change 982071 merged by Clément Goubert:

[operations/puppet@production] wikikube: put kubernetes10[59-62] in production

https://gerrit.wikimedia.org/r/982071

Change 982051 merged by jenkins-bot:

[operations/homer/public@master] kubernetes10[59-62]: add to devices.yaml

https://gerrit.wikimedia.org/r/982051

Mentioned in SAL (#wikimedia-operations) [2023-12-11T15:55:50Z] <claime> homer lsw1-*eqiad* commit "Put kubernetes10[59-62] in production - T353135"

Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1001 for host kubernetes1059.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1001 for host kubernetes1060.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1001 for host kubernetes1061.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1001 for host kubernetes1062.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1001 for host kubernetes1059.eqiad.wmnet with OS bullseye completed:

  • kubernetes1059 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202312111619_cgoubert_1103752_kubernetes1059.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)

Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1001 for host kubernetes1062.eqiad.wmnet with OS bullseye completed:

  • kubernetes1062 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202312111622_cgoubert_1104845_kubernetes1062.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)

Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1001 for host kubernetes1061.eqiad.wmnet with OS bullseye completed:

  • kubernetes1061 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202312111625_cgoubert_1104594_kubernetes1061.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)

Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1001 for host kubernetes1060.eqiad.wmnet with OS bullseye completed:

  • kubernetes1060 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Unable to downtime the new host on Icinga/Alertmanager, the sre.hosts.downtime cookbook returned 99
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202312111627_cgoubert_1104301_kubernetes1060.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
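All four reimage summaries finished with WARN. Going by the log, every host reports "Icinga status is not optimal, downtime not removed", and kubernetes1060 additionally could not be downtimed (the sre.hosts.downtime cookbook returned 99). A small script can pull those follow-up items out of a pasted summary. This is a minimal sketch, assuming the bullet layout shown above; the marker phrases in `FOLLOW_UP_MARKERS` are a heuristic chosen for illustration, not a status format the cookbook guarantees, and all names are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Heuristic (assumption): summary lines containing these phrases need a human follow-up.
FOLLOW_UP_MARKERS = ("Unable to", "not optimal")

# "• kubernetes1059 (WARN)" -> host line; any other "• ..." line is a step.
HEADER = re.compile(r"^\s*•\s+(\S+)\s+\((\w+)\)\s*$")
STEP = re.compile(r"^\s*•\s+(.*\S)\s*$")


@dataclass
class ReimageResult:
    host: str
    status: str
    follow_ups: list[str] = field(default_factory=list)


def parse_summaries(text: str) -> list[ReimageResult]:
    """Group step lines under their host line and keep only flagged steps."""
    results: list[ReimageResult] = []
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            results.append(ReimageResult(host=header.group(1), status=header.group(2)))
            continue
        step = STEP.match(line)
        if step and results and any(m in step.group(1) for m in FOLLOW_UP_MARKERS):
            results[-1].follow_ups.append(step.group(1))
    return results


if __name__ == "__main__":
    sample = (
        "  • kubernetes1060 (WARN)\n"
        "    • Found Nagios_host resource for this host in PuppetDB\n"
        "    • Unable to downtime the new host on Icinga/Alertmanager\n"
        "    • Icinga status is not optimal, downtime not removed\n"
    )
    for result in parse_summaries(sample):
        print(result.host, result.status, result.follow_ups)
```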

Change 982072 merged by Clément Goubert:

[operations/puppet@production] wikikube: add kubernetes10[59-62] to LVS

https://gerrit.wikimedia.org/r/982072

Mentioned in SAL (#wikimedia-operations) [2023-12-12T12:37:18Z] <claime> Pooling kubernetes10[59-62].eqiad.wmnet - T353135

Mentioned in SAL (#wikimedia-operations) [2023-12-12T12:38:53Z] <claime> Uncordoning kubernetes10[59-62].eqiad.wmnet - T353135
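The SAL entries show the order used to bring the nodes into service: pool them first (12:37:18Z), then uncordon them in Kubernetes (12:38:53Z). The sketch below only builds the command lines in that order and does not execute anything. It assumes pooling is done with conftool's `confctl select 'name=<host>' set/pooled=yes` and that Kubernetes node names are the hosts' FQDNs; both are assumptions, since the log does not show the exact commands. The helper name `bring_into_service` is hypothetical.

```python
import shlex


def bring_into_service(nodes: list[str]) -> list[list[str]]:
    """Return (not run) the commands to pool, then uncordon, each node.

    Assumptions: conftool pooling via confctl, and Kubernetes node names
    equal to the host FQDNs.
    """
    commands: list[list[str]] = []
    # Step 1: pool every node behind LVS.
    for node in nodes:
        commands.append(["confctl", "select", f"name={node}", "set/pooled=yes"])
    # Step 2: mark every node schedulable again.
    for node in nodes:
        commands.append(["kubectl", "uncordon", node])
    return commands


if __name__ == "__main__":
    hosts = [f"kubernetes{n}.eqiad.wmnet" for n in range(1059, 1063)]
    for cmd in bring_into_service(hosts):
        print(shlex.join(cmd))
```

Pooling all nodes before uncordoning any of them mirrors the two separate SAL entries; whether a per-node interleaving would be equally safe is not stated in the task.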

Nodes are in production.