
Setup kubernetes20[25-53]
Closed, Resolved (Public)

Description

The new nodes kubernetes20[25-53] have been handed over by DCops and should be added to the wikikube cluster.

https://wikitech.wikimedia.org/wiki/Kubernetes/Clusters/Add_or_remove_nodes
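The task title uses a bracketed range, kubernetes20[25-53], as shorthand for the individual hosts. A minimal sketch of how such an expression expands into a host list is below; the helper name `expand_host_range` and its `domain` parameter are hypothetical and not part of any tooling mentioned in this task, and the sketch assumes all hosts share the `codfw.wmnet` domain seen on kubernetes2028.

```python
import re

# Hypothetical helper, for illustration only: expands "prefix[START-END]suffix"
# into individual names, keeping the zero padding of START.
_RANGE_RE = re.compile(
    r"(?P<prefix>[^\[\]]*)\[(?P<start>\d+)-(?P<end>\d+)\](?P<suffix>[^\[\]]*)"
)


def expand_host_range(expr: str, domain: str = "") -> list[str]:
    """Return the host names described by a single bracketed range."""
    match = _RANGE_RE.fullmatch(expr)
    if match is None:
        # No bracketed range present: treat the expression as one host.
        names = [expr]
    else:
        start, end = int(match["start"]), int(match["end"])
        if start > end:
            raise ValueError(f"descending range in {expr!r}")
        width = len(match["start"])
        names = [
            f"{match['prefix']}{n:0{width}d}{match['suffix']}"
            for n in range(start, end + 1)  # the range is inclusive
        ]
    # Assumption: an optional domain is appended to every host name.
    return [f"{name}.{domain}" if domain else name for name in names]


hosts = expand_host_range("kubernetes20[25-53]")
fqdns = expand_host_range("kubernetes20[25-53]", domain="codfw.wmnet")
print(len(hosts), hosts[0], hosts[-1])
```

The range is inclusive on both ends, so this task covers 29 hosts, kubernetes2025 through kubernetes2053.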

Related Objects

Status: Resolved, Assigned: Papaul
Status: Resolved, Assigned: Joe

Event Timeline

JMeybohm triaged this task as Medium priority. (Sep 6 2023, 9:39 AM)
JMeybohm moved this task from Incoming 🐫 to ⎈Kubernetes on the serviceops board.

Change 958487 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/puppet@production] wikikube: put the new codfw nodes in production

https://gerrit.wikimedia.org/r/958487

Change 958488 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/puppet@production] conftool: add new k8s nodes

https://gerrit.wikimedia.org/r/958488

Change 958489 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/homer/public@master] Add configuration for the new kubernetes node in codfw

https://gerrit.wikimedia.org/r/958489

Change 958489 merged by jenkins-bot:

[operations/homer/public@master] Add configuration for the new kubernetes node in codfw

https://gerrit.wikimedia.org/r/958489

Change 958487 merged by Effie Mouzeli:

[operations/puppet@production] wikikube: put the new codfw nodes in production

https://gerrit.wikimedia.org/r/958487

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1001 for host kubernetes2028.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1001 for host kubernetes2028.codfw.wmnet with OS bullseye completed:

  • kubernetes2028 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202309211245_jiji_386204_kubernetes2028.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 958488 merged by Effie Mouzeli:

[operations/puppet@production] conftool: add new k8s nodes

https://gerrit.wikimedia.org/r/958488

JMeybohm closed this task as Resolved. (Edited, Sep 25 2023, 11:58 AM)

This has been done since Thursday.