
Create the dse-k8s-ctrl servers in codfw
Closed, Resolved · Public

Description

We are building a dse-k8s-codfw cluster, so we will need two servers to act as the API servers.

They will use the same template as our dse-k8s-eqiad cluster, so both will be Ganeti VMs.

Follow the guidelines here to commission them and set them up: https://wikitech.wikimedia.org/wiki/Kubernetes/Clusters/New#Control-plane

Event Timeline

Change #1167209 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Add the new dse-k8s hosts to site.pp so that we can create the VMs

https://gerrit.wikimedia.org/r/1167209

Change #1167209 merged by Btullis:

[operations/puppet@production] Add the new dse-k8s hosts to site.pp so that we can create the VMs

https://gerrit.wikimedia.org/r/1167209

Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1003 for host dse-k8s-ctrl2001.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1003 for host dse-k8s-ctrl2002.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1003 for host dse-k8s-ctrl2001.codfw.wmnet with OS bookworm completed:

  • dse-k8s-ctrl2001 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202507081739_btullis_848937_dse-k8s-ctrl2001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1003 for host dse-k8s-ctrl2002.codfw.wmnet with OS bookworm completed:

  • dse-k8s-ctrl2002 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202507081748_btullis_848966_dse-k8s-ctrl2002.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
BTullis claimed this task.
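For anyone who wants to check reimage summaries like the two above without reading every bullet, here is a minimal sketch of a parser. It assumes the summary text has been pasted in the same bullet layout shown in this task (a host line ending in an upper-case status in parentheses, followed by step bullets). The function name `parse_reimage_summary` and the returned dictionary shape are hypothetical, chosen for illustration; this is not part of the reimage cookbook or Spicerack.

```python
import re

# Matches a per-host result line such as "  • dse-k8s-ctrl2001 (PASS)".
# Assumption: the status is a single upper-case word in parentheses.
HOST_RESULT = re.compile(r"^\s*•\s+(?P<host>\S+)\s+\((?P<status>[A-Z]+)\)\s*$")


def parse_reimage_summary(text):
    """Return {host: {"status": str, "steps": [str, ...]}} from pasted summary text."""
    results = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        match = HOST_RESULT.match(line)
        if match:
            # A new host block starts here.
            current = {"status": match.group("status"), "steps": []}
            results[match.group("host")] = current
        elif not stripped:
            # Blank lines do not end a host block.
            continue
        elif current is not None and stripped.startswith("•"):
            # Any other bullet under a host line is recorded as a step.
            current["steps"].append(stripped.lstrip("•").strip())
        else:
            # A non-bullet line (for example the next "Cookbook ..." header)
            # ends the current host's step list.
            current = None
    return results


sample = """\
  • dse-k8s-ctrl2001 (PASS)
    • Forced PXE for next reboot
    • Host up (Debian installer)
"""
summary = parse_reimage_summary(sample)
not_passed = [host for host, info in summary.items() if info["status"] != "PASS"]
```

With the sample text, `not_passed` is empty, which corresponds to every host reporting PASS. Real cookbook output may differ in ways this sketch does not handle.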