
Re-purpose kafka-stretch200[1-2] as DSE workers in codfw
Closed, Resolved, Public

Description

This is only a proposal at the moment, pending completion and review of the DPE compute and storage strategy.

Event Timeline

Gehel moved this task from Incoming to Hardware refresh on the Data-Platform-SRE board.
BTullis raised the priority of this task from Low to Medium.

Change #1160888 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Prepare for renaming kafka-stretch200[1-2] to dse-k8s-worker200[1-2]

https://gerrit.wikimedia.org/r/1160888

Change #1160888 merged by Btullis:

[operations/puppet@production] Prepare for renaming kafka-stretch200[1-2] to dse-k8s-worker200[1-2]

https://gerrit.wikimedia.org/r/1160888
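The task title and change subjects use a bracketed numeric range (`kafka-stretch200[1-2]`), the ClusterShell-style host-range notation. The following is a minimal Python sketch of how such a pattern expands into the individual old/new host pairs that the two rename cookbook runs act on. The helpers `expand_range` and `rename_pairs` are hypothetical, and the sketch assumes a single `[start-end]` group; ClusterShell's `NodeSet` implements the full syntax.

```python
import re

# One numeric range group, e.g. "[1-2]" or "[08-10]".
RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_range(pattern: str) -> list[str]:
    """Expand a host pattern containing a single numeric [start-end] group.

    Hypothetical helper: zero-padding of the start value is preserved and
    patterns without a bracket group are returned unchanged.
    """
    match = RANGE.search(pattern)
    if match is None:
        return [pattern]
    start, end = match.group(1), match.group(2)
    prefix, suffix = pattern[: match.start()], pattern[match.end():]
    width = len(start)
    return [f"{prefix}{n:0{width}d}{suffix}" for n in range(int(start), int(end) + 1)]


def rename_pairs(old_pattern: str, new_pattern: str) -> list[tuple[str, str]]:
    """Pair each expanded old host name with its new name, in order."""
    old_hosts, new_hosts = expand_range(old_pattern), expand_range(new_pattern)
    if len(old_hosts) != len(new_hosts):
        raise ValueError("patterns expand to different numbers of hosts")
    return list(zip(old_hosts, new_hosts))


for old, new in rename_pairs("kafka-stretch200[1-2]", "dse-k8s-worker200[1-2]"):
    print(f"{old} -> {new}")
```

Run as-is, this prints the two pairs `kafka-stretch2001 -> dse-k8s-worker2001` and `kafka-stretch2002 -> dse-k8s-worker2002`, matching the order of the rename runs in this task.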

Cookbook cookbooks.sre.hosts.rename started by btullis@cumin1003 from kafka-stretch2001 to dse-k8s-worker2001 completed:

  • kafka-stretch2001 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new

Cookbook cookbooks.sre.hosts.rename started by btullis@cumin1003 from kafka-stretch2002 to dse-k8s-worker2002 completed:

  • kafka-stretch2002 (PASS)
    • ✔️ Downtimed host on Icinga/Alertmanager
    • ✔️ Disabled puppet and its timer
    • ✔️ Disabled debmonitor-client timer
    • ✔️ Netbox updated
    • ✔️ BMC Hostname updated
    • ✔️ DNS updated
    • ✔️ Switch description updated
    • ✔️ Removed from DebMonitor
    • ✔️ Removed from Puppet master and PuppetDB
    • Rename completed 👍 - now please run the re-image cookbook on the new name with --new
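Each rename run reports a per-host status line followed by one bullet per step, with completed steps marked ✔️. The following is a minimal Python sketch, assuming exactly the plain-text layout shown above (one `host (STATUS)` line, then `•` step bullets), that extracts the status and the completed steps. This could be used, for example, to confirm that both hosts report PASS with all nine steps ticked. `summarize_rename_report` is a hypothetical helper and is not part of Spicerack or the cookbooks.

```python
import re

# Host status line, e.g. "  • kafka-stretch2002 (PASS)"
HOST_LINE = re.compile(r"^\s*\u2022\s+(?P<host>\S+)\s+\((?P<status>[A-Z]+)\)\s*$")
BULLET = "\u2022"              # "•"
CHECK_MARK = "\u2714"          # "✔", usually followed by the emoji variation selector
VARIATION_SELECTOR = "\ufe0f"


def summarize_rename_report(report: str) -> dict:
    """Split a rename summary into host, status, completed steps and other notes.

    Hypothetical helper: assumes the text layout shown in this task.
    """
    host = status = None
    completed, notes = [], []
    for line in report.splitlines():
        match = HOST_LINE.match(line)
        if match:
            host, status = match["host"], match["status"]
            continue
        text = line.strip()
        if not text.startswith(BULLET):
            continue  # skip the "Cookbook ... completed:" header and blank lines
        text = text[len(BULLET):].strip()
        if text.startswith(CHECK_MARK):
            # Drop the check mark (and variation selector), keep the step name.
            completed.append(text.lstrip(CHECK_MARK + VARIATION_SELECTOR).strip())
        else:
            notes.append(text)
    return {"host": host, "status": status, "completed": completed, "notes": notes}


# Abbreviated excerpt of the kafka-stretch2002 output above.
report = """\
  \u2022 kafka-stretch2002 (PASS)
    \u2022 \u2714\ufe0f Downtimed host on Icinga/Alertmanager
    \u2022 \u2714\ufe0f Netbox updated
    \u2022 Rename completed \U0001F44D - now please run the re-image cookbook on the new name with --new
"""
summary = summarize_rename_report(report)
print(summary["host"], summary["status"], summary["completed"])
```

Lines that are bullets but carry no check mark, such as the closing "Rename completed" hint, are kept separately as notes rather than counted as steps.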

Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1003 for host dse-k8s-worker2001.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1003 for host dse-k8s-worker2001.codfw.wmnet with OS bookworm completed:

  • dse-k8s-worker2001 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202506191002_btullis_2242387_dse-k8s-worker2001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1003 for host dse-k8s-worker2002.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1003 for host dse-k8s-worker2002.codfw.wmnet with OS bookworm completed:

  • dse-k8s-worker2002 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202506191046_btullis_2248496_dse-k8s-worker2002.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
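The two reimage log paths appear to follow a `YYYYMMDDHHMM_user_id_host.out` naming pattern. The following is a minimal Python sketch, assuming that layout, that parses the file names and compares the two runs' timestamps. It also assumes that the username contains no underscore. The task does not state what the numeric field means, so the sketch keeps it as an opaque integer. `ReimageLog` and `parse_reimage_log` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ReimageLog:
    timestamp: datetime  # parsed from the YYYYMMDDHHMM prefix (assumed layout)
    user: str
    run_id: int          # opaque numeric field; its meaning is not stated in the task
    host: str


def parse_reimage_log(path: str) -> ReimageLog:
    """Parse a log path of the assumed form YYYYMMDDHHMM_user_id_host.out."""
    name = PurePosixPath(path).stem  # drop the directory and the ".out" suffix
    # maxsplit=3 keeps any underscores in the host part intact;
    # a username containing "_" would still break this assumption.
    stamp, user, run_id, host = name.split("_", 3)
    return ReimageLog(datetime.strptime(stamp, "%Y%m%d%H%M"), user, int(run_id), host)


LOG_DIR = "/var/log/spicerack/sre/hosts/reimage"
first = parse_reimage_log(f"{LOG_DIR}/202506191002_btullis_2242387_dse-k8s-worker2001.out")
second = parse_reimage_log(f"{LOG_DIR}/202506191046_btullis_2248496_dse-k8s-worker2002.out")
print(first.host, second.host, second.timestamp - first.timestamp)
```

With the two paths from this task, the log-name timestamps are 10:02 and 10:46 on 2025-06-19, 44 minutes apart. This is the gap between the log names only, not a measured reimage duration.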