Site: codfw 1 VM request for staging-codfw kube-apiserver
Closed, Resolved, Public

Description

Cloud VPS Project Tested:
Site/Location: codfw
Number of systems: 1
Service: kube-apiserver and etcd
Networking Requirements: internal
Processor Requirements: 4
Memory: 5GB
Disks: 30GB
Other Requirements: No DRBD

Event Timeline

Looks good. We currently can't disable DRBD at instance creation, so create the VM with DRBD as usual and then use the sre.ganeti.changedisk cookbook to switch it to plain disks.
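For context, the disk-template switch corresponds roughly to the manual Ganeti steps sketched below. This is a minimal sketch, assuming the conversion amounts to stopping the instance, running `gnt-instance modify -t plain`, and starting it again. The helper name `plain_disk_commands` is hypothetical, and the cookbook's own arguments and internals are not shown in this task. The snippet only builds the command lines; it does not run them.

```python
from typing import List


def plain_disk_commands(instance: str) -> List[List[str]]:
    """Build (but do not run) the Ganeti commands for a DRBD -> plain switch.

    Assumption: the instance has to be stopped while its disk template is
    converted, so the sequence is shutdown, modify, startup.
    """
    return [
        ["gnt-instance", "shutdown", instance],
        ["gnt-instance", "modify", "-t", "plain", instance],
        ["gnt-instance", "startup", instance],
    ]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for argv in plain_disk_commands("kubestagemaster2003.codfw.wmnet"):
        print(" ".join(argv))
```

Returning argument lists rather than executing anything keeps the sketch safe to run anywhere; on a real cluster the cookbook remains the supported path.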

Change #1024543 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] kubestagemaster2003: Add as insetup::serviceops

https://gerrit.wikimedia.org/r/1024543

Change #1024543 merged by JMeybohm:

[operations/puppet@production] kubestagemaster2003: Add as insetup::serviceops

https://gerrit.wikimedia.org/r/1024543

Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kubestagemaster2003.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kubestagemaster2003.codfw.wmnet with OS bullseye completed:

  • kubestagemaster2003 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404260748_jayme_2095943_kubestagemaster2003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
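
The reimage log filename in the run above appears to encode when and by whom the cookbook was started. A minimal sketch for splitting it apart, assuming the layout is `<YYYYMMDDHHMM>_<user>_<number>_<host>.out`. This layout is inferred from the filenames in this task, not from documented behaviour. The meaning of the numeric field is not stated here, and `parse_reimage_log_name` is a hypothetical helper.

```python
import re
from datetime import datetime
from typing import NamedTuple


class ReimageLog(NamedTuple):
    started: datetime  # parsed from the leading 12-digit stamp
    user: str
    run_id: int        # numeric field of unstated meaning (assumption)
    host: str


# Assumed layout: <YYYYMMDDHHMM>_<user>_<number>_<host>.out
_LOG_NAME = re.compile(r"^(\d{12})_([^_]+)_(\d+)_(.+)\.out$")


def parse_reimage_log_name(path: str) -> ReimageLog:
    """Split a reimage log path into its assumed components."""
    name = path.rsplit("/", 1)[-1]  # keep only the file name
    match = _LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected log file name: {name!r}")
    stamp, user, run_id, host = match.groups()
    return ReimageLog(
        started=datetime.strptime(stamp, "%Y%m%d%H%M"),
        user=user,
        run_id=int(run_id),
        host=host,
    )
```

Applied to the path logged above, this would give a start time of 2024-04-26 07:48 for user `jayme` on host `kubestagemaster2003`; the timezone is not indicated by the filename.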

cookbooks.sre.hosts.decommission executed by jayme@cumin1002 for hosts: kubestagemaster2003.codfw.wmnet

  • kubestagemaster2003.codfw.wmnet (WARN)
    • Host not found on Icinga, unable to downtime it
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster codfw_test to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster codfw_test to Netbox

Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kubestagemaster2003.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kubestagemaster2003.codfw.wmnet with OS bullseye completed:

  • kubestagemaster2003 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404261135_jayme_2341055_kubestagemaster2003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)

cookbooks.sre.hosts.decommission executed by jayme@cumin1002 for hosts: kubestagemaster2003.codfw.wmnet

  • kubestagemaster2003.codfw.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster codfw_test to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster codfw_test to Netbox

Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kubestagemaster2003.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kubestagemaster2003.codfw.wmnet with OS bullseye completed:

  • kubestagemaster2003 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404290840_jayme_2997667_kubestagemaster2003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1030955 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] Add kubestagemaster200[45] as insetup::serviceops

https://gerrit.wikimedia.org/r/1030955

Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kubestagemaster2004.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kubestagemaster2004.codfw.wmnet with OS bullseye completed:

  • kubestagemaster2004 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405131435_jayme_4048471_kubestagemaster2004.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB