
Migrate eqiad1 hypervisors to Neutron OVS agent
Closed, Resolved (Public)

Description

  • cloudvirt1031.eqiad.wmnet
  • cloudvirt1032.eqiad.wmnet
  • cloudvirt1033.eqiad.wmnet
  • cloudvirt1034.eqiad.wmnet
  • cloudvirt1035.eqiad.wmnet
  • cloudvirt1036.eqiad.wmnet
  • cloudvirt1037.eqiad.wmnet
  • cloudvirt1038.eqiad.wmnet
  • cloudvirt1039.eqiad.wmnet
  • cloudvirt1040.eqiad.wmnet
  • cloudvirt1041.eqiad.wmnet
  • cloudvirt1042.eqiad.wmnet
  • cloudvirt1043.eqiad.wmnet
  • cloudvirt1044.eqiad.wmnet
  • cloudvirt1045.eqiad.wmnet
  • cloudvirt1046.eqiad.wmnet
  • cloudvirt1047.eqiad.wmnet
  • cloudvirt1048.eqiad.wmnet
  • cloudvirt1049.eqiad.wmnet
  • cloudvirt1050.eqiad.wmnet
  • cloudvirt1051.eqiad.wmnet
  • cloudvirt1052.eqiad.wmnet
  • cloudvirt1053.eqiad.wmnet
  • cloudvirt1054.eqiad.wmnet
  • cloudvirt1055.eqiad.wmnet
  • cloudvirt1056.eqiad.wmnet
  • cloudvirt1057.eqiad.wmnet
  • cloudvirt1058.eqiad.wmnet
  • cloudvirt1059.eqiad.wmnet
  • cloudvirt1060.eqiad.wmnet
  • cloudvirt1061.eqiad.wmnet
  • cloudvirt1062.eqiad.wmnet
  • cloudvirt1063.eqiad.wmnet
  • cloudvirt1064.eqiad.wmnet
  • cloudvirt1065.eqiad.wmnet
  • cloudvirt1066.eqiad.wmnet
  • cloudvirt1067.eqiad.wmnet
  • cloudvirtlocal1001.eqiad.wmnet
  • cloudvirtlocal1002.eqiad.wmnet
  • cloudvirtlocal1003.eqiad.wmnet
  • cloudvirt-wdqs1001.eqiad.wmnet
  • cloudvirt-wdqs1002.eqiad.wmnet
  • cloudvirt-wdqs1003.eqiad.wmnet

For each hypervisor:

  1. Prevent draining from moving VMs to OVS hosts:
       you@cloudcontrol1007 $ sudo wmcs-openstack server list --host cloudvirtXXXX --all -f json | jq -r '.[]|select(.Name | contains("canaryXXXX") | not).ID' | xargs -L1 sudo python3 /root/backfill-extra-specs.py --instance-uuid
  2. Drain with the cookbook
  3. Stop the canary VM
  4. Set profile::openstack::eqiad1::neutron::use_ovs: true in Hiera and reimage the host
  5. Delete the old network agent with wmcs-openstack network agent delete
  6. Recreate the canary VM
  7. Add to the ceph and network-ovs aggregates, remove from 'maintenance'
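
The jq filter in the first step keeps every VM on the hypervisor except the canary and prints its ID. A minimal Python sketch of that same selection is below, assuming the JSON from `server list -f json` is an array of objects with `Name` and `ID` fields (as the jq expression implies). The function name, the sample server names, and the sample IDs are made up for illustration; this is not part of backfill-extra-specs.py.

```python
import json


def non_canary_server_ids(server_list_json: str, canary_marker: str) -> list[str]:
    """Return the ID of every server whose Name does not contain canary_marker.

    Mirrors the jq expression
        .[] | select(.Name | contains("canaryXXXX") | not).ID
    where jq's contains() on strings is a substring match.
    """
    servers = json.loads(server_list_json)
    return [s["ID"] for s in servers if canary_marker not in s["Name"]]


# Hypothetical sample data; real output has more fields per server.
sample = json.dumps(
    [
        {"ID": "uuid-a", "Name": "tools-worker-1"},
        {"ID": "uuid-b", "Name": "canary1052-3"},
        {"ID": "uuid-c", "Name": "deployment-db"},
    ]
)

for server_id in non_canary_server_ids(sample, "canary1052"):
    print(server_id)
```

Each printed ID corresponds to one invocation of the backfill script with --instance-uuid in the original pipeline; the canary is skipped because it is stopped and recreated rather than migrated.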

Details

Other Assignee
taavi
Repo                   Branch       Lines +/-
operations/puppet      production   +1 -0
operations/puppet      production   +2 -0
operations/puppet      production   +2 -0
operations/puppet      production   +4 -0
operations/puppet      production   +11 -0
operations/puppet      production   +2 -0
operations/puppet      production   +2 -0
operations/puppet      production   +2 -0
operations/puppet      production   +2 -0
operations/puppet      production   +2 -0
operations/puppet      production   +2 -0
operations/puppet      production   +2 -0
operations/puppet      production   +2 -0
operations/puppet      production   +2 -0
operations/puppet      production   +2 -0
operations/puppet      production   +2 -0
operations/puppet      production   +2 -0
operations/puppet      production   +1 -0
operations/puppet      production   +2 -0
operations/puppet      production   +2 -0
operations/puppet      production   +2 -0
cloud/wmcs-cookbooks   main         +12 -0
operations/puppet      production   +1 -0
operations/puppet      production   +1 -0
operations/puppet      production   +1 -5
operations/puppet      production   +3 -0
operations/puppet      production   +2 -0
operations/puppet      production   +4 -0
operations/puppet      production   +2 -0
operations/puppet      production   +1 -0
operations/puppet      production   +1 -0
operations/puppet      production   +1 -0
cloud/wmcs-cookbooks   main         +195 -2
operations/puppet      production   +1 -0
operations/puppet      production   +1 -0
operations/puppet      production   +1 -5
operations/puppet      production   +1 -1

Event Timeline

There are a very large number of changes, so older changes are hidden.

Change #1048029 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] hieradata: Move cloudvirt1052 to OVS

https://gerrit.wikimedia.org/r/1048029

Change #1048029 merged by Andrew Bogott:

[operations/puppet@production] hieradata: Move cloudvirt1052 to OVS

https://gerrit.wikimedia.org/r/1048029

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudvirt1052.eqiad.wmnet with OS bookworm

Change #1048030 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] hieradata: Move cloudvirt1063 to OVS

https://gerrit.wikimedia.org/r/1048030

Change #1048030 merged by Andrew Bogott:

[operations/puppet@production] hieradata: Move cloudvirt1063 to OVS

https://gerrit.wikimedia.org/r/1048030

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudvirt1052.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1052 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202406201606_andrew_50845_cloudvirt1052.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudvirt1063.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudvirt1063.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1063 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202406201651_andrew_60094_cloudvirt1063.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status failed -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Change #1048041 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] hieradata: Move cloudvirt1053 to OVS

https://gerrit.wikimedia.org/r/1048041

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudvirt1053.eqiad.wmnet with OS bookworm

Change #1048041 merged by Andrew Bogott:

[operations/puppet@production] hieradata: Move cloudvirt1053 to OVS

https://gerrit.wikimedia.org/r/1048041

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudvirt1053.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1053 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202406201751_andrew_68886_cloudvirt1053.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudvirt1054.eqiad.wmnet with OS bookworm

Change #1049231 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] hieradata: Move cloudvirt1053 to OVS

https://gerrit.wikimedia.org/r/1049231

Change #1049231 merged by Andrew Bogott:

[operations/puppet@production] hieradata: Move cloudvirt1053 to OVS

https://gerrit.wikimedia.org/r/1049231

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudvirt1054.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1054 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202406241644_andrew_818954_cloudvirt1054.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1049243 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] hieradata: Move cloudvirt1055 to OVS

https://gerrit.wikimedia.org/r/1049243

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudvirt1055.eqiad.wmnet with OS bookworm

Change #1049243 merged by Andrew Bogott:

[operations/puppet@production] hieradata: Move cloudvirt1055 to OVS

https://gerrit.wikimedia.org/r/1049243

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudvirt1055.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1055 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202406241747_andrew_829021_cloudvirt1055.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change #1049255 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] hieradata: Move cloudvirt1056 to OVS

https://gerrit.wikimedia.org/r/1049255

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudvirt1056.eqiad.wmnet with OS bookworm

Change #1049255 merged by Andrew Bogott:

[operations/puppet@production] hieradata: Move cloudvirt1056 to OVS

https://gerrit.wikimedia.org/r/1049255

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudvirt1056.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1056 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202406241935_andrew_844926_cloudvirt1056.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudvirt2004-dev.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudvirt2004-dev.codfw.wmnet with OS bookworm completed:

  • cloudvirt2004-dev (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202406251704_andrew_1009777_cloudvirt2004-dev.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1049614 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] hieradata: Move cloudvirt2004-dev to OVS

https://gerrit.wikimedia.org/r/1049614

Change #1049615 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] hieradata: Move cloudvirt2005-dev to OVS

https://gerrit.wikimedia.org/r/1049615

Change #1049616 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] hieradata: Move cloudvirt2006-dev to OVS

https://gerrit.wikimedia.org/r/1049616

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudvirt2004-dev.codfw.wmnet with OS bookworm

Change #1049614 merged by Andrew Bogott:

[operations/puppet@production] hieradata: Move cloudvirt2004-dev to OVS

https://gerrit.wikimedia.org/r/1049614

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudvirt2004-dev.codfw.wmnet with OS bookworm completed:

  • cloudvirt2004-dev (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202406251757_andrew_1019202_cloudvirt2004-dev.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudvirt2005-dev.codfw.wmnet with OS bookworm

Change #1049615 merged by Andrew Bogott:

[operations/puppet@production] hieradata: Move cloudvirt2005-dev to OVS

https://gerrit.wikimedia.org/r/1049615

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudvirt2005-dev.codfw.wmnet with OS bookworm completed:

  • cloudvirt2005-dev (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202406252209_andrew_1057565_cloudvirt2005-dev.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudvirt2006-dev.codfw.wmnet with OS bookworm

Change #1049616 merged by Andrew Bogott:

[operations/puppet@production] hieradata: Move cloudvirt2006-dev to OVS

https://gerrit.wikimedia.org/r/1049616

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudvirt2006-dev.codfw.wmnet with OS bookworm completed:

  • cloudvirt2006-dev (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202406252305_andrew_1066233_cloudvirt2006-dev.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1050356 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Move cloudvirtlocal1001 to ovs

https://gerrit.wikimedia.org/r/1050356

Change #1050357 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Move cloudvirtlocal1002 to ovs

https://gerrit.wikimedia.org/r/1050357

Change #1050358 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Move cloudvirtlocal1003 to ovs

https://gerrit.wikimedia.org/r/1050358

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudvirtlocal1001.eqiad.wmnet with OS bookworm

Change #1050356 merged by Andrew Bogott:

[operations/puppet@production] Move cloudvirtlocal1001 to ovs

https://gerrit.wikimedia.org/r/1050356

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudvirtlocal1001.eqiad.wmnet with OS bookworm completed:

  • cloudvirtlocal1001 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202406271312_andrew_1368297_cloudvirtlocal1001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudvirtlocal1002.eqiad.wmnet with OS bookworm

Change #1050357 merged by Andrew Bogott:

[operations/puppet@production] Move cloudvirtlocal1002 to ovs

https://gerrit.wikimedia.org/r/1050357

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudvirtlocal1002.eqiad.wmnet with OS bookworm completed:

  • cloudvirtlocal1002 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202406271601_andrew_1394531_cloudvirtlocal1002.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1050358 merged by Andrew Bogott:

[operations/puppet@production] Move cloudvirtlocal1003 to ovs

https://gerrit.wikimedia.org/r/1050358

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudvirtlocal1003.eqiad.wmnet with OS bookworm

Change #1050430 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Move 5 cloudvirts to ovs

https://gerrit.wikimedia.org/r/1050430

Change #1050430 merged by Andrew Bogott:

[operations/puppet@production] Move 5 cloudvirts to ovs

https://gerrit.wikimedia.org/r/1050430

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudvirtlocal1003.eqiad.wmnet with OS bookworm completed:

  • cloudvirtlocal1003 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202406271709_andrew_1416774_cloudvirtlocal1003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1050444 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Two more cloudvirts to ovs

https://gerrit.wikimedia.org/r/1050444

Change #1050444 merged by Andrew Bogott:

[operations/puppet@production] Two more cloudvirts to ovs

https://gerrit.wikimedia.org/r/1050444

fnegri changed the task status from Open to In Progress (Jun 28 2024, 12:53 PM).
fnegri reassigned this task from taavi to Andrew.
fnegri updated Other Assignee, added: taavi; removed: Andrew.

Change #1053404 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] cloudvirt1060 -> OVS

https://gerrit.wikimedia.org/r/1053404

Change #1053404 merged by Andrew Bogott:

[operations/puppet@production] cloudvirt1060 -> OVS

https://gerrit.wikimedia.org/r/1053404

Change #1055486 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] cloudvirt1061 -> ovs

https://gerrit.wikimedia.org/r/1055486

Change #1055486 merged by Andrew Bogott:

[operations/puppet@production] cloudvirt1061 -> ovs

https://gerrit.wikimedia.org/r/1055486

Andrew triaged this task as High priority (Aug 14 2024, 1:34 PM).

Mentioned in SAL (#wikimedia-cloud) [2024-09-23T14:42:15Z] <arturo> put cloudvirt1048 in the network-ovs aggregate T364457

Change #1092412 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] cloudvirt1062 -> ovs

https://gerrit.wikimedia.org/r/1092412

Change #1092412 merged by Andrew Bogott:

[operations/puppet@production] cloudvirt1062 -> ovs

https://gerrit.wikimedia.org/r/1092412

Finally moved the last one of these, cloudvirt1062.