
Migrate codfw Ganeti cluster to KVM machine type pc-i440fx-2.8
Closed, Resolved, Public

Description

As preparation for the Buster update, we need to switch the KVM machine type for the Ganeti cluster to pc-i440fx-2.8 (the hardware provided by qemu 2.8). Otherwise we wouldn't be able to migrate instances between Stretch and Buster nodes, since the Buster nodes would default to pc-i440fx-3.1. Once the Buster migration is complete, we'll switch back to pc-i440fx-3.1.
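The reasoning above amounts to pinning the machine type of the oldest qemu in the cluster, because newer qemu releases still support older versioned machine types. A minimal sketch of that decision, assuming only the two releases named in this task (Stretch with qemu 2.8, Buster with qemu 3.1); the function `pick_machine_version` is hypothetical and not part of Ganeti or any cookbook:

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose a kvm:machine_version that every node can run.
# Mapping assumed from this task: Stretch ships qemu 2.8, Buster qemu 3.1.
pick_machine_version() {
  local release has_stretch=0
  for release in "$@"; do
    case "$release" in
      stretch) has_stretch=1 ;;
      buster) ;;
      *) echo "unknown release: $release" >&2; return 1 ;;
    esac
  done
  if (( has_stretch )); then
    # At least one Stretch node left: stay on the qemu 2.8 machine type.
    echo "pc-i440fx-2.8"
  else
    # All nodes on Buster: the newer machine type is safe again.
    echo "pc-i440fx-3.1"
  fi
}
```

For a mixed cluster, `pick_machine_version stretch buster` prints pc-i440fx-2.8; once only Buster nodes remain, `pick_machine_version buster buster` prints pc-i440fx-3.1.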

This requires the following steps:

  1. sudo gnt-cluster modify --hypervisor-parameters kvm:machine_version=pc-i440fx-2.8
  2. Restart all instances of the cluster. A reboot from within the OS isn't sufficient: each instance needs to be restarted at the Ganeti level so that its KVM process is restarted (roughly comparable to resetting a computer with the power button). There's a new cookbook for this: sre.ganeti.reboot-vm
  • acmechief2001.codfw.wmnet
  • acmechief-test2001.codfw.wmnet
  • apt2001.wikimedia.org
  • chartmuseum2001.codfw.wmnet
  • debmonitor2002.codfw.wmnet
  • deneb.codfw.wmnet
  • doc2001.codfw.wmnet
  • doh2001.wikimedia.org
  • doh2002.wikimedia.org
  • dragonfly-supernode2001.codfw.wmnet
  • durum2001.codfw.wmnet
  • durum2002.codfw.wmnet
  • failoid2002.codfw.wmnet
  • gitlab2001.wikimedia.org
  • grafana2001.codfw.wmnet
  • idp2001.wikimedia.org
  • idp-test2001.wikimedia.org
  • install2003.wikimedia.org
  • irc2001.wikimedia.org
  • kafkamon2002.codfw.wmnet
  • kubemaster2001.codfw.wmnet
  • kubemaster2002.codfw.wmnet
  • kubernetes2005.codfw.wmnet
  • kubernetes2006.codfw.wmnet
  • kubernetes2015.codfw.wmnet
  • kubernetes2016.codfw.wmnet
  • kubestagemaster2001.codfw.wmnet
  • kubestagetcd2001.codfw.wmnet
  • kubestagetcd2002.codfw.wmnet
  • kubestagetcd2003.codfw.wmnet
  • kubetcd2004.codfw.wmnet
  • kubetcd2005.codfw.wmnet
  • kubetcd2006.codfw.wmnet
  • ldap-corp2001.wikimedia.org
  • ldap-replica2005.wikimedia.org
  • ldap-replica2006.wikimedia.org
  • logstash2004.codfw.wmnet
  • logstash2005.codfw.wmnet
  • logstash2006.codfw.wmnet
  • logstash2023.codfw.wmnet
  • logstash2024.codfw.wmnet
  • logstash2025.codfw.wmnet
  • logstash2030.codfw.wmnet
  • logstash2031.codfw.wmnet
  • miscweb2002.codfw.wmnet
  • ml-etcd2001.codfw.wmnet
  • ml-etcd2002.codfw.wmnet
  • ml-etcd2003.codfw.wmnet
  • ml-serve-ctrl2001.codfw.wmnet
  • ml-serve-ctrl2002.codfw.wmnet
  • mwdebug2001.codfw.wmnet
  • mwdebug2002.codfw.wmnet
  • mx2001.wikimedia.org
  • ncredir2001.codfw.wmnet
  • ncredir2002.codfw.wmnet
  • netbox2001.wikimedia.org
  • netbox-dev2001.wikimedia.org
  • netboxdb2001.codfw.wmnet
  • netflow2001.codfw.wmnet
  • orespoolcounter2003.codfw.wmnet
  • orespoolcounter2004.codfw.wmnet
  • people2002.codfw.wmnet
  • ping2001.codfw.wmnet
  • planet2002.codfw.wmnet
  • poolcounter2003.codfw.wmnet
  • poolcounter2004.codfw.wmnet
  • puppetboard2001.codfw.wmnet
  • puppetboard2002.codfw.wmnet
  • puppetdb2002.codfw.wmnet
  • pybal-test2001.codfw.wmnet
  • pybal-test2002.codfw.wmnet
  • pybal-test2003.codfw.wmnet
  • registry2003.codfw.wmnet
  • registry2004.codfw.wmnet
  • releases2002.codfw.wmnet
  • rpki2002.codfw.wmnet
  • schema2003.codfw.wmnet
  • schema2004.codfw.wmnet
  • search-loader2001.codfw.wmnet
  • serpens.wikimedia.org
  • urldownloader2001.wikimedia.org
  • urldownloader2002.wikimedia.org
  • webperf2001.codfw.wmnet
  • webperf2002.codfw.wmnet
  • xhgui2001.codfw.wmnet
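The task uses the sre.ganeti.reboot-vm cookbook, whose arguments are not described here. As a swapped-in alternative using the plain Ganeti CLI (not the cookbook itself), a minimal sketch could issue `gnt-instance reboot --type=full` for each instance name read from stdin; a full reboot shuts the instance down and starts it again, so a new KVM process picks up the cluster's current machine_version. The function name `reboot_all` and the `DRY_RUN` variable are hypothetical, and by default the sketch only prints the commands:

```shell
#!/usr/bin/env bash
# Sketch only: restart instances at the Ganeti level, one per input line.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
reboot_all() {
  local dry_run="${DRY_RUN:-1}"
  local instance
  while IFS= read -r instance; do
    # Skip blank lines in the input.
    [[ -z "$instance" ]] && continue
    # --type=full shuts the instance down and starts it again, so Ganeti
    # launches a new KVM process; an in-guest reboot would keep the old one.
    local cmd=(gnt-instance reboot --type=full "$instance")
    if [[ "$dry_run" == "1" ]]; then
      echo "${cmd[*]}"
    else
      sudo "${cmd[@]}"
    fi
  done
}
```

For example, `gnt-instance list --no-headers -o name | reboot_all` on the cluster master would print one reboot command per instance. Feeding it smaller batches rather than the whole list avoids restarting redundant pairs (such as ncredir2001 and ncredir2002) at the same time.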

Event Timeline

Mentioned in SAL (#wikimedia-operations) [2021-11-22T15:17:19Z] <moritzm> set kvm:machine_version=pc-i440fx-2.8 for Ganeti cluster in codfw T294119

Change 740864 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/dns@master] Point irc.wikimedia.org to irc1001

https://gerrit.wikimedia.org/r/740864

Mentioned in SAL (#wikimedia-operations) [2021-11-24T10:25:15Z] <XioNoX> disable ping-offload for codfw - T294119

Mentioned in SAL (#wikimedia-operations) [2021-11-24T10:48:02Z] <XioNoX> rollback: disable ping-offload for codfw - T294119

Change 740864 merged by Muehlenhoff:

[operations/dns@master] Point irc.wikimedia.org to irc1001

https://gerrit.wikimedia.org/r/740864

VM acmechief2001.codfw.wmnet rebooted by vgutierrez@cumin1001 with reason: None

VM acmechief-test2001.codfw.wmnet rebooted by vgutierrez@cumin1001 with reason: None

VM ncredir2001.codfw.wmnet rebooted by vgutierrez@cumin1001 with reason: None

VM ncredir2001.codfw.wmnet rebooted by vgutierrez@cumin1001 with reason: None

VM ncredir2002.codfw.wmnet rebooted by vgutierrez@cumin1001 with reason: None

All VMs have been restarted to apply the new machine type.
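To spot-check a single VM after its restart, one option is to read the SMBIOS system version from inside the guest, which QEMU usually sets to the machine type name (treat this as an assumption and confirm it on one VM first). The function `machine_type_is` is hypothetical; its optional second argument lets you pass a value instead of querying `dmidecode`:

```shell
#!/usr/bin/env bash
# Hypothetical check, run inside a guest after its Ganeti-level restart.
# Assumption: QEMU reports the machine type as the SMBIOS system version.
machine_type_is() {
  local expected="$1"
  # dmidecode is only called when no second argument is given.
  local actual="${2:-$(sudo dmidecode -s system-version)}"
  if [[ "$actual" == "$expected" ]]; then
    echo "ok: $actual"
  else
    echo "mismatch: got '$actual', want '$expected'" >&2
    return 1
  fi
}
```

On a restarted guest, `machine_type_is pc-i440fx-2.8` would then print `ok: pc-i440fx-2.8` if the assumption holds, and report a mismatch for a VM that is still running with the old machine type.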