
Migrating eqsin to routed Ganeti
Closed, Resolved (Public)

Description

An intro on routed Ganeti can be found here: https://phabricator.wikimedia.org/phame/post/view/312/ganeti_on_modern_network_design/

Routed Ganeti is already running in magru, ulsfo and esams. The upcoming switch replacement at eqsin also requires migrating the eqsin Ganeti servers to routed Ganeti.

eqsin currently uses the old edge design for classic Ganeti, with a single four-node Ganeti cluster. Compared to the migrations in magru and esams, this simplifies the migration a bit, since we don't need to switch to single-node clusters with limited redundancy.

All VMs will need to be rebuilt on the new cluster.

row 1: ganeti5004, ganeti5005, ganeti5006, ganeti5007

List of VMs:

  • atlas5001.wikimedia.org (reinstalled as atlas5001)
  • bast5004.wikimedia.org (replaced by bast5005)
  • doh5001.wikimedia.org (replaced by doh5003)
  • doh5002.wikimedia.org (replaced by doh5004)
  • durum5001.eqsin.wmnet (replaced by durum5003)
  • durum5002.eqsin.wmnet (replaced by durum5004)
  • hcaptcha-proxy5001.wikimedia.org (replaced by hcaptcha-proxy5003)
  • hcaptcha-proxy5002.wikimedia.org (replaced by hcaptcha-proxy5004)
  • install5003.wikimedia.org (replaced by install5004)
  • ncredir5001.eqsin.wmnet (replaced by ncredir5003)
  • ncredir5002.eqsin.wmnet (replaced by ncredir5004)
  • netflow5002.eqsin.wmnet (replaced by netflow5003)
  • prometheus5002.eqsin.wmnet (replaced by prometheus5003)
  • tcp-proxy5001.eqsin.wmnet (replaced by tcp-proxy5003)
  • tcp-proxy5002.eqsin.wmnet (replaced by tcp-proxy5004)
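
The list above mixes hosts under wikimedia.org and eqsin.wmnet. A minimal sketch of grouping them by domain, which may help when planning the IP allocation for each group; the hostnames are copied from the list, while the helper name `group_by_domain` is only illustrative:

```python
from collections import defaultdict

# Old VM FQDNs, copied from the list above.
OLD_VMS = [
    "atlas5001.wikimedia.org",
    "bast5004.wikimedia.org",
    "doh5001.wikimedia.org",
    "doh5002.wikimedia.org",
    "durum5001.eqsin.wmnet",
    "durum5002.eqsin.wmnet",
    "hcaptcha-proxy5001.wikimedia.org",
    "hcaptcha-proxy5002.wikimedia.org",
    "install5003.wikimedia.org",
    "ncredir5001.eqsin.wmnet",
    "ncredir5002.eqsin.wmnet",
    "netflow5002.eqsin.wmnet",
    "prometheus5002.eqsin.wmnet",
    "tcp-proxy5001.eqsin.wmnet",
    "tcp-proxy5002.eqsin.wmnet",
]


def group_by_domain(fqdns):
    """Map each domain suffix to the short hostnames found under it."""
    groups = defaultdict(list)
    for fqdn in fqdns:
        # Split at the first dot: short hostname on the left, domain on the right.
        host, _, domain = fqdn.partition(".")
        groups[domain].append(host)
    return dict(groups)


groups = group_by_domain(OLD_VMS)
for domain, hosts in sorted(groups.items()):
    print(f"{domain}: {len(hosts)} VMs")
```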

When the migration is completed, we'll be able to move the servers over for the switch refresh.

The migration path will look like the following:

  • Allocate IPs for eqsin routed Ganeti
  • Add ganeti "customer" to Homer with the eqsin ranges
  • Manually create the first IPs in Netbox so that the DNS PTR includes can be added
  • Add allocated IPs to modules/network/data/data.yaml in Puppet
  • Announce that people should move away from bast5004 and use a different bastion for now
  • Decom bast5004 (will be re-added later)
  • Decom atlas5001 (will be re-added later)
  • Move all VMs in ganeti5007 to ganeti5004/5005/5006
  • Reimage ganeti5007 with routed Ganeti
  • Initialise new cluster
  • Update ganeti5007 switch port to remove the trunked public VLAN
  • Move all VMs in ganeti5004 to ganeti5006/ganeti5007
  • Move all VMs in ganeti5005 to ganeti5006/ganeti5007
  • Reimage ganeti5004 with routed Ganeti
  • Update ganeti5004 switch port to remove the trunked public VLAN
  • Set up routing between ganeti5004 and the core routers
  • Reimage ganeti5005 with routed Ganeti
  • Update ganeti5005 switch port to remove the trunked public VLAN
  • Set up routing between ganeti5005 and the core routers
  • Create prometheus5003 on routed Ganeti with the insetup role and hand it over to o11y to migrate the existing metrics; once that is done, decom prometheus5002
  • Create atlas5001 on routed Ganeti and register it with RIPE
  • Create doh5003, doh5004 on routed Ganeti and fail over services
  • Create durum5003, durum5004 on routed Ganeti and fail over services
  • Decom doh5001, doh5002
  • Decom durum5001, durum5002
  • Create hcaptcha-proxy5003, hcaptcha-proxy5004 on routed Ganeti and fail over services
  • Create ncredir5003, ncredir5004 on routed Ganeti and fail over services
  • Decom hcaptcha-proxy5001, hcaptcha-proxy5002
  • Decom ncredir5001, ncredir5002
  • Create install5004 on routed Ganeti and fail over services
  • Create netflow5003 on routed Ganeti and fail over services
  • Create tcp-proxy5003, tcp-proxy5004 on routed Ganeti and fail over services
  • Update DHCP relay config on the switches to point to the new install5004
  • Point webproxy to the new install5004
  • Decom install5003
  • Decom netflow5002
  • Decom tcp-proxy5001, tcp-proxy5002
  • Create bast5005 and tell people to use it
  • Reimage ganeti5006 with routed Ganeti
  • Update ganeti5006 switch port to remove the trunked public VLAN
  • Set up routing between ganeti5006 and the core routers
  • Set up routing between ganeti5007 and the core routers
  • Remove "eqsin" from Netbox sync
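
Most VMs in the path above get their replacement created before the old instance is decommissioned. The following is a minimal sketch that checks this ordering. The step list is a hand-transcribed subset of the plan (only the create and decom steps), and the names `PLAN`, `REPLACED_BY` and `decommissioned_early` are illustrative, not part of any existing tooling:

```python
# Create/decom steps in the order they appear in the migration path.
PLAN = [
    ("decom", "bast5004"),
    ("decom", "atlas5001"),
    ("create", "prometheus5003"),
    ("decom", "prometheus5002"),
    ("create", "atlas5001"),
    ("create", "doh5003"), ("create", "doh5004"),
    ("create", "durum5003"), ("create", "durum5004"),
    ("decom", "doh5001"), ("decom", "doh5002"),
    ("decom", "durum5001"), ("decom", "durum5002"),
    ("create", "hcaptcha-proxy5003"), ("create", "hcaptcha-proxy5004"),
    ("create", "ncredir5003"), ("create", "ncredir5004"),
    ("decom", "hcaptcha-proxy5001"), ("decom", "hcaptcha-proxy5002"),
    ("decom", "ncredir5001"), ("decom", "ncredir5002"),
    ("create", "install5004"),
    ("create", "netflow5003"),
    ("create", "tcp-proxy5003"), ("create", "tcp-proxy5004"),
    ("decom", "install5003"),
    ("decom", "netflow5002"),
    ("decom", "tcp-proxy5001"), ("decom", "tcp-proxy5002"),
    ("create", "bast5005"),
]

# Old VM -> replacement, from the VM list in the description.
REPLACED_BY = {
    "atlas5001": "atlas5001",
    "bast5004": "bast5005",
    "doh5001": "doh5003", "doh5002": "doh5004",
    "durum5001": "durum5003", "durum5002": "durum5004",
    "hcaptcha-proxy5001": "hcaptcha-proxy5003",
    "hcaptcha-proxy5002": "hcaptcha-proxy5004",
    "install5003": "install5004",
    "ncredir5001": "ncredir5003", "ncredir5002": "ncredir5004",
    "netflow5002": "netflow5003",
    "prometheus5002": "prometheus5003",
    "tcp-proxy5001": "tcp-proxy5003", "tcp-proxy5002": "tcp-proxy5004",
}


def decommissioned_early(plan, replaced_by):
    """Return old hosts that are decommissioned before their replacement exists."""
    created = set()
    early = []
    for action, host in plan:
        if action == "create":
            created.add(host)
        elif action == "decom" and replaced_by[host] not in created:
            early.append(host)
    return early


print(decommissioned_early(PLAN, REPLACED_BY))
```

With the steps as transcribed, only bast5004 and atlas5001 are reported, which matches the two entries the plan marks as "will be re-added later".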

Details

Related Changes in Gerrit:
Repo                     Branch      Lines +/-
operations/puppet        production  +1 -1
operations/homer/public  master      +0 -2
operations/puppet        production  +2 -6
operations/puppet        production  +3 -11
operations/puppet        production  +0 -3
operations/dns           master      +1 -1
operations/puppet        production  +8 -8
operations/homer/public  master      +2 -2
operations/puppet        production  +2 -5
operations/puppet        production  +0 -8
operations/puppet        production  +0 -0
operations/puppet        production  +0 -8
operations/puppet        production  +4 -0
operations/puppet        production  +14 -12
operations/puppet        production  +1 -1
operations/puppet        production  +20 -0
operations/puppet        production  +8 -0
operations/puppet        production  +4 -0
operations/puppet        production  +5 -5
operations/puppet        production  +4 -5
operations/puppet        production  +0 -2
operations/puppet        production  +10 -0
operations/puppet        production  +0 -8
operations/puppet        production  +2 -0
operations/puppet        production  +0 -4
operations/puppet        production  +16 -18
operations/puppet        production  +0 -4
operations/puppet        production  +0 -1
operations/puppet        production  +0 -2
operations/puppet        production  +0 -2
operations/puppet        production  +8 -0
operations/homer/public  master      +1 -1
operations/puppet        production  +0 -4
operations/puppet        production  +1 -0
operations/puppet        production  +0 -3
operations/puppet        production  +4 -0
operations/puppet        production  +2 -0
operations/puppet        production  +8 -0
operations/puppet        production  +1 -5
operations/dns           master      +1 -9
operations/puppet        production  +8 -0
operations/puppet        production  +6 -6
operations/puppet        production  +4 -5
operations/puppet        production  +0 -4
operations/puppet        production  +18 -18
operations/puppet        production  +0 -2
operations/puppet        production  +4 -4
operations/puppet        production  +3 -0
operations/homer/public  master      +2 -0
operations/dns           master      +23 -1
operations/puppet        production  +5 -1
operations/puppet        production  +31 -1
operations/puppet        production  +5 -2
operations/homer/public  master      +12 -0

Event Timeline

There are a very large number of changes, so older changes are hidden.

cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: doh5002.wikimedia.org

  • doh5002.wikimedia.org (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster eqsin to Netbox
    • Removed from DebMonitor
    • Removed from Puppet server and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster eqsin to Netbox

Mentioned in SAL (#wikimedia-operations) [2026-04-28T13:12:37Z] <moritzm> remove ganeti5005 from eqsin cluster T421863

Change #1278451 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Remove ganeti5005 from the eqsin01 cluster

https://gerrit.wikimedia.org/r/1278451

Change #1278451 merged by Muehlenhoff:

[operations/puppet@production] Remove ganeti5005 from the eqsin01 cluster

https://gerrit.wikimedia.org/r/1278451

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti5005.eqsin.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti5005.eqsin.wmnet with OS bookworm executed with errors:

  • ganeti5005 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console ganeti5005.eqsin.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti5005.eqsin.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti5005.eqsin.wmnet with OS bookworm completed:

  • ganeti5005 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202604291050_jmm_246967_ganeti5005.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1279275 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Add ganeti5005 to the routed Ganeti cluster in eqsin

https://gerrit.wikimedia.org/r/1279275

Change #1279275 merged by Muehlenhoff:

[operations/puppet@production] Add ganeti5005 to the routed Ganeti cluster in eqsin

https://gerrit.wikimedia.org/r/1279275

MoritzMuehlenhoff updated the task description.

Change #1279343 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Add bast5005

https://gerrit.wikimedia.org/r/1279343

Change #1279343 merged by Muehlenhoff:

[operations/puppet@production] Add bast5005

https://gerrit.wikimedia.org/r/1279343

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host hcaptcha-proxy5003.wikimedia.org with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host hcaptcha-proxy5003.wikimedia.org with OS bookworm completed:

  • hcaptcha-proxy5003 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202604300829_jmm_1129140_hcaptcha-proxy5003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host hcaptcha-proxy5004.wikimedia.org with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host hcaptcha-proxy5004.wikimedia.org with OS bookworm completed:

  • hcaptcha-proxy5004 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202604300954_jmm_1200460_hcaptcha-proxy5004.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host bast5005.wikimedia.org with OS trixie

Change #1280353 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Assign the hcaptcha::proxy role to hcaptcha-proxy5003/5004

https://gerrit.wikimedia.org/r/1280353

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host bast5005.wikimedia.org with OS trixie completed:

  • bast5005 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Set boot media to disk
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202604301145_jmm_1272606_bast5005.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1280375 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Add durum5003/5004

https://gerrit.wikimedia.org/r/1280375

Change #1280375 merged by Muehlenhoff:

[operations/puppet@production] Add durum5003/5004

https://gerrit.wikimedia.org/r/1280375

Change #1282080 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/puppet@production] eqsin durum hcaptcha-proxy: don't peer with core routers

https://gerrit.wikimedia.org/r/1282080

Change #1282080 merged by Ayounsi:

[operations/puppet@production] eqsin durum hcaptcha-proxy: don't peer with core routers

https://gerrit.wikimedia.org/r/1282080

Change #1282285 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Assign bastion role to bast5005

https://gerrit.wikimedia.org/r/1282285

Change #1282285 merged by Muehlenhoff:

[operations/puppet@production] Assign bastion role to bast5005

https://gerrit.wikimedia.org/r/1282285

Change #1282294 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Add bast5005 to bastion firewall service

https://gerrit.wikimedia.org/r/1282294

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host durum5003.eqsin.wmnet with OS bookworm

Change #1282294 merged by Muehlenhoff:

[operations/puppet@production] Add bast5005 to bastion firewall service

https://gerrit.wikimedia.org/r/1282294

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host durum5003.eqsin.wmnet with OS bookworm completed:

  • durum5003 (WARN)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Unable to downtime the new host on Icinga/Alertmanager, the sre.hosts.downtime cookbook returned 99
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202605041126_jmm_779169_durum5003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host durum5004.eqsin.wmnet with OS bookworm

Change #1282351 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Assign the durum role for durum5003/5004

https://gerrit.wikimedia.org/r/1282351

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host durum5004.eqsin.wmnet with OS bookworm completed:

  • durum5004 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202605041250_jmm_833063_durum5004.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1282358 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Add install5004

https://gerrit.wikimedia.org/r/1282358

Change #1282358 merged by Muehlenhoff:

[operations/puppet@production] Add install5004

https://gerrit.wikimedia.org/r/1282358

Change #1282351 merged by Muehlenhoff:

[operations/puppet@production] Assign the durum role for durum5003/5004

https://gerrit.wikimedia.org/r/1282351

Change #1282937 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Rename Hiera file to actually match the second durum VM in eqsin

https://gerrit.wikimedia.org/r/1282937

Change #1282937 merged by Muehlenhoff:

[operations/puppet@production] Rename Hiera file to actually match the second durum VM in eqsin

https://gerrit.wikimedia.org/r/1282937

Change #1280353 merged by Muehlenhoff:

[operations/puppet@production] Assign the hcaptcha::proxy role to hcaptcha-proxy5003/5004

https://gerrit.wikimedia.org/r/1280353

cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: durum5001.eqsin.wmnet

  • durum5001.eqsin.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster eqsin to Netbox
    • Removed from DebMonitor
    • Removed from Puppet server and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster eqsin to Netbox

cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: durum5002.eqsin.wmnet

  • durum5002.eqsin.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster eqsin to Netbox
    • Removed from DebMonitor
    • Removed from Puppet server and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster eqsin to Netbox

cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: hcaptcha-proxy5001.wikimedia.org

  • hcaptcha-proxy5001.wikimedia.org (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster eqsin to Netbox
    • Removed from DebMonitor
    • Removed from Puppet server and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster eqsin to Netbox

cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: hcaptcha-proxy5002.wikimedia.org

  • hcaptcha-proxy5002.wikimedia.org (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster eqsin to Netbox
    • Removed from DebMonitor
    • Removed from Puppet server and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster eqsin to Netbox

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host install5004.wikimedia.org with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host install5004.wikimedia.org with OS bookworm completed:

  • install5004 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202605071205_jmm_3619268_install5004.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1284626 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Assign the installserver role to install5004

https://gerrit.wikimedia.org/r/1284626

Change #1284629 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] eqsin: update install server IP

https://gerrit.wikimedia.org/r/1284629

Change #1284626 merged by Muehlenhoff:

[operations/puppet@production] Assign the installserver role to install5004

https://gerrit.wikimedia.org/r/1284626

Change #1284636 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Update DHCP server for eqsin

https://gerrit.wikimedia.org/r/1284636

Change #1284629 merged by jenkins-bot:

[operations/homer/public@master] eqsin: update install server IP

https://gerrit.wikimedia.org/r/1284629

Change #1284636 merged by Muehlenhoff:

[operations/puppet@production] Update DHCP server for eqsin

https://gerrit.wikimedia.org/r/1284636

Change #1284657 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/dns@master] Point webproxy in eqsin to install5004

https://gerrit.wikimedia.org/r/1284657

Change #1284657 merged by Muehlenhoff:

[operations/dns@master] Point webproxy in eqsin to install5004

https://gerrit.wikimedia.org/r/1284657

Change #1284665 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Remove ganeti5004 from eqsin cluster

https://gerrit.wikimedia.org/r/1284665

Change #1285199 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] netbox: Stop syncing from eqsin01

https://gerrit.wikimedia.org/r/1285199

cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: install5003.wikimedia.org

  • install5003.wikimedia.org (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster eqsin to Netbox
    • Removed from DebMonitor
    • Removed from Puppet server and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster eqsin to Netbox

Change #1285199 merged by Muehlenhoff:

[operations/puppet@production] netbox: Stop syncing from eqsin01

https://gerrit.wikimedia.org/r/1285199

Change #1284665 merged by Muehlenhoff:

[operations/puppet@production] Remove ganeti5004 from eqsin cluster

https://gerrit.wikimedia.org/r/1284665

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti5004.eqsin.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti5004.eqsin.wmnet with OS bookworm completed:

  • ganeti5004 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202605110629_jmm_3105726_ganeti5004.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
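
The first-run log paths reported by the reimage cookbook follow a visible naming pattern. A minimal sketch of splitting such a path into its parts, assuming the first field is a YYYYMMDDHHMM timestamp and the second the invoking user; the meaning of the third, numeric field is not stated in this task, so it is kept as an opaque string. `parse_reimage_log_path` is a hypothetical helper:

```python
from datetime import datetime
from pathlib import PurePosixPath


def parse_reimage_log_path(path):
    """Split a reimage log path into timestamp, user, numeric id and host."""
    # Drop the directories and the ".out" suffix.
    stem = PurePosixPath(path).stem
    # Hostnames may contain hyphens but no underscores, so split three times.
    stamp, user, run_id, host = stem.split("_", 3)
    return {
        # Assumption: the leading field is YYYYMMDDHHMM.
        "timestamp": datetime.strptime(stamp, "%Y%m%d%H%M"),
        "user": user,
        # Assumption-free: kept verbatim, its meaning is not documented here.
        "run_id": run_id,
        "host": host,
    }


info = parse_reimage_log_path(
    "/var/log/spicerack/sre/hosts/reimage/202605110629_jmm_3105726_ganeti5004.out"
)
print(info["host"], info["user"], info["timestamp"].isoformat())
```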

Change #1285538 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Add ganeti5004 to the routed Ganeti cluster in eqsin

https://gerrit.wikimedia.org/r/1285538

Change #1285538 merged by Muehlenhoff:

[operations/puppet@production] Add ganeti5004 to the routed Ganeti cluster in eqsin

https://gerrit.wikimedia.org/r/1285538

VM install5004.wikimedia.org switching disk type to drbd

Change #1275925 merged by jenkins-bot:

[operations/homer/public@master] eqsin: remove sandbox ACL on now gone interface

https://gerrit.wikimedia.org/r/1275925

Mentioned in SAL (#wikimedia-operations) [2026-05-11T10:10:41Z] <moritzm> rebalance routed Ganeti cluster in eqsin T421863

MoritzMuehlenhoff claimed this task.
MoritzMuehlenhoff updated the task description.

eqsin is now fully on routed Ganeti \o/

Change #1285797 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Fix Cumin alias for routed Ganeti

https://gerrit.wikimedia.org/r/1285797

Change #1285797 merged by Muehlenhoff:

[operations/puppet@production] Fix Cumin alias for routed Ganeti

https://gerrit.wikimedia.org/r/1285797