An intro on routed Ganeti can be found here: https://phabricator.wikimedia.org/phame/post/view/312/ganeti_on_modern_network_design/
Routed Ganeti is already running in magru, ulsfo and esams. The upcoming switch replacement at eqsin will also require migrating the Ganeti servers in eqsin to routed Ganeti.
eqsin is currently on the old edge design for classic Ganeti, with a four-node Ganeti cluster. Compared to the migrations in magru and esams, this simplifies the migration a bit, since we don't need to switch to single-node clusters with limited redundancy.
All VMs will need to be rebuilt on the new cluster.
row 1: ganeti5004, ganeti5005, ganeti5006, ganeti5007
List of VMs:
- atlas5001.wikimedia.org (reinstalled as atlas5001)
- bast5004.wikimedia.org (replaced by bast5005)
- doh5001.wikimedia.org (replaced by doh5003)
- doh5002.wikimedia.org (replaced by doh5004)
- durum5001.eqsin.wmnet (replaced by durum5003)
- durum5002.eqsin.wmnet (replaced by durum5004)
- hcaptcha-proxy5001.wikimedia.org (replaced by hcaptcha-proxy5003)
- hcaptcha-proxy5002.wikimedia.org (replaced by hcaptcha-proxy5004)
- install5003.wikimedia.org (replaced by install5004)
- ncredir5001.eqsin.wmnet (replaced by ncredir5003)
- ncredir5002.eqsin.wmnet (replaced by ncredir5004)
- netflow5002.eqsin.wmnet (replaced by netflow5003)
- prometheus5002.eqsin.wmnet (replaced by prometheus5003)
- tcp-proxy5001.eqsin.wmnet (replaced by tcp-proxy5003)
- tcp-proxy5002.eqsin.wmnet (replaced by tcp-proxy5004)
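For tracking progress, the list above can be captured as a small mapping. This is only an illustrative sketch: the names `REPLACEMENTS`, `remaining` and `renamed` are hypothetical and not part of any existing tooling, and domain suffixes are dropped for brevity.

```python
# Old VM -> new VM, copied from the list above (domain suffixes omitted).
# atlas5001 maps to itself because it is reinstalled under the same name.
REPLACEMENTS = {
    "atlas5001": "atlas5001",
    "bast5004": "bast5005",
    "doh5001": "doh5003",
    "doh5002": "doh5004",
    "durum5001": "durum5003",
    "durum5002": "durum5004",
    "hcaptcha-proxy5001": "hcaptcha-proxy5003",
    "hcaptcha-proxy5002": "hcaptcha-proxy5004",
    "install5003": "install5004",
    "ncredir5001": "ncredir5003",
    "ncredir5002": "ncredir5004",
    "netflow5002": "netflow5003",
    "prometheus5002": "prometheus5003",
    "tcp-proxy5001": "tcp-proxy5003",
    "tcp-proxy5002": "tcp-proxy5004",
}


def remaining(decommissioned):
    """Return the old VMs that have not been decommissioned yet, sorted by name."""
    return sorted(set(REPLACEMENTS) - set(decommissioned))


def renamed():
    """Return only the VMs whose replacement has a different name."""
    return {old: new for old, new in REPLACEMENTS.items() if old != new}
```

Calling `remaining(["bast5004", "atlas5001"])` after the first two decoms would list the 13 VMs still running on the classic cluster.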
When the migration is completed, we'll be able to move the servers over for the switch refresh.
The migration path will look like the following:
- Allocate IPs for eqsin routed Ganeti
- Add ganeti "customer" to Homer with the eqsin ranges
- Manually create the first IPs in Netbox to be able to add the DNS PTR includes
- Add allocated IPs to modules/network/data/data.yaml in Puppet
- Announce that people should move away from bast5004 and use a different bastion for now
- Decom bast5004 (will be re-added later)
- Decom atlas5001 (will be re-added later)
- Move all VMs in ganeti5007 to ganeti5004/5005/5006
- Reimage ganeti5007 with routed Ganeti
- Initialise new cluster
- Update ganeti5007 switch port to remove the trunked public VLAN
- Move all VMs in ganeti5004 to ganeti5006/ganeti5007
- Move all VMs in ganeti5005 to ganeti5006/ganeti5007
- Reimage ganeti5004 with routed Ganeti
- Update ganeti5004 switch port to remove the trunked public VLAN
- Set up routing between ganeti5004 and the core routers
- Reimage ganeti5005 with routed Ganeti
- Update ganeti5005 switch port to remove the trunked public VLAN
- Set up routing between ganeti5005 and the core routers
- Create prometheus5003 on routed Ganeti with the insetup role and hand it over to o11y to migrate existing metrics; when done, decom prometheus5002
- Create atlas5001 on routed Ganeti and register it with RIPE
- Create doh5003, doh5004 on routed Ganeti and fail over services
- Create durum5003, durum5004 on routed Ganeti and fail over services
- Decom doh5001, doh5002
- Decom durum5001, durum5002
- Create hcaptcha-proxy5003, hcaptcha-proxy5004 on routed Ganeti and fail over services
- Create ncredir5003, ncredir5004 on routed Ganeti and fail over services
- Decom hcaptcha-proxy5001, hcaptcha-proxy5002
- Decom ncredir5001, ncredir5002
- Create install5004 on routed Ganeti and fail over services
- Create netflow5003 on routed Ganeti and fail over services
- Create tcp-proxy5003, tcp-proxy5004 on routed Ganeti and fail over services
- Update DHCP relay config on the switches to point to the new install5004
- Point webproxy to the new install5004
- Decom install5003
- Decom netflow5002
- Decom tcp-proxy5001, tcp-proxy5002
- Create bast5005 and tell people to use it
- Reimage ganeti5006 with routed Ganeti
- Update ganeti5006 switch port to remove the trunked public VLAN
- Set up routing between ganeti5006 and the core routers
- Set up routing between ganeti5007 and the core routers
- Remove "eqsin" from Netbox sync
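
As a sanity check on the ordering above, the create/decom steps can be replayed to see which roles have a window with no VM at all. This is a minimal sketch, not existing tooling: `STEPS`, `role` and `roles_with_gap` are hypothetical names, the steps are transcribed by hand from the list, and a "role" is assumed to be the hostname with its trailing number stripped.

```python
import re

# (action, vm) pairs in the order they appear in the migration path above.
STEPS = [
    ("decom", "bast5004"),
    ("decom", "atlas5001"),
    ("create", "prometheus5003"),
    ("decom", "prometheus5002"),
    ("create", "atlas5001"),
    ("create", "doh5003"), ("create", "doh5004"),
    ("create", "durum5003"), ("create", "durum5004"),
    ("decom", "doh5001"), ("decom", "doh5002"),
    ("decom", "durum5001"), ("decom", "durum5002"),
    ("create", "hcaptcha-proxy5003"), ("create", "hcaptcha-proxy5004"),
    ("create", "ncredir5003"), ("create", "ncredir5004"),
    ("decom", "hcaptcha-proxy5001"), ("decom", "hcaptcha-proxy5002"),
    ("decom", "ncredir5001"), ("decom", "ncredir5002"),
    ("create", "install5004"),
    ("create", "netflow5003"),
    ("create", "tcp-proxy5003"), ("create", "tcp-proxy5004"),
    ("decom", "install5003"),
    ("decom", "netflow5002"),
    ("decom", "tcp-proxy5001"), ("decom", "tcp-proxy5002"),
    ("create", "bast5005"),
]


def role(name):
    """Strip the trailing host number, e.g. 'doh5003' -> 'doh'."""
    return re.sub(r"\d+$", "", name)


def roles_with_gap(steps):
    """Return roles where an old VM is decommissioned before any replacement is created."""
    created = set()
    gaps = set()
    for action, name in steps:
        if action == "create":
            created.add(role(name))
        elif action == "decom" and role(name) not in created:
            gaps.add(role(name))
    return sorted(gaps)
```

With the steps as listed, `roles_with_gap(STEPS)` reports only `atlas` and `bast`, which matches the two entries marked "(will be re-added later)"; every other role has its replacement created before the old VM is decommissioned.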