Our current Ganeti clusters run in bridged mode, which means that a guest VM has direct L2 adjacency with the VLANs of its hypervisor's switch.
For example, rpki1001 is on the "row C" Ganeti cluster and has a private1-c-eqiad IP, like any other private host (physical or VM) in that row.
As the "row C cluster" hypervisors are spread across that L2 domain (all racks in row C), moving a VM from one hypervisor to another (e.g. for hypervisor maintenance) is seamless, as the VM stays in the same VLAN.
The new eqiad rows E and F differ from that model: their L2 domains are confined to a single rack (rather than spanning the entire row) for various reasons, including stability (smaller failure domains).
Overlay/tunneling-based solutions (e.g. VXLAN) are out of the equation.
If we were to deploy a Ganeti cluster in bridged mode, each Ganeti nodegroup (a subcluster grouping concept in Ganeti) would need to stay within a single rack, and VM mobility would be limited to that rack as well. This could itself be an option (e.g. multiple tiny nodegroups of, say, 2 to 3 nodes each).
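To make the bridged-mode constraint concrete, here is a minimal sketch of a sanity check that flags any nodegroup whose nodes span more than one rack (and therefore more than one L2 domain). The inventory format, node names, nodegroup names and rack names are all hypothetical; a real check would pull this data from Ganeti and from our inventory tooling.

```python
from collections import defaultdict


def nodegroups_spanning_racks(nodes):
    """Return {nodegroup: racks} for nodegroups whose nodes sit in more than one rack.

    `nodes` is a hypothetical inventory mapping node name -> (nodegroup, rack).
    In bridged mode a nodegroup must stay inside one rack's L2 domain, so any
    nodegroup returned here would break VM mobility between its nodes.
    """
    racks_per_group = defaultdict(set)
    for group, rack in nodes.values():
        racks_per_group[group].add(rack)
    return {group: racks for group, racks in racks_per_group.items() if len(racks) > 1}


# Hypothetical node, nodegroup and rack names, for illustration only.
inventory = {
    "node-e1-a": ("ng-e1", "E1"),
    "node-e1-b": ("ng-e1", "E1"),
    "node-f1-a": ("ng-f1", "F1"),
    "node-f1-b": ("ng-f1", "E2"),  # wrong rack: E2 is a different L2 domain
}
print(nodegroups_spanning_racks(inventory))
```

With many tiny per-rack nodegroups, a check like this would have to hold for every nodegroup at all times.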
However, Ganeti can also work in routed mode.
In that mode, the VMs have IPs outside the row/rack subnets (e.g. from a prefix reserved for VMs).
The hypervisor acts as a router and advertises the IPs of the VMs it hosts to the network (e.g. via BGP).
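The sketch below shows what each hypervisor would advertise in routed mode, assuming one host route per VM address (/32 for IPv4, /128 for IPv6). The placement data and names are hypothetical. It does not talk to a BGP daemon; it only computes the route sets such a daemon would be fed.

```python
import ipaddress
from collections import defaultdict


def host_routes_by_hypervisor(placement):
    """Map each hypervisor to the set of host routes it should advertise.

    `placement` is a hypothetical mapping of VM name -> (hypervisor, VM IP).
    """
    routes = defaultdict(set)
    for hypervisor, ip in placement.values():
        # ip_network() on a bare address yields a host route: /32 (IPv4) or /128 (IPv6).
        routes[hypervisor].add(ipaddress.ip_network(ip))
    return dict(routes)


# Hypothetical VMs and hypervisors; addresses are from documentation prefixes.
placement = {
    "vm-a": ("hv-1", "192.0.2.10"),
    "vm-b": ("hv-1", "2001:db8::10"),
    "vm-c": ("hv-2", "192.0.2.11"),
}
for hv, nets in sorted(host_routes_by_hypervisor(placement).items()):
    print(hv, sorted(str(n) for n in nets))
```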
This allows hypervisors of the same cluster (and of the same nodegroup) to reside in different locations (even in different DCs, though this is not recommended).
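Because the VM's address no longer depends on the rack, a migration becomes a routing change rather than an L2 move. Here is a rough sketch of the announce/withdraw set for going from one placement to another, using the same hypothetical placement format as above. Ordering announcements before withdrawals (make-before-break) is an assumption on our side, not something decided here.

```python
import ipaddress


def route_changes(before, after):
    """Return sorted (action, hypervisor, prefix) tuples needed to go from
    placement `before` to placement `after`.

    Placements are hypothetical mappings of VM name -> (hypervisor, IP string).
    """
    def announcements(placement):
        # Each VM contributes one host route (/32 or /128) from its hypervisor.
        return {(hv, ipaddress.ip_network(ip)) for hv, ip in placement.values()}

    old, new = announcements(before), announcements(after)
    changes = [("withdraw", hv, str(net)) for hv, net in old - new]
    changes += [("announce", hv, str(net)) for hv, net in new - old]
    # "announce" sorts before "withdraw", so the new path is listed before the old one is removed.
    return sorted(changes)


# Hypothetical migration of vm-a between hypervisors in different racks.
before = {"vm-a": ("hv-e1", "192.0.2.10")}
after = {"vm-a": ("hv-f2", "192.0.2.10")}
print(route_changes(before, after))
# [('announce', 'hv-f2', '192.0.2.10/32'), ('withdraw', 'hv-e1', '192.0.2.10/32')]
```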
This will require changes in provisioning (IP allocation) as well as in the tooling around Ganeti (to advertise/withdraw prefixes).
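On the provisioning side, the simplest form of IP allocation from a reserved VM prefix is "first free address". The sketch below assumes that strategy and a hypothetical prefix and in-use set. It says nothing about how nfdhcpd or any other existing tool would do it.

```python
import ipaddress


def allocate_vm_ip(prefix, in_use):
    """Return the first free host address in `prefix`, or None if it is exhausted.

    `prefix` stands in for a hypothetical prefix reserved for VMs; `in_use` is
    whatever the provisioning tooling already considers allocated.
    """
    taken = {ipaddress.ip_address(ip) for ip in in_use}
    # hosts() skips the IPv4 network and broadcast addresses, a conservative choice here.
    for addr in ipaddress.ip_network(prefix).hosts():
        if addr not in taken:
            return addr
    return None


print(allocate_vm_ip("192.0.2.0/28", {"192.0.2.1", "192.0.2.2"}))  # 192.0.2.3
```

Since each VM would be announced as a host route, the prefix never needs to exist as a subnet on any rack.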
Some relevant links:
- https://wikitech.wikimedia.org/wiki/Ganeti_evaluation
- https://github.com/grnet/nfdhcpd (maybe useful for the IP allocation)
- https://github.com/grnet/gnt-networking (maybe useful for the advertising)