Paste P78727

Routed ganeti setup with bridge device

Authored by cmooney on Jul 1 2025, 9:50 AM.
Example of how "routed" ganeti type forwarding can be set up using a bridge device.
The basic approach here is very similar to how we have done 'routed_ganeti' before. On both
the VM and the hypervisor side we only configure /32 IPs for v4, and for v6 the only shared
subnet is the link-local one. The difference is that instead of adding the same IPs multiple
times - to every tap* interface on the hypervisor side - we make all the tap* interfaces
members of a bridge device, and place the IPs on the bridge.
Again we make use of the 'onlink' config and the fact that Linux will perform ARP (or ND for
v6) for an IP we route via an interface - even if it's not on the local subnet (as we have
/32s and /128s).
The benefit is that the host sees all VMs as reachable on the same L3 interface (br0), which
might help with things like BGP dynamic neighbors if they need to be associated with a specific
interface. It might also simplify the ganeti VM instantiation as making the 'tap' devices part
of a bridge is its default behaviour (same as the L2 setup).
The test setup was done using network namespaces and veth pairs, rather than VMs and tap
interfaces, but the logic is the same.
Networking was set up with these commands: https://phabricator.wikimedia.org/P78724
Built by this script: https://phabricator.wikimedia.org/P78725
On the hypervisor this results in the following. The t0 and t1 interfaces represent the 'tap'
interfaces we'd have in a real setup, and are members of 'br0':
root@debian12:~# ip -br addr show
lo UNKNOWN 127.0.0.1/8 ::1/128
enp1s0 UP 192.168.122.3/24 fe80::5054:ff:fe20:2351/64
br0 UP 10.192.24.1/32 fe80::2022:22ff:fe22:2201/64
t0@if4 UP
t1@if6 UP
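The hypervisor state shown above could be reproduced roughly as follows. This is a minimal sketch and not the script from P78725: the helper names (run, setup_bridge, add_vm_route, disable_v4_redirects) and the DRY_RUN switch are made up for illustration, and the t0/t1 ports are assumed to exist already. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: hypervisor side of the bridge-based routed setup.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_bridge() {
    # $1 = bridge name, $2 = gateway address (/32), rest = tap-like ports
    bridge="$1"; gw="$2"; shift 2
    run ip link add "$bridge" type bridge
    run ip addr add "$gw" dev "$bridge"
    run ip link set "$bridge" up
    for port in "$@"; do
        run ip link set "$port" master "$bridge"
        run ip link set "$port" up
    done
}

add_vm_route() {
    # $1 = bridge name, $2 = VM address; installs a host route via the bridge
    case "$2" in
        *:*) run ip -6 route add "$2/128" dev "$1" ;;
        *)   run ip route add "$2/32" dev "$1" ;;
    esac
}

disable_v4_redirects() {
    # The kernel sends redirects on an interface if either conf/all or the
    # per-interface value is set, so the bridge's own value is cleared too.
    for scope in all default "$1"; do
        run sysctl -w "net.ipv4.conf.$scope.send_redirects=0"
    done
}

setup_bridge br0 10.192.24.1/32 t0 t1
disable_v4_redirects br0
add_vm_route br0 10.0.24.10
add_vm_route br0 10.1.24.10
add_vm_route br0 2001:bb6:8b70:9e60:10:0:24:10
add_vm_route br0 2001:bb6:8b70:9e61:10:1:24:10
```

Running it with DRY_RUN=0 as root would apply the commands for real. Setting the per-interface send_redirects value is an addition to the two sysctls in the notes below. It matters if br0 was created before the 'default' value was changed.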
A few notes:
- We need to disable ICMP redirects for this to work, otherwise the system will not route
between VMs. Because the VMs are all connected via the same bridge, the host will instead try
to send ICMP redirects telling the VMs to communicate directly. That won't work as they
are not on the same subnet. Instead we want to leverage the "onlink" routes we've added and
have the hypervisor route between them.
- Disabling these has to be done in different ways for IPv6/IPv4:
*** For IPv4 we can simply tell the hypervisor not to send ICMP redirects using:
sysctl -w net.ipv4.conf.all.send_redirects=0
sysctl -w net.ipv4.conf.default.send_redirects=0
*** For IPv6 we cannot stop them being generated, so instead we have to stop them being processed on the given interface on the VM side:
ip netns exec vm0 sysctl -w net.ipv6.conf.eth0.accept_redirects=0
- Otherwise the VM (netns) side is set up exactly the same way as before, with no changes.
- For good hygiene I removed the auto-generated link-locals from the tap* devices on the
hypervisor side. Removing them doesn't break anything; it just seemed cleaner.
- The /32 and /128 routes we add on the hypervisor side reference the 'br0' device rather than the
'tap' device. The hypervisor will run ARP/ND on the bridge to get the MAC to send to, and this
process will populate the bridge forwarding table with each MAC learnt on the given tap interface,
i.e.:
root@debian12:~# ip -4 route show
10.0.24.10 dev br0 scope link
10.1.24.10 dev br0 scope link
root@debian12:~# ip -4 neigh show dev br0
10.0.24.10 lladdr ea:3e:e8:0b:fc:ee STALE
10.1.24.10 lladdr aa:53:cf:71:00:57 STALE
root@debian12:~# bridge fdb show | egrep "ea:3e:e8:0b:fc:ee|aa:53:cf:71:00:57"
ea:3e:e8:0b:fc:ee dev t0 master br0
aa:53:cf:71:00:57 dev t1 master br0
root@debian12:~# ip -6 route show | grep br0
2001:bb6:8b70:9e60:10:0:24:10 dev br0 metric 1024 pref medium
2001:bb6:8b70:9e61:10:1:24:10 dev br0 metric 1024 pref medium
fe80::/64 dev br0 proto kernel metric 256 pref medium
root@debian12:~# ip neigh show dev br0 | grep bb6
2001:bb6:8b70:9e61:10:1:24:10 lladdr aa:53:cf:71:00:57 STALE
2001:bb6:8b70:9e60:10:0:24:10 lladdr ea:3e:e8:0b:fc:ee STALE
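To see at a glance which tap port each VM IP sits behind, the neighbour table and the bridge forwarding table can be joined on the MAC address. The following is only a sketch: map_ip_to_port is a hypothetical helper, and it assumes the two inputs have the same column layout as the output above.

```shell
#!/bin/sh
# Sketch: join neighbour entries with bridge FDB entries on the MAC address.

map_ip_to_port() {
    # $1 = file holding `ip neigh show dev br0` output
    # $2 = file holding `bridge fdb show br br0` output
    awk '
        # First file (FDB): remember which port each MAC was learnt on.
        FNR == NR { if ($2 == "dev") port[$1] = $3; next }
        # Second file (neighbours): print IP, MAC and port when the MAC is known.
        $2 == "lladdr" && ($3 in port) { print $1, $3, port[$3] }
    ' "$2" "$1"
}

# Demo using the IPv4 entries from the output above.
neigh_file="$(mktemp)"
fdb_file="$(mktemp)"
cat > "$neigh_file" <<'EOF'
10.0.24.10 lladdr ea:3e:e8:0b:fc:ee STALE
10.1.24.10 lladdr aa:53:cf:71:00:57 STALE
EOF
cat > "$fdb_file" <<'EOF'
ea:3e:e8:0b:fc:ee dev t0 master br0
aa:53:cf:71:00:57 dev t1 master br0
EOF
map_ip_to_port "$neigh_file" "$fdb_file"
rm -f "$neigh_file" "$fdb_file"
```

With the sample data this prints "10.0.24.10 ea:3e:e8:0b:fc:ee t0" and "10.1.24.10 aa:53:cf:71:00:57 t1". On a live hypervisor the two files would be filled from the commands named in the comments instead.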
# VM IPs are set up the same as before on both:
root@vm0:~# ip -br -4 addr show
eth0@if5 UP 10.0.24.10/32
root@vm0:~# ip -4 route show
default via 10.192.24.1 dev eth0 onlink
root@vm1:~# ip -4 -br addr show
eth1@if7 UP 10.1.24.10/32
root@vm1:~# ip -4 route show
default via 10.192.24.1 dev eth1 onlink
## We can reach one VM from the other, routed via the hypervisor:
root@vm1:~# mtr -r -n -c 2 10.0.24.10
Start: 2025-07-01T11:11:41+0100
HOST: debian12 Loss% Snt Last Avg Best Wrst StDev
1.|-- 10.192.24.1 0.0% 2 0.1 0.1 0.1 0.1 0.0
2.|-- 10.0.24.10 0.0% 2 0.1 0.1 0.1 0.1 0.0
## And the same for IPv6:
root@vm0:~# ip -br -6 addr show
eth0@if5 UP 2001:bb6:8b70:9e60:10:0:24:10/128 fe80::e83e:e8ff:fe0b:fcee/64
root@vm0:~#
root@vm0:~# ip -br -6 route show
2001:bb6:8b70:9e60:10:0:24:10 dev eth0 proto kernel metric 256 pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium
default via fe80::2022:22ff:fe22:2201 dev eth0 metric 1024 pref medium
root@vm1:~# ip -br -6 addr show
eth1@if7 UP 2001:bb6:8b70:9e61:10:1:24:10/128 fe80::a853:cfff:fe71:57/64
root@vm1:~#
root@vm1:~# ip -br -6 route show
2001:bb6:8b70:9e61:10:1:24:10 dev eth1 proto kernel metric 256 pref medium
fe80::/64 dev eth1 proto kernel metric 256 pref medium
default via fe80::2022:22ff:fe22:2201 dev eth1 metric 1024 pref medium
root@vm1:~# mtr -w -n -c 2 2001:bb6:8b70:9e60:10:0:24:10
Start: 2025-07-01T11:24:18+0100
HOST: debian12 Loss% Snt Last Avg Best Wrst StDev
1.|-- fe80::2022:22ff:fe22:2201 0.0% 2 0.1 0.1 0.1 0.1 0.0
2.|-- 2001:bb6:8b70:9e60:10:0:24:10 0.0% 2 0.1 0.1 0.1 0.1 0.0
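For reference, the VM (netns) side shown in the output above could be configured along these lines. This is a hedged sketch rather than the commands from P78724: run and setup_vm are illustrative names, the DRY_RUN switch is an assumption, and the namespaces and their interfaces are assumed to exist and be up. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: VM-side (netns) addressing and default routes for one guest.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_vm() {
    # $1 = netns name, $2 = interface inside the netns
    # $3 = VM IPv4 address, $4 = VM IPv6 address
    # $5 = hypervisor IPv4 on br0, $6 = hypervisor link-local on br0
    ns="$1"; dev="$2"
    run ip netns exec "$ns" ip addr add "$3/32" dev "$dev"
    run ip netns exec "$ns" ip -6 addr add "$4/128" dev "$dev"
    # The v4 gateway is outside the /32, hence 'onlink'.
    run ip netns exec "$ns" ip route add default via "$5" dev "$dev" onlink
    # The v6 gateway is a link-local address, reachable via fe80::/64.
    run ip netns exec "$ns" ip -6 route add default via "$6" dev "$dev"
    # Ignore ICMPv6 redirects sent by the hypervisor.
    run ip netns exec "$ns" sysctl -w "net.ipv6.conf.$dev.accept_redirects=0"
}

setup_vm vm0 eth0 10.0.24.10 2001:bb6:8b70:9e60:10:0:24:10 \
    10.192.24.1 fe80::2022:22ff:fe22:2201
setup_vm vm1 eth1 10.1.24.10 2001:bb6:8b70:9e61:10:1:24:10 \
    10.192.24.1 fe80::2022:22ff:fe22:2201
```

In a real ganeti deployment the same commands would run inside the guest, without the 'ip netns exec' prefix.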