The ideal path would require upstream changes in Bird (especially for v6) - http://trubka.network.cz/pipermail/bird-users/2024-April/017580.html
If this is fixed/implemented, the guest VM side would work out of the box with our current global Bird config.
In the most basic setup the Ganeti side would only require an additional config block (plus the same for v6):
protocol bgp {
    ipv4 { import filter BGP-FROM-VMS; export none; };
    local 10.192.24.1 as 64612;
    neighbor range 10.192.24.0/23 external;
    multihop;
}
Where BGP-FROM-VMS is a filter like the ones we have on the switches, defining at least which IPs VMs are allowed to advertise.
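For illustration only, a minimal sketch of such a filter (the prefix is a placeholder, not a real allowed IP, and the name may need adjusting, e.g. underscores, to be a valid Bird symbol):

filter BGP-FROM-VMS {
    # Placeholder prefix set: replace with the service IPs VMs are allowed to announce
    if net ~ [ 192.0.2.1/32 ] then accept;
    reject;
}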
Using dynamic neighbors has a slight security risk, as we can't easily enforce which IP a given VM is allowed to advertise within the BGP-FROM-VMS list. For example, if a VM sets up a rogue BGP daemon and advertises 10.3.0.1 (recdns), we couldn't catch it here.
As all the VMs are trusted, this is not a blocker.
Setting up BGP authentication would remove the "rogue BGP speaker" risk.
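In Bird that could be the password option on the BGP protocol; a minimal sketch (the secret is a placeholder and would need to be distributed to both the hypervisors and the VMs, e.g. via Puppet):

protocol bgp {
    ipv4 { import filter BGP-FROM-VMS; export none; };
    local 10.192.24.1 as 64612;
    neighbor range 10.192.24.0/23 external;
    password "CHANGE-ME";  # placeholder; enables TCP MD5 (RFC 2385) authentication of the session
}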
An internal RPKI infrastructure would remove the "VM advertises a different IP than it's supposed to" risk, but it's a significant task.
Dynamic neighbors have the advantage of not needing any specific config on the hypervisor side, so when a VM is migrated to a different hypervisor there is no need for a Puppet run or to pre-populate BGP neighbors (which would sit as down most of the time).
Another downside of dynamic neighbors is that only the guest VM side can initiate the connection, unless we can find a way to trigger some kind of probing at each VM creation in net-common (which would be ignored by non-BGP-speaking VMs).
For the record, the VM side test config:
router id 10.192.24.4;
debug protocols all;
protocol bgp {
    ipv4 { import none; export none; };
    local 10.192.24.4 as 64613;
    neighbor 10.192.24.1 external;
    multihop;
}
Here and above, multihop; is only a workaround (see the bird-users mailing list link above). Note that BGP establishes fine with external set on both sides.
2024-04-12T09:40:15.585454+00:00 testvm2006 bird: bgp1: Started
2024-04-12T09:40:15.585593+00:00 testvm2006 bird: bgp1: Connect delayed by 5 seconds
2024-04-12T09:40:19.492098+00:00 testvm2006 bird: bgp1: Connecting to 10.192.24.1 from local address 10.192.24.4
2024-04-12T09:40:19.493146+00:00 testvm2006 bird: bgp1: Connected
2024-04-12T09:40:19.493305+00:00 testvm2006 bird: bgp1: Sending OPEN(ver=4,as=64613,hold=240,id=0ac01804)
2024-04-12T09:40:19.493384+00:00 testvm2006 bird: bgp1: Got OPEN(as=64612,hold=240,id=10.192.21.6)
2024-04-12T09:40:19.493560+00:00 testvm2006 bird: bgp1: Sending KEEPALIVE
2024-04-12T09:40:19.494178+00:00 testvm2006 bird: bgp1: Got KEEPALIVE
2024-04-12T09:40:19.494610+00:00 testvm2006 bird: bgp1: BGP session established
2024-04-12T09:40:19.494993+00:00 testvm2006 bird: bgp1: State changed to up
2024-04-12T09:40:19.786584+00:00 testvm2006 bird: bgp1: Got UPDATE
2024-04-12T09:40:19.786741+00:00 testvm2006 bird: bgp1: Got END-OF-RIB
2024-04-12T09:40:19.786823+00:00 testvm2006 bird: bgp1: Sending END-OF-RIB
There are then two other topics worth discussing:
1/ AS path length
The current eBGP setup is the most straightforward to configure, troubleshoot, etc., but it has the downside of adding an extra AS hop to the prefixes advertised by the VM. So if we keep going that way, we would need to do some AS-path prepending on all the other prefixes advertised by a similar ASN, for example if we decide to have a Routed Ganeti VM hosting a recdns server.
It's not an issue, but we should investigate whether there are better ways of proceeding (e.g. iBGP, or some other config knob).
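For reference, a hedged sketch of what the prepending could look like in the export filter of the other (non-VM) speakers; the filter name and ASN are placeholders:

filter PREPEND-FOR-VM-PARITY {
    # Prepend our own ASN once so the resulting AS path is as long as the
    # VM-originated announcement, which carries the extra hypervisor hop.
    bgp_path.prepend(64600);  # placeholder ASN
    accept;
}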
2/ Migration BGP failover
When a VM is migrated to a different hypervisor, the BGP session will be shut down and re-established on the new hypervisor.
Assuming we don't use multihop, Bird on the hypervisor side will detect the tap interface going down, tear down the session, and stop propagating the prefix, so no outage (or only a few ms).
On the other side, the VM might take up to 30s to send an update, realize it's not talking to the same peer, and re-establish the session. It might be an OK tradeoff to not add extra complexity.
If faster session re-establishment is needed, we could implement BFD, use shorter BGP timers, or investigate whether the hypervisor can notify the VM of the migration in some way.
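A rough sketch of the two Bird knobs on the VM side (timer/interval values are arbitrary examples, not recommendations):

# Option 1: shorter BGP timers (default hold time is 240s, as seen in the OPEN above)
protocol bgp {
    ipv4 { import none; export none; };
    local 10.192.24.4 as 64613;
    neighbor 10.192.24.1 external;
    keepalive time 3;
    hold time 9;
}

# Option 2: BFD for sub-second failure detection ("bfd on;" also goes in the bgp block)
protocol bfd {
    interface "*" {
        min rx interval 100 ms;
        min tx interval 100 ms;
    };
}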
We could also investigate how to cleanly shut down BGP when a VM is about to migrate, to eliminate the few ms of outage. Note that the VM's IP will take some ms to propagate as well.
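One possible (hypothetical) approach would be a pre-migration hook on the source hypervisor asking Bird to drop that VM's session via birdc, assuming the dynamically spawned protocol instance can be acted on individually:

birdc show protocols          # find the dynamically spawned instance for the VM's session
birdc disable <instance>      # tear the session down so the prefix is withdrawn before the cutover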