cloudlb: figure out routing
Closed, Resolved, Public

Description

The cloudlb servers have a special setup in which they are connected like this:

  • to wikiland production networks (10.x) natively (default)
  • to cloud-private subnets (172.20.x) on a VLAN interface.
  • to the internet via BGP using a VIP through the cloud-private VLAN.

This VIP can receive traffic from the internet. But as of this writing, the return traffic will use the default route on the host, which points to the wikiland production network. Therefore the traffic never returns, because of the asymmetric routing.

When we were originally thinking about this project we already anticipated this problem, see https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Iteration_on_network_isolation#Default_route_for_servers_connected_to_cloud-private

The options to address this include:

Option 1

Change the default native network to be cloud-private rather than wikiland production.

Option 2

Introduce a VRF / l3mdev in cloudlb servers, to allow having 2 separate routing tables with 2 different default routes.

We will need to instrument the services to use the right VRF for their operations. This includes:

  • Bird BGP session with cloudsw. @cmooney has validated that BIRD can work in this setup.
  • HAproxy backend connectivity

A VRF is what cloudgw uses for similar reasons.
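
As a rough illustration of what option 2 could involve, here is a minimal dry-run sketch that only prints the iproute2 commands rather than running them. The VRF name `vrf-cloud`, the table ID `100` and the interface name `vlan-cloud` are hypothetical placeholders, and the gateway is the one that appears later in this task; none of this is taken from the actual puppet patch.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the commands that would create a
# Linux VRF with its own routing table and default route.
# All names below (vrf-cloud, vlan-cloud, table 100) are assumptions.
vrf_cmds() {
    vrf="$1"; table="$2"; iface="$3"; gw="$4"
    # Create the VRF device bound to a dedicated routing table.
    echo "ip link add ${vrf} type vrf table ${table}"
    echo "ip link set ${vrf} up"
    # Enslave the cloud-private VLAN interface to the VRF.
    echo "ip link set ${iface} master ${vrf}"
    # Second default route, living only in the VRF table.
    echo "ip route add default via ${gw} table ${table}"
}

vrf_cmds vrf-cloud 100 vlan-cloud 172.20.5.1
```

Services would then need to be started inside the VRF (for example with `ip vrf exec vrf-cloud <command>`) or bind their sockets to the VRF device, which is the per-service work listed above.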

Option 3

Introduce a Linux network namespace (netns). Similar to option 2, but more transparent to Bird / HAproxy.
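
For comparison, a network namespace variant might look roughly like the dry-run sketch below, which again only prints commands. The namespace name `cloud-private`, the interface name `vlan-cloud` and the address `172.20.5.10/24` are hypothetical placeholders. Note that an interface loses its addresses when it is moved into another namespace, so they have to be configured again inside it.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the commands that would move the
# cloud-private interface into its own network namespace.
# Namespace name, interface name and address are assumptions.
netns_cmds() {
    ns="$1"; iface="$2"; addr="$3"; gw="$4"
    echo "ip netns add ${ns}"
    # Moving the interface flushes its addresses; reconfigure them below.
    echo "ip link set ${iface} netns ${ns}"
    echo "ip netns exec ${ns} ip link set lo up"
    echo "ip netns exec ${ns} ip link set ${iface} up"
    echo "ip netns exec ${ns} ip addr add ${addr} dev ${iface}"
    # The namespace has its own main table, so this is its only default route.
    echo "ip netns exec ${ns} ip route add default via ${gw}"
}

netns_cmds cloud-private vlan-cloud 172.20.5.10/24 172.20.5.1
```

A process started with `ip netns exec cloud-private <command>` sees only the namespace's interfaces and routes, which is why this approach is more transparent to the service itself.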

Option 4

Some kind of magic or hack to allow the asymmetric routing, such as disabling the reverse path filter somewhere.

Event Timeline

aborrero changed the task status from Open to In Progress. May 5 2023, 4:20 PM
aborrero triaged this task as High priority.
aborrero updated the task description.

Change 916528 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] cloud_private_subnet: add support for VRF if using BGP

https://gerrit.wikimedia.org/r/916528

Thanks @aborrero, I figured this one might not be so smooth. A few quick comments:

Option 1:

This option could potentially work if we announce all the WMF prod ranges via BGP to the cloudlb, so that it would have specific routes to the parts of the prod realm (bastions for SSH etc) that it needs to reach.

However, I think this one could get messy and be difficult to troubleshoot longer term. I think it is best avoided.

Option 2:

This is probably the best option, unless the Linux VRF prevents HAproxy or the anycast health-check from functioning. It may not even require many changes to Bird, as we're not sending routes to the cloudlb with BGP (in which case you would need to set up Bird to export them to the correct VRF table).

Option 2.5:

A twist on option 2 would be to use ip rules on the host. Ultimately the Linux VRF implementation is just a way to do that automatically based on source interface, but you can also do things like:

root@cloudlb2001-dev:~# ip route add default via 172.20.5.1 table 100
root@cloudlb2001-dev:~# ip rule add from 185.15.57.24 lookup 100

Which magically makes it work:

root@cloudlb2001-dev:~# ping -I 185.15.57.24 1.1.1.1 
PING 1.1.1.1 (1.1.1.1) from 185.15.57.24 : 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=60 time=1.98 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=60 time=2.11 ms
64 bytes from 1.1.1.1: icmp_seq=3 ttl=60 time=2.04 ms
root@nbgw:~# mtr -z -b -w -c 5 185.15.57.24
Start: 2023-05-05T18:35:30+0100
HOST: nbgw                                                                     Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. AS5466   86.44.36.1                                                        0.0%     5    4.5   7.5   4.5  10.0   2.2
  2. AS5466   86.43.253.104                                                     0.0%     5    4.0   4.0   3.8   4.1   0.1
  3. AS5466   86.43.252.80                                                      0.0%     5    5.0   5.6   5.0   6.5   0.6
  4. AS???    ???                                                              100.0     5    0.0   0.0   0.0   0.0   0.0
  5. AS2914   ae-3.a00.dublir01.ie.bb.gin.ntt.net (83.231.146.221)              0.0%     5   19.8  16.1   4.9  26.7  10.5
  6. AS???    ???                                                              100.0     5    0.0   0.0   0.0   0.0   0.0
  7. AS3356   4.69.210.141                                                     20.0%     5  123.6 126.2 123.6 132.2   4.1
  8. AS3356   WIKIMEDIA-F.ear1.Dallas1.Level3.net (64.156.73.170)               0.0%     5  146.9 147.5 146.7 149.8   1.3
  9. AS14907  xe-0-0-47-1001.cloudsw1-b1-codfw.wikimedia.org (208.80.153.179)   0.0%     5  149.7 150.0 147.1 153.5   2.5
 10. AS14907  185.15.57.24                                                      0.0%     5  145.2 145.3 145.2 145.4   0.1

Linux VRF is built on the same kind of rules, but it is aimed more at routing between interfaces, selecting the routing table to look up based on the incoming interface. The example above instead selects the routing table (and therefore the outgoing interface) based on the source IP in the packet header.
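
To make the source-based approach easier to reuse and check, here is a small dry-run sketch that prints the two setup commands shown above, followed by some read-only commands that could be used to inspect the result. The function names are hypothetical; the VIP, gateway and table ID are the ones from the example above.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the source-based policy routing
# commands for a given VIP, gateway and routing table ID.
policy_route_cmds() {
    vip="$1"; gw="$2"; table="$3"
    # Dedicated table holding a default route via the cloud-private gateway.
    echo "ip route add default via ${gw} table ${table}"
    # Packets sourced from the VIP consult that table instead of main.
    echo "ip rule add from ${vip} lookup ${table}"
}

# Read-only commands to confirm the rule and route are in place.
policy_route_checks() {
    vip="$1"; table="$2"
    echo "ip rule show"
    echo "ip route show table ${table}"
    # Ask the kernel which route it would pick for traffic from the VIP.
    echo "ip route get 1.1.1.1 from ${vip}"
}

policy_route_cmds 185.15.57.24 172.20.5.1 100
policy_route_checks 185.15.57.24 100
```

If the rule is active, the `ip route get` check would be expected to report the cloud-private gateway as the next hop rather than the host's main default route.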

Option 4

I don't like the sound of that at all, let's try to make one of the others work!

Change 916528 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] cloud_private_subnet: add support for return traffic to public VIPs if using BGP

https://gerrit.wikimedia.org/r/916528

The merged change implements option 2.5 suggested by @cmooney above, and I can now ping the BGP VIP from my laptop!