
Support Anycast GW on EVPN switches without unique IP
Closed, Resolved, Public

Description

Our automation for configuring IRB "Anycast Gateways" on EVPN switches assumes the use of a "virtual-gateway-address" for the gateway address shared across the switches. That in turn requires that each participating device also gets a unique IP on the subnet, with a config like this:

root@LEAF2> show configuration interfaces irb unit 100 family inet   
address 10.192.0.7/22 {
    preferred;
    virtual-gateway-address 10.192.0.1;
}

This is only one of several ways to configure this functionality. Compared to the other options it is the best, as each switch has its own IP, which can be pinged remotely and used as a source for tests run locally from the switch. However, in preparing to move the current public1-[a|b]-codfw gateways to the new row-wide switches in codfw, it's obvious that we can't spare enough IPv4 addresses to give every switch on the row its own unique IP on the subnet.
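To make the address pressure concrete, here is a rough sketch of the arithmetic. The subnet (a documentation range) and the switch count are made-up values for illustration only, not the real codfw numbers, and the helper names are hypothetical.

```python
import ipaddress


def gateway_addresses_needed(num_switches: int, use_vga: bool) -> int:
    """Count the IPs the gateway function alone consumes on a subnet."""
    # VGA: one shared virtual-gateway-address plus one unique IP per switch.
    # Anycast-only: just the single shared gateway IP.
    return num_switches + 1 if use_vga else 1


def hosts_left(subnet: str, num_switches: int, use_vga: bool) -> int:
    """Usable addresses remaining for hosts (assumes a prefix shorter than /31)."""
    net = ipaddress.ip_network(subnet)
    usable = net.num_addresses - 2  # exclude network and broadcast addresses
    return usable - gateway_addresses_needed(num_switches, use_vga)


# Hypothetical example: a /28 shared by 8 switches.
vga_left = hosts_left("192.0.2.0/28", num_switches=8, use_vga=True)
anycast_left = hosts_left("192.0.2.0/28", num_switches=8, use_vga=False)
```

On a small public subnet the per-switch addresses take up most of the space, which is the motivation for the single-IP option.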

The best solution is to keep the VGA / unique IP approach on the private vlans and wherever else possible, and change the automation to configure only the anycast IP on the interface if that is the only IP assigned to it in Netbox. Creating this task to track progress.
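The decision described above could be sketched roughly as follows. This is not the actual Homer code: the `IrbAddress` type, the `is_anycast` flag and the returned strings `"vga"`, `"single"` and `"none"` are all placeholders invented for illustration, and how Netbox data really marks the shared gateway IP is an assumption here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class IrbAddress:
    address: str      # e.g. "10.192.0.1/22"
    is_anycast: bool  # assumed flag: True if this is the shared gateway IP


def irb_gateway_type(addresses: Iterable[IrbAddress]) -> str:
    """Pick the kind of IRB config to generate from the IPs on the interface."""
    addresses = list(addresses)
    anycast = [a for a in addresses if a.is_anycast]
    unique = [a for a in addresses if not a.is_anycast]
    if not anycast:
        return "none"    # ordinary interface, no shared gateway
    if unique:
        return "vga"     # unique IP plus virtual-gateway-address
    return "single"      # only the shared gateway IP is configured


private_vlan = [IrbAddress("10.192.0.7/22", False), IrbAddress("10.192.0.1/22", True)]
public_vlan = [IrbAddress("208.80.152.241/28", True)]  # prefix length is a guess
```

With these inputs `irb_gateway_type(private_vlan)` gives `"vga"` and `irb_gateway_type(public_vlan)` gives `"single"`, so the private vlans keep their per-switch addresses while the public vlans fall back to the shared IP only.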

Event Timeline

cmooney triaged this task as Medium priority. Nov 6 2023, 12:27 PM
cmooney created this task.

Change 971937 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/software/homer/deploy@master] Change 'anycast_gw' var in int config to represent type of IRB needed

https://gerrit.wikimedia.org/r/971937

Mentioned in SAL (#wikimedia-operations) [2023-11-09T20:41:11Z] <topranks> change anycast gw type to single-IP on ssw1-aX-codfw for sandbox1-a-codfw vlan (T350579)

Change 973267 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] Adjust homer templates to support anycast gw with single IP

https://gerrit.wikimedia.org/r/973267

In terms of the config, when we have 2 IPs on the interface with the VGA setup there is some behaviour we need to be careful of.

Consider this interaction between a test device connected to sandbox1-a-codfw from asw-a-codfw:

23:42:17.801762 e4:3d:1a:78:dc:d0 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 208.80.152.241 tell 208.80.152.249, length 28
23:42:17.852314 48:5a:0d:85:9b:00 > e4:3d:1a:78:dc:d0, ethertype ARP (0x0806), length 60: Reply 208.80.152.241 is-at 00:00:5e:00:01:01, length 46
23:42:17.852331 e4:3d:1a:78:dc:d0 > 00:00:5e:00:01:01, ethertype IPv4 (0x0800), length 98: 208.80.152.249 > 208.80.152.241: ICMP echo request, id 53885, seq 1, length 64
23:42:17.860426 48:5a:0d:87:3f:00 > e4:3d:1a:78:dc:d0, ethertype IPv4 (0x0800), length 98: 208.80.152.241 > 208.80.152.249: ICMP echo reply, id 53885, seq 1, length 64

The system ARPs for the anycast GW address and gets a response. But notice that even though the ARP answer carries the gateway's virtual MAC (00:00:5e:00:01:01), no frame is ever sourced from it: both the ARP reply and the ICMP echo reply are sourced from the normal IRB MACs instead.

This causes an issue on asw-a-codfw, which never sees any frames sourced from the MAC address the host will be sending traffic to, and so never learns which port it lives on. As a result all frames the host sends to the gateway are treated as "unknown destination" by the L2 switch and flooded to all ports in the vlan. This is extremely undesirable. The solution is to manually configure the VGA MAC as described here. With this configuration in place the switch sources ARP replies and other traffic it generates from the gateway MAC.
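As a rough way to spot this condition in a capture, the sketch below parses `tcpdump -e` style lines like the ones above and reports any MAC that is advertised in an ARP reply but never appears as a frame source, which is exactly the MAC an L2 switch in the path cannot learn. The function name and regexes are illustrative only and assume the line format shown in this task.

```python
import re

# Matches "<timestamp> <src mac> > <dst mac>," at the start of a tcpdump -e line.
FRAME_RE = re.compile(r"^\S+ (?P<src>[0-9a-f:]{17}) > (?P<dst>[0-9a-f:]{17}),")
# Matches the MAC carried inside an ARP reply payload.
IS_AT_RE = re.compile(r"is-at (?P<mac>[0-9a-f:]{17})")


def unlearnable_gateway_macs(capture_lines):
    """Return MACs advertised via ARP that never show up as a frame source."""
    sources = set()
    advertised = set()
    for line in capture_lines:
        frame = FRAME_RE.match(line)
        if not frame:
            continue  # skip anything that is not a frame header line
        sources.add(frame.group("src"))
        arp = IS_AT_RE.search(line)
        if arp:
            advertised.add(arp.group("mac"))
    return advertised - sources


CAPTURE = [
    "23:42:17.801762 e4:3d:1a:78:dc:d0 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 208.80.152.241 tell 208.80.152.249, length 28",
    "23:42:17.852314 48:5a:0d:85:9b:00 > e4:3d:1a:78:dc:d0, ethertype ARP (0x0806), length 60: Reply 208.80.152.241 is-at 00:00:5e:00:01:01, length 46",
    "23:42:17.852331 e4:3d:1a:78:dc:d0 > 00:00:5e:00:01:01, ethertype IPv4 (0x0800), length 98: 208.80.152.249 > 208.80.152.241: ICMP echo request, id 53885, seq 1, length 64",
    "23:42:17.860426 48:5a:0d:87:3f:00 > e4:3d:1a:78:dc:d0, ethertype IPv4 (0x0800), length 98: 208.80.152.241 > 208.80.152.249: ICMP echo reply, id 53885, seq 1, length 64",
]

flooded = unlearnable_gateway_macs(CAPTURE)
```

For the capture above, `flooded` contains only `00:00:5e:00:01:01`. Once the gateway MAC is configured explicitly and the switch sources its traffic from it, the same check on a fresh capture should come back empty.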

Change 971937 merged by Cathal Mooney:

[operations/software/homer/deploy@master] Change 'anycast_gw' var in int config to represent type of IRB needed

https://gerrit.wikimedia.org/r/971937

Change 973267 merged by jenkins-bot:

[operations/homer/public@master] Adjust homer templates to support anycast gw with single IP

https://gerrit.wikimedia.org/r/973267

Patches to support this have been merged and it's working for the codfw row A/B public vlans, closing task.

Just a note on this, I only discovered this document after the task:

https://www.juniper.net/documentation/us/en/software/nce/nce-216-evpn-dhcp-relay/nce-216-evpn-dhcp-relay.pdf

The good news is the approach we landed on matches the recommendations exactly. In terms of terminology, Juniper uses the following terms, which are good to know:

IRB with Virtual Gateway Address (VGA): The type we use on the private vlans, where each interface has a unique address and the GW is configured as a 'virtual-gateway-address'.

Anycast IRB: The type we use on the public vlans, where only the shared GW IP is configured on the IRB int.