
Nokia SR-Linux - wonky routing with IPv6 RAs and EVPN Anycast GW
Open, High, Public

Description

I discovered an issue yesterday when attempting to migrate the IP gateway for analytics1-d-eqiad from our core routers to the Nokia leaf switches in row D.

Anycast GW

The switches in those rows are running EVPN/VXLAN, and use the 'Anycast GW' function. This means that every switch in the row has the same IP configured on its irb0.1023 interface, which is what hosts on the vlan use as their default gateway.

To make this work the Anycast GW IRB interface needs to use the same MAC address for the anycast IP on every switch. That allows for things like VM mobility, and ensures any ARP/ND response from any switch will be consistent.
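As a side note, the gateway link-local address that hosts learn (fe80::1000:ff:fe00:1023, seen in the host output further down) appears to be the modified EUI-64 form of the configured anycast MAC (12:00:00:00:10:23). The sketch below is only a sanity check of that assumption; the function name `mac_to_link_local` is made up for illustration and is not part of any Nokia or Linux tooling.

```python
import ipaddress


def mac_to_link_local(mac: str) -> ipaddress.IPv6Address:
    """Build the fe80::/64 address from the modified EUI-64 form of a MAC."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}")
    # Modified EUI-64: flip the universal/local bit of the first octet
    # and insert ff:fe between the two halves of the MAC.
    octets[0] ^= 0x02
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    # fe80::/64 prefix (8 bytes) followed by the 8-byte interface ID.
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + interface_id)


if __name__ == "__main__":
    # Anycast MAC taken from the anycast-gw-mac config line in this task.
    print(mac_to_link_local("12:00:00:00:10:23"))
```

If the assumption holds, this prints the same address the hosts show as their default route next hop, which is why every switch in the row answers for one shared link-local gateway IP.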

Problem

This is slightly more complicated when IPv6 router advertisements (RAs) are used. RAs have an optional field, the "Source link-layer address" option. If it is not present, a host that receives an RA uses normal neighbor discovery to find the MAC of the IP that sent the RA. For normal neighbor discovery the Nokia switches do what they should, and the anycast MAC is returned every time:

set / interface irb0 subinterface 1023 anycast-gw anycast-gw-mac 12:00:00:00:10:23
root@sretest1006:~# ip -6 route show default
default via fe80::1000:ff:fe00:1023 dev ens2f0np0 proto ra metric 1024 expires 598sec hoplimit 64 pref medium
root@sretest1006:~# while true; do ndisc6 fe80::1000:ff:fe00:1023 ens2f0np0 | grep Target; sleep 1; done 
Target link-layer address: 12:00:00:00:10:23
Target link-layer address: 12:00:00:00:10:23
Target link-layer address: 12:00:00:00:10:23
Target link-layer address: 12:00:00:00:10:23
Target link-layer address: 12:00:00:00:10:23
Target link-layer address: 12:00:00:00:10:23
Target link-layer address: 12:00:00:00:10:23
Target link-layer address: 12:00:00:00:10:23
Target link-layer address: 12:00:00:00:10:23
Target link-layer address: 12:00:00:00:10:23

The issue is that the MAC in the "Source link-layer address" option of the RAs they send is different:

ICMPv6 Option (Source link-layer address : a8:e5:ec:78:4f:3c)
    Type: Source link-layer address (1)
    Length: 1 (8 bytes)
    Link-layer address: a8:e5:ec:78:4f:3c (a8:e5:ec:78:4f:3c)
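For anyone wanting to check captured RAs programmatically, here is a minimal sketch of parsing that option and comparing it with the anycast MAC. The 8 option bytes are reconstructed from the decoded fields above (type 1, length 1 unit of 8 bytes, then the 6-byte MAC), not copied from a raw capture, and the names `parse_slla_option` and `ANYCAST_GW_MAC` are illustrative only.

```python
import struct

# Anycast MAC from the anycast-gw-mac config line in this task.
ANYCAST_GW_MAC = "12:00:00:00:10:23"


def parse_slla_option(option: bytes) -> str:
    """Return the MAC carried in an Ethernet Source link-layer address option."""
    if len(option) < 8:
        raise ValueError("option shorter than 8 bytes")
    opt_type, length_units = struct.unpack_from("!BB", option)
    if opt_type != 1:
        raise ValueError(f"not a Source link-layer address option: type {opt_type}")
    if length_units != 1:
        # Length is in units of 8 bytes; Ethernet uses exactly one unit.
        raise ValueError(f"unexpected length for Ethernet: {length_units}")
    return ":".join(f"{byte:02x}" for byte in option[2:8])


if __name__ == "__main__":
    # Bytes reconstructed from the decoded option shown in this task.
    option = bytes.fromhex("0101a8e5ec784f3c")
    mac = parse_slla_option(option)
    verdict = "matches" if mac == ANYCAST_GW_MAC else "does NOT match"
    print(f"{mac} {verdict} anycast MAC {ANYCAST_GW_MAC}")
```

The check is deliberately strict about the length field, since anything other than one 8-byte unit would not be a plain Ethernet address.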

When a host receives one of these RAs it will update its ND cache with the MAC address from this option (normal behaviour as per RFC 4861, section 6.3.4). Given there are 8 switches with this VLAN configured, all sending RAs, hosts are constantly updating the MAC for their gateway IP with different values:

cmooney@an-druid1005:~$ sudo ip -ts monitor | grep fe80::1000:ff:fe00:1023 | grep REACHABLE
[2026-03-19T21:55:16.267418] fe80::1000:ff:fe00:1023 dev eno1 lladdr 12:00:00:00:10:23 router REACHABLE 
[2026-03-19T21:55:44.427402] fe80::1000:ff:fe00:1023 dev eno1 lladdr a8:e5:ec:78:4f:3c router REACHABLE 
[2026-03-19T21:56:13.355421] fe80::1000:ff:fe00:1023 dev eno1 lladdr 12:00:00:00:10:23 router REACHABLE 
[2026-03-19T21:56:37.419366] fe80::1000:ff:fe00:1023 dev eno1 lladdr a8:e5:ec:78:69:3c router REACHABLE 
[2026-03-19T21:57:03.275407] fe80::1000:ff:fe00:1023 dev eno1 lladdr a8:e5:ec:78:57:3c router REACHABLE 
[2026-03-19T21:57:12.235372] fe80::1000:ff:fe00:1023 dev eno1 lladdr 12:00:00:00:10:23 router REACHABLE 
[2026-03-19T21:57:34.251409] fe80::1000:ff:fe00:1023 dev eno1 lladdr a8:e5:ec:78:59:3c router REACHABLE 
[2026-03-19T21:58:02.667361] fe80::1000:ff:fe00:1023 dev eno1 lladdr a8:e5:ec:78:73:3c router REACHABLE 
[2026-03-19T21:58:17.771407] fe80::1000:ff:fe00:1023 dev eno1 lladdr a8:e5:ec:78:59:3c router REACHABLE 
[2026-03-19T21:58:41.323341] fe80::1000:ff:fe00:1023 dev eno1 lladdr 12:00:00:00:10:23 router REACHABLE 
[2026-03-19T21:58:56.939420] fe80::1000:ff:fe00:1023 dev eno1 lladdr a8:e5:ec:78:69:3c router REACHABLE 
[2026-03-19T21:59:11.275415] fe80::1000:ff:fe00:1023 dev eno1 lladdr a8:e5:ec:78:41:3c router REACHABLE
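To put numbers on the flapping, output like the above can be summarised with a short script. This is a rough sketch that assumes the line format shown by `ip -ts monitor` in this task; `SAMPLE` holds only the first four lines of that output, and the helper name `summarise` is made up for illustration.

```python
import re
from collections import Counter

# First four lines of the `ip -ts monitor` output from this task.
SAMPLE = """\
[2026-03-19T21:55:16.267418] fe80::1000:ff:fe00:1023 dev eno1 lladdr 12:00:00:00:10:23 router REACHABLE
[2026-03-19T21:55:44.427402] fe80::1000:ff:fe00:1023 dev eno1 lladdr a8:e5:ec:78:4f:3c router REACHABLE
[2026-03-19T21:56:13.355421] fe80::1000:ff:fe00:1023 dev eno1 lladdr 12:00:00:00:10:23 router REACHABLE
[2026-03-19T21:56:37.419366] fe80::1000:ff:fe00:1023 dev eno1 lladdr a8:e5:ec:78:69:3c router REACHABLE
"""

LINE_RE = re.compile(r"^\[[^\]]+\] (?P<ip>\S+) dev \S+ lladdr (?P<mac>\S+)")


def summarise(log: str, gateway: str):
    """Return (Counter of MACs seen for gateway, number of MAC changes)."""
    macs = []
    for line in log.splitlines():
        match = LINE_RE.match(line)
        if match and match.group("ip") == gateway:
            macs.append(match.group("mac"))
    # A change is any entry whose MAC differs from the previous entry.
    changes = sum(1 for prev, cur in zip(macs, macs[1:]) if prev != cur)
    return Counter(macs), changes


if __name__ == "__main__":
    counts, changes = summarise(SAMPLE, "fe80::1000:ff:fe00:1023")
    for mac, seen in counts.most_common():
        print(f"{mac} seen {seen} time(s)")
    print(f"{changes} MAC change(s) across {sum(counts.values())} entries")
```

On a host with a single consistent gateway MAC the change count should be zero, so this could be used to confirm any eventual fix.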

Wonky routing

The result is that hosts keep changing which switch they use as their default gateway. It seems to be the top-of-rack switch ~65% of the time, but it keeps switching between them:

cmooney@an-druid1005:~$ mtr -b -w -c 1000 -6 cr1-eqiad.wikimedia.org 
Start: 2026-03-19T19:54:02+0000
HOST: an-druid1005                                            Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- irb0-1023.lsw1-d3-eqiad.eqiad.wmnet (2620:0:861:108::4)  0.1%  1000    0.3   0.2   0.2   1.0   0.1
        irb0-1023.lsw1-d8-eqiad.eqiad.wmnet (2620:0:861:108::8)
        irb0-1023.lsw1-d7-eqiad.eqiad.wmnet (2620:0:861:108::7)
        irb0-1023.lsw1-d4-eqiad.eqiad.wmnet (2620:0:861:108::5)
        irb0-1023.lsw1-d6-eqiad.eqiad.wmnet (2620:0:861:108::6)
        irb0-1023.lsw1-d2-eqiad.eqiad.wmnet (2620:0:861:108::3)
        irb0-1023.lsw1-d1-eqiad.eqiad.wmnet (2620:0:861:108::2)
  2.|-- lo50.ssw1-d1-eqiad.eqiad.wmnet (2620:0:861:130::1)       4.8%  1000    0.5   0.3   0.3   1.2   0.1
  3.|-- cr1-eqiad.wikimedia.org (2620:0:861:ffff::1)             0.0%  1000    0.8   1.1   0.5  34.5   3.3

Fix

I couldn't find any configuration option that would change this behaviour. Tbh we should have spotted it during testing, but we had fewer switches in the test setup (only two) and must have missed it.

I will open a ticket with Nokia about it, but it looks like they don't support using IPv6 RAs with Anycast GW. Hopefully they can make a change so that if Anycast GW is configured they either omit the "Source link-layer address" option in RAs or send the anycast MAC.