
Bird multihop BFD
Closed, Resolved, Public

Description

While working on Icinga's check_bfd, I noticed that the BFD sessions between the Bird instances and the routers were still showing as down (BGP is properly up, though).

In https://gerrit.wikimedia.org/r/c/operations/puppet/+/474819 I:

  • Added multihop config to the BFD statement in Bird
  • Opened port udp/4784 in Ferm on the server side (BFD multihop port)

Testing with dns2001 and cr1-codfw, I also temporarily:

  • Enabled more verbose logging on both sides
  • Allowed port udp/4784 on the router loopback firewall filter

None of this solved the issue: Bird's state stays at Init, while Junos stays at Down.

Bird state
bird> show bfd sessions 
bfd1:
IP address                Interface  State      Since       Interval  Timeout
208.80.153.192            ---        Init       15:47:25      2.000    6.000
Junos state
cr1-codfw> show bfd session address 208.80.153.77 extensive 
                                                  Detect   Transmit
Address                  State     Interface      Time     Interval  Multiplier
208.80.153.77            Down                     0.000     1.000        3   
 Client BGP, TX interval 0.300, RX interval 0.300
 Local diagnostic None, remote diagnostic None
 Remote state AdminDown, version 1
 Replicated 
 Session type: Multi hop BFD
 Min async interval 0.300, min slow interval 1.000
 Adaptive async TX interval 2.000, RX interval 2.000
 Local min TX interval 2.000, minimum RX interval 0.300, multiplier 3
 Remote min TX interval 0.000, min RX interval 0.000, multiplier 0
 Local discriminator 3556, remote discriminator 0
 Echo mode disabled/inactive, no-absorb, no-refresh
 Multi-hop min-recv-TTL 254, route table 0, local-address 208.80.153.192
  Session ID: 0x4d954

1 sessions, 1 clients
Cumulative transmit rate 1.0 pps, cumulative receive rate 0.0 pps

We can see that Bird does send and receive BFD control packets, from and to the proper IPs:

tcpdump
dns2001:~$ sudo tcpdump -p -i eno1 "host 208.80.153.192"
16:11:22.674389 IP cr1-codfw.wikimedia.org.49152 > dns2001.wikimedia.org.4784: UDP, length 24
16:11:23.585910 IP cr1-codfw.wikimedia.org.49152 > dns2001.wikimedia.org.4784: UDP, length 24
16:11:23.643194 IP dns2001.wikimedia.org.35807 > cr1-codfw.wikimedia.org.4784: UDP, length 24
16:11:24.519399 IP cr1-codfw.wikimedia.org.49152 > dns2001.wikimedia.org.4784: UDP, length 24
16:11:25.429422 IP cr1-codfw.wikimedia.org.49152 > dns2001.wikimedia.org.4784: UDP, length 24
16:11:25.435598 IP dns2001.wikimedia.org.35807 > cr1-codfw.wikimedia.org.4784: UDP, length 24
bird debug
dns2001:~$ tailf /tmp/bird-debug.log 
2018-11-20 15:49:20 <TRACE> bfd1: Sending CTL to 208.80.153.192 [Init]
2018-11-20 15:49:20 <TRACE> bfd1: Sending CTL to 208.80.153.193 [Init]
2018-11-20 15:49:20 <TRACE> bfd1: CTL received from 208.80.153.192 [Down C]
2018-11-20 15:49:20 <TRACE> bfd1: CTL received from 208.80.153.193 [Down C]

cr1-codfw# run monitor traffic interface ae3.2003 matching "host 208.80.153.77"
This doesn't return any BFD packets, most likely because BFD is offloaded to the hardware.

The traceoptions output would probably require JTAC to interpret:

Junos BFD traceoptions
Nov 20 16:49:55 [THROTTLE]bfdd_rate_limit_add_ppm_thread: Session 208.80.153.77 already programed
Nov 20 16:49:55 [SLOW_START]bfdd_slow_start_del: Delete session 208.80.153.77
Nov 20 16:49:55 Session 208.80.153.77 (IFL 0) starting version negotiation
Nov 20 16:49:55 [SLOW_START]bfdd_slow_start_start: Session 208.80.153.77
Nov 20 16:49:55 [SLOW_START]bfdd_slow_start_set: Session 208.80.153.77 pre intervals are mix rx = 300000 adpt rx = 300000 min tx = 300000 adpt tx = 1000000
Nov 20 16:49:55    SrcAddr (5) len 8: 208.80.153.77
Nov 20 16:49:55 BFD m-hops packet to 208.80.153.77 from 208.80.153.192 (IFL 0), len 24
Nov 20 16:49:55    DestAddr (8) len 8: 208.80.153.77
Nov 20 16:49:55 PPM Trace: BFD neighbor 208.80.153.77 (IFL 0): bfd_ppm_discr 3557
Nov 20 16:49:55 PPM Trace: BFD neighbor 208.80.153.77 (IFL 0) set, 0 0
Nov 20 16:49:55 PPM Trace: BFD programmed periodic xmit to 208.80.153.77 (IFL 0), interval 1 0
Nov 20 16:49:55 PPM Trace: BFD neighbor 208.80.153.77 (IFL 0): bfd_ppm_discr 3557
Nov 20 16:49:55 PPM Trace: BFD neighbor 208.80.153.77 (IFL 0) set, 0 0
Nov 20 16:49:55 PPM Trace: BFD programmed periodic xmit to 208.80.153.77 (IFL 0), interval 1 0
Nov 20 16:49:56 [SLOW_START]bfdd_slow_start_start: 208.80.153.77 is already in slow start thread
Nov 20 16:49:56    SrcAddr (5) len 8: 208.80.153.77
Nov 20 16:49:56 BFD m-hops packet to 208.80.153.77 from 208.80.153.192 (IFL 0), len 24
Nov 20 16:49:56    DestAddr (8) len 8: 208.80.153.77
Nov 20 16:49:56 PPM Trace: BFD neighbor 208.80.153.77 (IFL 0): bfd_ppm_discr 3557
Nov 20 16:49:56 PPM Trace: BFD neighbor 208.80.153.77 (IFL 0) set, 0 0
Nov 20 16:49:56 PPM Trace: BFD programmed periodic xmit to 208.80.153.77 (IFL 0), interval 2 0
Nov 20 16:49:56 PPM Trace: BFD neighbor 208.80.153.77 (IFL 0): bfd_ppm_discr 3557
Nov 20 16:49:56 PPM Trace: BFD neighbor 208.80.153.77 (IFL 0) set, 0 0
Nov 20 16:49:56 PPM Trace: BFD programmed periodic xmit to 208.80.153.77 (IFL 0), interval 2 0
Nov 20 16:50:04 [THROTTLE]bfdd_rate_limit_program_timer_expiry: Session 208.80.153.77 is removed form program therad

Digging into the BFD packets, they all look similar to:

Bird to Junos
Ethernet II, Src: Dell_5f:6a:40 (d0:94:66:5f:6a:40), Dst: IETF-VRRP-VRID_03 (00:00:5e:00:01:03)
Internet Protocol Version 4, Src: 208.80.153.77, Dst: 208.80.153.192
User Datagram Protocol, Src Port: 60515, Dst Port: 4784
BFD Control message
    001. .... = Protocol Version: 1
    ...0 0000 = Diagnostic Code: No Diagnostic (0x00)
    10.. .... = Session State: Init (0x2)
    Message Flags: 0x80
    Detect Time Multiplier: 3 (= 3000 ms Detection time)
    Message Length: 24 bytes
    My Discriminator: 0x3e1d479f
    Your Discriminator: 0x00000de5
    Desired Min TX Interval: 1000 ms (1000000 us)
    Required Min RX Interval:  300 ms (300000 us)
    Required Min Echo Interval:    0 ms (0 us)
Junos to Bird
Ethernet II, Src: JuniperN_f2:73:c3 (64:87:88:f2:73:c3), Dst: Dell_5f:6a:40 (d0:94:66:5f:6a:40)
Internet Protocol Version 4, Src: 208.80.153.192, Dst: 208.80.153.77
User Datagram Protocol, Src Port: 49152, Dst Port: 4784
BFD Control message
    001. .... = Protocol Version: 1
    ...0 0000 = Diagnostic Code: No Diagnostic (0x00)
    01.. .... = Session State: Down (0x1)
    Message Flags: 0x48, Control Plane Independent: Set
    Detect Time Multiplier: 3 (= 6000 ms Detection time)
    Message Length: 24 bytes
    My Discriminator: 0x00000de5
    Your Discriminator: 0x00000000
    Desired Min TX Interval: 2000 ms (2000000 us)
    Required Min RX Interval: 2000 ms (2000000 us)
    Required Min Echo Interval:    0 ms (0 us)
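For anyone who wants to inspect these fields without Wireshark, here is a minimal Python sketch that decodes the 24-byte mandatory section of a BFD control packet, following the field layout in RFC 5880. The function name `decode_bfd_control` and the dictionary keys are my own, and the two sample packets are rebuilt by hand from the dissections above rather than taken from a raw capture.

```python
import struct

# Session state names for the 2-bit "Sta" field (RFC 5880).
BFD_STATES = {0: "AdminDown", 1: "Down", 2: "Init", 3: "Up"}


def decode_bfd_control(payload: bytes) -> dict:
    """Decode the 24-byte mandatory section of a BFD control packet."""
    if len(payload) < 24:
        raise ValueError("BFD control packet must be at least 24 bytes")
    (vers_diag, state_flags, detect_mult, length,
     my_disc, your_disc, min_tx_us, min_rx_us, min_echo_us) = struct.unpack(
        "!BBBBIIIII", payload[:24])
    return {
        "version": vers_diag >> 5,          # top 3 bits of byte 0
        "diag": vers_diag & 0x1F,           # low 5 bits of byte 0
        "state": BFD_STATES[state_flags >> 6],
        "control_plane_independent": bool(state_flags & 0x08),  # C bit
        "detect_mult": detect_mult,
        "length": length,
        "my_discriminator": my_disc,
        "your_discriminator": your_disc,
        "desired_min_tx_us": min_tx_us,
        "required_min_rx_us": min_rx_us,
        "required_min_echo_rx_us": min_echo_us,
    }


# Sample packets rebuilt by hand from the two dissections above.
bird_to_junos = struct.pack("!BBBBIIIII", 0x20, 0x80, 3, 24,
                            0x3E1D479F, 0x00000DE5, 1_000_000, 300_000, 0)
junos_to_bird = struct.pack("!BBBBIIIII", 0x20, 0x48, 3, 24,
                            0x00000DE5, 0x00000000, 2_000_000, 2_000_000, 0)

for name, pkt in (("Bird to Junos", bird_to_junos),
                  ("Junos to Bird", junos_to_bird)):
    fields = decode_bfd_control(pkt)
    print(name, fields["state"],
          "my=%d" % fields["my_discriminator"],
          "your=%d" % fields["your_discriminator"])
```

Decoded this way, Bird echoes the Junos discriminator (0xde5, i.e. 3557, matching bfd_ppm_discr in the traceoptions), while Junos still sends Your Discriminator 0. That is consistent with Junos never accepting a packet from Bird, although on its own it does not say why.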

It would be great to have a second pair of eyes to check that I'm not missing something. The next step would then be to reach out to Bird's mailing list and/or JTAC.

Event Timeline

ayounsi triaged this task as Medium priority. Nov 20 2018, 5:36 PM
ayounsi created this task.

One suggestion from the Bird mailing list (and documentation) is to change the dynamic port range on the server side.
From the current:
cat /proc/sys/net/ipv4/ip_local_port_range
32768 60999

To the IANA-recommended range of 49152-65535.

We can see in the packet capture above: dns2001.wikimedia.org.35807 > cr1-codfw.wikimedia.org.4784
It's possible that Junos does strict checking and doesn't accept 35807 as a source port, as it's lower than 49152.
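To make the source-port check concrete, here is a small Python sketch that pulls the source port out of a tcpdump line like the ones above and tests it against the 49152-65535 range. The regular expression and the helper names (`source_port`, `in_bfd_source_range`) are assumptions of mine, written for this tcpdump output format only.

```python
import re

# Range discussed in this task for BFD source ports.
BFD_SRC_PORT_MIN = 49152
BFD_SRC_PORT_MAX = 65535

# Matches "IP <src-host>.<src-port> > <dst-host>.<dst-port>:" as printed by tcpdump.
_TCPDUMP_RE = re.compile(r"IP \S+\.(\d+) > \S+\.(\d+):")


def source_port(line):
    """Return the source port of a tcpdump line, or None if it doesn't match."""
    match = _TCPDUMP_RE.search(line)
    return int(match.group(1)) if match else None


def in_bfd_source_range(port):
    """True if the port falls inside the 49152-65535 range."""
    return BFD_SRC_PORT_MIN <= port <= BFD_SRC_PORT_MAX


lines = [
    "16:11:23.585910 IP cr1-codfw.wikimedia.org.49152 > dns2001.wikimedia.org.4784: UDP, length 24",
    "16:11:23.643194 IP dns2001.wikimedia.org.35807 > cr1-codfw.wikimedia.org.4784: UDP, length 24",
]
for line in lines:
    port = source_port(line)
    print(port, "in range" if in_bfd_source_range(port) else "out of range")
```

With the two sample lines from the capture, the router's packet (49152) is in range and the server's (35807) is not.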

Testing would be:

  1. Depool dns2001
  2. Run sysctl -w net.ipv4.ip_local_port_range="49152 65535"
  3. Look at tcpdump to confirm the source port of BFD packets is >= 49152
  4. Check whether BFD gets established
  5. If all good: puppetize the change; if not: roll back to the original value
  6. Repool dns2001

I don't think this would impact the recursive DNS server, but feedback from @BBlack would be appreciated :)

Mentioned in SAL (#wikimedia-operations) [2019-03-07T23:46:38Z] <XioNoX> set net.ipv4.ip_local_port_range="49152 65535" on dns2001 - T209989

Mentioned in SAL (#wikimedia-operations) [2019-03-07T23:53:16Z] <XioNoX> set net.ipv4.ip_local_port_range="32768 60999" on dns2001 and repool server - T209989

After changing the port range to the IANA-recommended range and restarting Bird, we can see the BFD packets leaving from the proper port:
IP dns2001.wikimedia.org.55170 > cr1-codfw.wikimedia.org.4784: UDP, length 24
But the situation stays the same: the session doesn't come up.

Another suggestion from the Bird mailing list:
The Junos extensive output mentions Multi-hop min-recv-TTL 254. I'd guess this is set because the router knows that the remote side is in a directly connected network (so assuming a default of 255 - 1, as the session is with the loopback).
Packet capture shows BFD packets from router to server with a TTL of 255, and from server to router with a TTL of 64 (default Linux value).
The theory is that Junos ignores the Bird BFD packets, as 64 is lower than the min-recv-TTL of 254.
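The theory boils down to a single comparison, sketched here in Python. This is only a model of the suspected behaviour: the function name `passes_min_recv_ttl` is hypothetical, and whether Junos compares with >= or > is an assumption on my part.

```python
# Value reported by "show bfd session ... extensive" on cr1-codfw.
MIN_RECV_TTL = 254


def passes_min_recv_ttl(received_ttl, min_recv_ttl=MIN_RECV_TTL):
    """Model of the suspected check: drop packets whose TTL is too low."""
    return received_ttl >= min_recv_ttl


# TTLs seen in the packet capture, plus the value proposed for testing.
for label, ttl in (("Bird, Linux default TTL", 64),
                   ("Bird, TTL raised via sysctl", 255)):
    verdict = "accepted" if passes_min_recv_ttl(ttl) else "ignored"
    print("%s (TTL %d): %s" % (label, ttl, verdict))
```

Under this model, Bird's packets with TTL 64 are ignored and packets with TTL 255 would be accepted, which is what the test plan sets out to confirm.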

To test it we can try to:

  1. Depool dns2001
  2. Run sysctl -w net.ipv4.ip_default_ttl="255"
  3. Look at tcpdump for the TTL of Bird's BFD packets
  4. Check whether BFD gets established
  5. If all good: puppetize the change
  6. If not, try the change from T209989#5006953
  7. If still not good, roll back both changes (TTL back to 64)
  8. Repool dns2001

Victory: the BFD session came up as soon as I changed the TTL to 255.

Mentioned in SAL (#wikimedia-operations) [2019-03-14T18:37:17Z] <XioNoX> set protocols bgp group Anycast4 multihop ttl 190 on cr1-codfw - T209989

Followed up on the mailing list:

Junos uses the BGP multihop TTL value for BFD as well, and assumes the other side's default TTL is 255.
So if I do:

[edit protocols bgp group Anycast4 multihop]
-     ttl 2;
+     ttl 3;

Then Multi-hop min-recv-TTL drops to 253.
I couldn't find any knob to set the default TTL of the remote side.
So an easier workaround than recompiling Bird: I set that TTL to 193, which sets min-recv-TTL to 63, and the session went up.
Since the TTL check is now this relaxed, it requires firewall filters to only allow BGP and BFD from authorized peers.
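The relationship between the BGP multihop TTL and min-recv-TTL can be written down as simple arithmetic. The Python sketch below infers the formula from the three values observed in this task (2 gives 254, 3 gives 253, 193 gives 63); it is not taken from Juniper documentation, and both function names are hypothetical.

```python
def junos_min_recv_ttl(bgp_multihop_ttl):
    """min-recv-TTL inferred from observations: 255 minus (multihop ttl - 1)."""
    return 255 - (bgp_multihop_ttl - 1)


def smallest_multihop_ttl(sender_ttl, hops=0):
    """Smallest multihop ttl whose min-recv-TTL still admits the sender.

    sender_ttl is the TTL the peer sends with; hops is how many routers
    decrement it before it reaches the Junos box (an assumed parameter).
    """
    return 256 - (sender_ttl - hops)


for ttl in (2, 3, 193):
    print("multihop ttl %d -> min-recv-TTL %d" % (ttl, junos_min_recv_ttl(ttl)))

# Linux sends with TTL 64 by default.
print(smallest_multihop_ttl(64))          # directly connected sender
print(smallest_multihop_ttl(64, hops=1))  # sender one routed hop away
```

Under this inferred formula, 193 leaves room for a Linux sender whose TTL of 64 has been decremented once, while 192 would be the tightest value for a directly connected one.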

And pushed the following to cr1/2-codfw:

[edit protocols bgp group Anycast4 multihop]
-     ttl 2;
+     /* T209989 */
+     ttl 193;

[edit firewall family inet filter loopback4]
       term allow_bfd4 { ... }
+      /* T209989 */
+      term allow_mbfd4 {
+          from {
+              source-prefix-list {
+                  bgp-sessions;
+              }
+              protocol udp;
+              port 4784;
+          }
+          then accept;
+      }

This sets the TTL to the proper value and restricts multi-hop BFD packets to those from configured BGP peers.
Sessions came up instantly.

Mentioned in SAL (#wikimedia-operations) [2019-03-14T19:02:31Z] <XioNoX> set protocols bgp group Anycast4 multihop ttl 193 on cr1/2-eqiad - T209989

Mentioned in SAL (#wikimedia-operations) [2019-03-14T19:37:11Z] <XioNoX> set protocols bgp group Anycast4 multihop ttl 193 on cr1/2-esams - T209989