
No unicast IP ranges announced to peers from eqdfw
Closed, Resolved, Public

Description

Brandon brought it to my attention that we weren't announcing any IPv6 ranges over our PNI to Facebook from eqdfw. Upon inspection it turns out we aren't announcing any IPv6 ranges at all from the POP, for instance to NTT:

cmooney@cr2-eqdfw> show route table inet6.0 terse advertising-protocol bgp 2001:418:0:5000::3b2 

cmooney@cr2-eqdfw>

The aggregate route is configured to be created:

cmooney@cr2-eqdfw> show configuration routing-options rib inet6.0 aggregate | display set    
set routing-options rib inet6.0 aggregate defaults discard
set routing-options rib inet6.0 aggregate route 2620:0:860::/48 policy BGP_from_LVS

However, only the default route is being created; the /48 is nowhere to be seen:

cmooney@cr2-eqdfw> show route table inet6.0 protocol aggregate 

inet6.0: 202800 destinations, 767623 routes (201608 active, 3 holddown, 1392 hidden)
Restart Complete
+ = Active Route, - = Last Active, * = Both

::/0               *[Aggregate/130] 91w1d 04:24:45
                       Discard

We are announcing some IPv4 routes, but they only cover the Anycast ranges (principally used for DNS) and space used by cloud:

cmooney@cr2-eqdfw> show route advertising-protocol bgp 128.242.179.181 table inet.0 terse 

inet.0: 942310 destinations, 2227553 routes (941837 active, 3 holddown, 2480 hidden)
Restart Complete
  Prefix		  Nexthop	       MED     Lclpref    AS path
* 185.15.57.0/24          Self                                    ?
* 185.71.138.0/24         Self                                    I
* 198.35.27.0/24          Self                                    I

Correspondingly, there is zero transmit traffic from cr2-eqdfw back towards codfw.

Event Timeline

cmooney triaged this task as High priority. (Jun 13 2024, 4:05 PM)
cmooney created this task.
cmooney renamed this task from No IPv6 ranges announced to peers from eqdfw to No unicast IP ranges announced to peers from eqdfw. (Jun 13 2024, 4:29 PM)
cmooney updated the task description.

It seems this was an inadvertent result of the upgrade to the codfw row A/B switches, and the move there from a purely L2 switching layer to a routed one.

Specifically, we ended up changing the AS path on routes in codfw following T352920: Move lvs2011 from private1-a-codfw (row) to private1-a2-codfw (rack) vlan. As a result, contributing routes for the aggregate prefixes in codfw/eqdfw are now seen with AS64811 (the codfw EVPN ASN) in the path:

cmooney@cr2-eqdfw> show route protocol bgp table inet.0 terse 208.80.152.0/23 aspath-regex ".* 64600$" 

inet.0: 942247 destinations, 2228432 routes (941809 active, 0 holddown, 2437 hidden)
Restart Complete
+ = Active Route, - = Last Active, * = Both

A V Destination        P Prf   Metric 1   Metric 2  Next hop        AS path
* ? 208.80.152.224/28  B 170        100          0                  64701 64600 I
  unverified                                       >208.80.153.210
                                                    208.80.153.212
* ? 208.80.153.224/32  B 170        100          0                  64811 64600 I
  unverified                                       >208.80.153.210
                                                    208.80.153.212
* ? 208.80.153.225/32  B 170        100          0                  64811 64600 I
  unverified                                       >208.80.153.210
                                                    208.80.153.212
* ? 208.80.153.226/32  B 170        100          0                  64811 64600 I
  unverified                                       >208.80.153.210
                                                    208.80.153.212
* ? 208.80.153.232/32  B 170        100          0                  64811 64600 I
  unverified                                       >208.80.153.210
                                                    208.80.153.212
* ? 208.80.153.240/32  B 170        100          0                  64811 64600 I
  unverified                                       >208.80.153.212
                                                    208.80.153.210
* ? 208.80.153.241/32  B 170        100          0                  64811 64600 I
  unverified                                       >208.80.153.212
                                                    208.80.153.210
* ? 208.80.153.252/32  B 170        100          0                  64811 64600 I
  unverified                                       >208.80.153.212
                                                    208.80.153.210

This no longer matches the configured AS-path access list set up on the aggregate route generation config:

set policy-options as-path core_and_local_LVS "^(65002|65001)? 64600.*"
set policy-options policy-statement BGP_from_LVS term BGP_core_and_local_LVS from as-path core_and_local_LVS
set routing-options rib inet6.0 aggregate route 2620:0:860::/48 policy BGP_from_LVS
set routing-options aggregate route 208.80.152.0/23 policy BGP_from_LVS
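To make the mismatch concrete, here is a minimal sketch that hand-translates core_and_local_LVS into a Python regular expression and checks it against AS paths from the output above. It is an approximation, not Junos's own matcher: Junos AS-path regexes treat each whole AS number as one term (so "." means "any one ASN"), whereas here the path is modelled as a plain space-separated string and confederation segment markers are ignored. The bare "64600" path is an assumed example of what codfw routes looked like at eqdfw before T352920.

```python
import re

# Approximate hand translation of the Junos as-path regex
#   core_and_local_LVS "^(65002|65001)? 64600.*"
# Each ASN is one space-separated token; Junos ".*" (zero or more ASNs)
# becomes zero or more further " <digits>" tokens.
CORE_AND_LOCAL_LVS = re.compile(r"(?:(?:65002|65001) )?64600(?: \d+)*")


def matches(as_path: str) -> bool:
    """True if the whole space-separated AS path matches the translated regex."""
    return CORE_AND_LOCAL_LVS.fullmatch(as_path) is not None


if __name__ == "__main__":
    # "64600" is an assumed pre-T352920 path; the other two appear in the
    # cr2-eqdfw output above (EVPN switches and frack codfw respectively).
    for path in ("64600", "64811 64600", "64701 64600"):
        print(path, matches(path))
```

Only the first path matches: once an extra ASN such as 64811 sits in front of 64600, the pattern fails, so no contributing route survives and the aggregate is never created.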

The above BGP_from_LVS policy is only used in the aggregate config at our two network POPs, eqord and eqdfw. Elsewhere we have a simpler policy which rejects anything learnt from a remote site and allows any route at the local site to contribute:

set policy-options policy-statement BGP_aggregate_contributors term internal_only from protocol local
set policy-options policy-statement BGP_aggregate_contributors term internal_only from protocol direct
set policy-options policy-statement BGP_aggregate_contributors term internal_only from protocol static
set policy-options policy-statement BGP_aggregate_contributors term internal_only from protocol ospf
set policy-options policy-statement BGP_aggregate_contributors term internal_only from protocol ospf3
set policy-options policy-statement BGP_aggregate_contributors term internal_only then accept

set policy-options policy-statement BGP_aggregate_contributors term no_remote_confed from protocol bgp
set policy-options policy-statement BGP_aggregate_contributors term no_remote_confed from as-path from_remote_confed
set policy-options policy-statement BGP_aggregate_contributors term no_remote_confed then reject

set policy-options policy-statement BGP_aggregate_contributors term bgp from protocol bgp
set policy-options policy-statement BGP_aggregate_contributors term bgp then accept
set policy-options policy-statement BGP_aggregate_contributors then reject
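Junos evaluates policy terms in order, and the first term whose from conditions match and which has a terminating action decides. Multiple values for the same match condition (the protocol list in internal_only) are ORed, while different conditions in one term (protocol and as-path in no_remote_confed) are ANDed. A minimal Python model of BGP_aggregate_contributors under those rules is sketched below. The contents of the from_remote_confed as-path list are not shown in this task, so the check used here (the path contains an ASN from a placeholder set, 65099) is a hypothetical stand-in.

```python
from dataclasses import dataclass
from typing import Tuple

# Protocols matched by term internal_only (multiple "from protocol" values are ORed).
INTERNAL_PROTOCOLS = {"local", "direct", "static", "ospf", "ospf3"}

# HYPOTHETICAL placeholder: the real from_remote_confed as-path list is not shown.
# Here any path containing one of these ASNs is treated as "from a remote confed".
REMOTE_CONFED_ASNS = {65099}


@dataclass(frozen=True)
class Route:
    protocol: str
    as_path: Tuple[int, ...] = ()


def from_remote_confed(route: Route) -> bool:
    """Hypothetical stand-in for 'from as-path from_remote_confed'."""
    return any(asn in REMOTE_CONFED_ASNS for asn in route.as_path)


def bgp_aggregate_contributors(route: Route) -> str:
    """Evaluate the terms in order; the first matching term decides."""
    # term internal_only: any local/direct/static/OSPF route contributes
    if route.protocol in INTERNAL_PROTOCOLS:
        return "accept"
    # term no_remote_confed: protocol bgp AND remote-confed as-path -> reject
    if route.protocol == "bgp" and from_remote_confed(route):
        return "reject"
    # term bgp: every other BGP route contributes
    if route.protocol == "bgp":
        return "accept"
    # final "then reject" outside any term
    return "reject"


if __name__ == "__main__":
    examples = [
        Route("direct"),
        Route("bgp", (64811, 64600)),  # path seen at cr2-eqdfw
        Route("bgp", (65099, 64600)),  # hypothetical route from a remote sub-AS
        Route("isis"),                 # protocol not listed in any term
    ]
    for r in examples:
        print(r, bgp_aggregate_contributors(r))
```

Note that this policy never inspects the LVS ASN: any local BGP route contributes, which is why it is unaffected by the extra 64811 hop.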

The above policy is not suitable for use in eqord, however, as eqord is in its own confed/sub-AS, yet we wish to announce the aggregate routes for both core sites. We therefore can't reject routes with a remote confed in the path. Instead we have an alternate policy there, BGP_from_LVS, which uses the as-path filter core_and_local_LVS to allow routes learnt directly from eqiad/codfw (not reflected back from any other POP).

We also use this same policy at eqdfw; however, it doesn't seem to be needed there. Eqdfw is part of the same confed/sub-AS as codfw itself, so routes learnt from codfw have no confed ASN on them in the first place, and the POP is logically part of codfw despite being in another site. As a result, we should instead use the default configuration in eqdfw, allowing all BGP routes to contribute to the aggregate as long as they aren't from a remote confed.

To sum up, we should make the following changes:

  1. Adjust the aggregate route configuration on cr2-eqdfw to use the default BGP_aggregate_contributors policy.
  2. Modify the as-path regex used on the cr2-eqord policy as follows:
    1. Keep the current regex part to enforce the first ASN is directly from codfw or eqiad (blocking reflected routes and those from elsewhere)
    2. Modify the regex to permit routes with intermediate ASN of 64810 (switches eqiad), 64811 (switches codfw), 64700 (frack eqiad), 64701 (frack codfw) after the codfw/eqiad sub-as and before the LVS AS (64600)
  3. Rename the policy used on cr2-eqord from BGP_from_LVS to BGP_aggregate_contrib_eqord
    1. This reflects the fact that the site is a snowflake and the policy is only used there
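As a rough check of step 2, the sketch below applies one possible translation of the modified regex: an optional core-site confed ASN, then an optional intermediate ASN from 64810/64811/64700/64701, then the LVS AS 64600, then anything. This is an assumption, not the regex actually merged in change 1043229 (which is not reproduced in this task); the same whole-ASN tokenisation caveat applies, and confed segment markers are ignored.

```python
import re

CORE_CONFED = "65002|65001"               # from the existing core_and_local_LVS regex
INTERMEDIATE = "64810|64811|64700|64701"  # switches/frack ASNs in eqiad/codfw
LVS_ASN = "64600"

# One possible translation of the proposed as-path regex (an assumption, not
# the deployed config). AS paths are modelled as space-separated ASNs.
PROPOSED = re.compile(
    rf"(?:(?:{CORE_CONFED}) )?(?:(?:{INTERMEDIATE}) )?{LVS_ASN}(?: \d+)*"
)


def contributes(as_path: str) -> bool:
    """True if the whole AS path matches the proposed pattern."""
    return PROPOSED.fullmatch(as_path) is not None


if __name__ == "__main__":
    for path in (
        "64811 64600",              # codfw EVPN switches, as seen at cr2-eqdfw
        "64701 64600",              # frack codfw
        "65002 64810 64600",        # core confed ASN, then intermediate ASN
        "65001 65002 64811 64600",  # passed between core sites: should not match
    ):
        print(path, contributes(path))
```

The first three paths match and the last does not, because the leading part of the pattern still only allows a single core-site confed ASN before the intermediate hop.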

Change #1043229 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] Set eqdfw to use default aggregate policy, and modify eqord policy

https://gerrit.wikimedia.org/r/1043229

Mentioned in SAL (#wikimedia-operations) [2024-06-13T21:03:34Z] <topranks> changing BGP aggregate contribution policy / external route announcement cr2-eqord (T367439)

Mentioned in SAL (#wikimedia-operations) [2024-06-13T21:04:04Z] <topranks> changing BGP aggregate contribution policy / external route announcement cr2-eqdfw (T367439)

I've pushed this change to cr2-eqdfw and it seems to be doing what we need there:

Codfw /48 is announced to Facebook:

cmooney@cr2-eqdfw> show route advertising-protocol bgp 2620:0:1cff:dead:beee::11c8 

inet6.0: 203015 destinations, 768272 routes (201810 active, 0 holddown, 1370 hidden)
Restart Complete
  Prefix		  Nexthop	       MED     Lclpref    AS path
* 2620:0:860::/48         Self                                    I

Codfw /23 is now announced to NTT in addition to the Anycast/Cloud prefixes:

cmooney@cr2-eqdfw> show route advertising-protocol bgp 128.242.179.181    

inet.0: 942766 destinations, 2228770 routes (942350 active, 5 holddown, 1050 hidden)
Restart Complete
  Prefix		  Nexthop	       MED     Lclpref    AS path
* 185.15.57.0/24          Self                                    ?
* 185.71.138.0/24         Self                                    I
* 198.35.27.0/24          Self                                    I
* 208.80.152.0/23         Self                                    I

I'm monitoring the change in traffic levels. Right now it seems negligible; however, that is not much of a surprise, as prior to the LVS peering change (T352920) that caused us to stop announcing the codfw ranges from here, traffic via eqdfw also barely showed on the graphs. The biggest ingress we see now is over peering at Equinix, and it's still only about 20 Mbit/s:

[image: traffic graph]

Upstreams are presumably preferring paths directly into codfw, which is probably a good thing, although we're not prepending or otherwise trying to dissuade anyone from using eqdfw.

Just to note: for the same time period (since March 5th) we have also not been announcing the codfw aggregates from eqord:

cmooney@cr2-eqord> show route advertising-protocol bgp 192.80.17.197 terse 208.80.152.0/22    

inet.0: 941852 destinations, 2388097 routes (941226 active, 3 holddown, 919 hidden)
Restart Complete
  Prefix		  Nexthop	       MED     Lclpref    AS path
* 208.80.154.0/23         Self                                    I

I'm going to copy some of the discussion from the patch here, as I think it's easier to discuss here and it keeps a record of what we decide:

@ayounsi wrote:

eqdfw doesn't have good redundancy, even if it's in the same AS as codfw.
The two main links do share some fate, so they should not really be considered as redundant. Maybe we can consider the GTT vlan to codfw as solid enough redundancy from the two main links ?

The GTT vlan is probably solid enough I'd say, but this is a very good point.

As cr2-eqdfw itself has interface IPs in the 208.80.152.0/23 range, I think it will always advertise those prefixes, even if it's cut off from codfw; that's what I wanted to avoid in the past.

Again, a very good point and one I'd not thought through. One option there might be to change the aggregate config in eqdfw so that local/direct routes do not contribute, and only OSPF or BGP routes do. Or even only BGP, as we have more control there with the as-path regex?

Not sure why show protocols bgp group Confed_eqiad on cr2-eqdfw has the two eqiad peers as inactive, but that seems sub-optimal.

Yeah, I wanted to ask about that. I agree, I don't think it makes sense; perhaps it's left over from some incident or maintenance and one of us forgot to re-enable the peers?

I think here there is also a risk that if the eqdfw-codfw links go down, eqdfw will still learn codfw prefixes from eqiad through that GTT link.

Yeah. Any of those in BGP are OK, as the no_remote_confed term protects us, but the routes in OSPF are an issue. Longer term, I think we should aim to have all the public VLAN ranges in BGP too (this would happen naturally with L3 switches), and then remove OSPF/local/direct as contributing protocols.

BGP_from_LVS was meant to be the same across all the network POPs. With knams gone it might be easier, but I think it would be better to keep the same policy/logic between eqord and eqdfw.

I'm not against that as such, but we also need to appreciate that the two sites are quite different. One is in the same sub-AS as the nearby core site and only announces prefixes for that site, while the other is further away and announces prefixes learnt from both core sites, which are in different sub-ASNs.

With that said, perhaps the change with least moving parts here would be to switch eqdfw back to using the same policy as eqord, but keep the updated as-path regex from the patch? That means:

  • The regex will match eqiad routes in eqdfw
  • Right now they are blocked by the no_remote_confed term
  • But the aggregate config and bgp-out in eqdfw only includes the codfw range (i.e. 208.80.152.0/23 for v4)
  • So we will only announce the codfw ranges - as we want
  • We don't have ospf/local/direct contributing
  • So if the links to codfw go down in eqdfw we're protected: codfw routes in BGP will no longer match the as-path regex
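The protection argument in the bullets above rests on how Junos aggregates behave: an aggregate route is only installed while at least one contributing more-specific route is accepted by its policy. The sketch below models that for cr2-eqdfw, comparing a simplified policy that lets direct routes contribute with a BGP-only policy gated on the as-path regex. The regex translation, the simplified policies (the remote-confed check is omitted) and the route lists are illustrative assumptions, not output from the router.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable

# Approximate translation of the updated as-path regex (an assumption).
LVS_PATH = re.compile(
    r"(?:(?:65002|65001) )?(?:(?:64810|64811|64700|64701) )?64600(?: \d+)*"
)


@dataclass(frozen=True)
class Route:
    protocol: str
    as_path: str = ""  # space-separated ASNs; empty for non-BGP routes


def internal_or_bgp(route: Route) -> bool:
    """Simplified default-style policy: local/direct/static/OSPF or BGP contributes."""
    return route.protocol in {"local", "direct", "static", "ospf", "ospf3", "bgp"}


def bgp_with_lvs_path(route: Route) -> bool:
    """BGP_from_LVS-style policy: only BGP routes whose path matches the regex."""
    return route.protocol == "bgp" and LVS_PATH.fullmatch(route.as_path) is not None


def aggregate_active(contributors: Iterable[Route],
                     policy: Callable[[Route], bool]) -> bool:
    """The aggregate exists only while at least one contributor is accepted."""
    return any(policy(r) for r in contributors)


# cr2-eqdfw's own interface subnet inside 208.80.152.0/23 is always present.
LINKS_UP = [Route("direct"), Route("bgp", "64811 64600")]
LINKS_DOWN = [Route("direct")]  # BGP routes from codfw withdrawn

if __name__ == "__main__":
    for name, policy in (("internal_or_bgp", internal_or_bgp),
                         ("bgp_with_lvs_path", bgp_with_lvs_path)):
        print(name,
              "links up:", aggregate_active(LINKS_UP, policy),
              "links down:", aggregate_active(LINKS_DOWN, policy))
```

With direct routes contributing, the aggregate survives even when cr2-eqdfw is cut off from codfw, which is the situation raised in the review; the BGP-only variant withdraws it.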

One last thing: might it be worth further adjusting the regex so the last ASN can also be our Anycast one? That would mean the ns0/ns1 IPs would contribute. You can see the difference in a test for eqdfw here, and eqord here.

Your proposal seems good to me.

Adding the anycast AS makes sense; I think I initially deployed that before we had any public anycast.

Ok cool, I'll modify the patch and we can review.

Change #1043229 merged by jenkins-bot:

[operations/homer/public@master] Update aggregate route creation policy for network pops

https://gerrit.wikimedia.org/r/1043229

Mentioned in SAL (#wikimedia-operations) [2024-07-04T09:53:32Z] <topranks> Pushing updated BGP policy to cr2-eqord in Chicago to re-announce codfw IP ranges there T367439

OK, change merged; we are now announcing the codfw ranges from eqord again:

cmooney@cr2-eqord> show route advertising-protocol bgp 192.80.17.197                          

inet.0: 943760 destinations, 2391140 routes (941791 active, 2 holddown, 2258 hidden)
Restart Complete
  Prefix		  Nexthop	       MED     Lclpref    AS path
* 208.80.152.0/23         Self                                    I
* 208.80.154.0/23         Self                                    I
cmooney@cr2-eqord> show route advertising-protocol bgp 2001:418:16::118 table inet6.0   

inet6.0: 203209 destinations, 704024 routes (202056 active, 3 holddown, 1381 hidden)
Restart Complete
  Prefix		  Nexthop	       MED     Lclpref    AS path
* 2620:0:860::/48         Self                                    I
* 2620:0:861::/48         Self                                    I

I'll keep an eye on the graphs and make sure everything looks ok.

Change #1052086 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] Announce Anycast ranges from Network POPs

https://gerrit.wikimedia.org/r/1052086

All seems good with the policy changes now, closing task.

Mentioned in SAL (#wikimedia-operations) [2024-07-10T06:58:02Z] <XioNoX> push policy-statement BGP_agg_net_pops to all CRs (noop as it's not applied there) - T367439

Change #1052086 merged by jenkins-bot:

[operations/homer/public@master] Announce Anycast ranges from Network POPs

https://gerrit.wikimedia.org/r/1052086

Mentioned in SAL (#wikimedia-operations) [2024-07-12T13:10:20Z] <topranks> pushing updated BGP policy to cr2-eqord and cr2-eqdfw to announce Anycast ranges from network pops (T367439)

Change #1053935 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] Adjust route generation for Anycast ranges at eqord

https://gerrit.wikimedia.org/r/1053935

Change #1053935 merged by jenkins-bot:

[operations/homer/public@master] Adjust route generation for Anycast ranges at eqord

https://gerrit.wikimedia.org/r/1053935

Mentioned in SAL (#wikimedia-operations) [2024-07-17T15:32:29Z] <topranks> Adjust anycast route policy at Chicago Network POP cr2-eqord to announce anycast ranges T367439