
Traffic Engineering for Anycast Ranges
Closed, Resolved · Public

Description

@ssingh made me aware of a user report from a Wikidough tester in Finland, who was hitting the Wikidough instance in eqiad, when presumably esams is closer to them.

The traceroute revealed a simple explanation: the user's ISP does not peer with us and uses Lumen (Level 3 / AS3356) as transit. The Lumen path presumably looks good to that ISP (two hops), so they send the traffic that way. Lumen, as expected, routes the packets over one of our direct links to them, and thus the traffic comes in to eqiad.

Creating this task as we may want to add some traffic-engineering communities to our announcement of the 185.71.138.0/24 Anycast range to Transit providers based on region.

Some communities which might be of use include:

Transit Provider | Community | Provider's Description
Lumen            | 64980:0   | Announce to customers but not to EU peers
Lumen            | 64984:0   | Prepend 4x to all EU peers
NTT              | 2914:4013 | prepend o/b to all customers 3x in North America
NTT              | 2914:4023 | prepend o/b to all peers 3x in North America
NTT              | 2914:4213 | prepend o/b to all customers 3x in Europe
NTT              | 2914:4223 | prepend o/b to all peers 3x in Europe
Telia            | 1299:2003 | Prepend 3 times to all European peers
Telia            | 1299:5003 | Prepend 3 times to all North America peers

This is just a sample to give an idea of what we might do. It won't prevent this kind of thing completely, but it could help, and I think it is worth trying for Wikidough. Not only will users have higher-latency DNS if their requests take a long path, but CDNs may direct them to server farms close to the Wikidough host, compounding the problem.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Some thoughts:

  • We need to find the right balance between config complexity and low latency for users; otherwise it's going to be a cat and mouse game, fixing special cases all the time
  • We already do similar tuning in eqsin
    • Mostly because a lot of peering between Asian providers is done on the US west coast or in Europe
  • Peering is going to be even trickier as:
    • Large ISPs only peer outside their country borders
    • There is no granularity on how to tune the advertisements (it's usually only send or don't send)
  • This should go hand in hand with T283614 to help catch any outliers

@ssingh what's the timeline for Wikidough? So we know how to prioritize this task.

Agreed we need to balance complexity and usefulness. A few points:

  • I think it's too complex to consider doing this for peers at IXPs.
    • I would only anticipate it to be something we do with our transits.
    • It may be an idea to add "no export" community to peers, to only catch their direct (non-eBGP) customers.
  • I feel setting "do not announce outside region" or "pre-pend (max times) outside region" would be the best policy, on transits.
  • This is not a magic bullet that will solve it; we should decide our policy, set it up, and be done with it.
    • 100% agreed that we can't get into a cat and mouse game here.
  • Cloudflare setup might require special consideration.

(Thanks Cathal for filing this task!)

Some thoughts:
[...]
@ssingh what's the timeline for Wikidough? So we know how to prioritize this task.

The next phase (and what we are currently working towards) is the community release (WMF, Wikimedia community) and we are planning to get it done by the end of Q2. Thanks!

cmooney triaged this task as Medium priority. · Aug 27 2021, 7:35 AM

Change 728255 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] Add transit BGP communities for anycast traffic engineering

https://gerrit.wikimedia.org/r/728255

Change 728256 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] Configure transit specific outbound BGP communities

https://gerrit.wikimedia.org/r/728256

I currently assume that:

  • IX peers are mostly local, so no special care needs to happen to them
    • If this happens to be incorrect we could investigate not sending the anycast prefixes over IXPs (which could be a large drawback) or using the NO-EXPORT BGP community (most likely not supported by many peers)
  • HE peers at all our locations so sub-optimal routing is less likely (and they don't provide BGP communities)
  • The two esams transit providers that don't support BGP communities are mostly local providers (UK/Switzerland) and thus have a low risk of sub-optimal routing
  • eqsin is out of the picture for anycast specific tuning as the tuning is already done for the whole site
  • Return traffic is out of the picture as we don't share full BGP views between sites (so once the user hits the right site, return traffic will take the shortest path)

All our other transit providers support prepending their AS to our prefixes when they re-advertise them in other regions.
Those regions vary between providers, but luckily match our POP locations (e.g. we don't have a POP in between regions).
The only (non-blocking) limitation is Lumen, which only knows of the EU region.

For those, the policy I applied is:

  • Ask the provider to prepend 3x their AS to our prefixes when advertised in our other regions
    • For example, when we send anycast prefix X to NTT in esams, we ask them to prepend when they re-advertise it in the US and Asia, as our US and Asia POPs are more suitable there
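The selection logic behind that policy can be sketched roughly as follows. This is an illustration only, not the Homer change itself: the community values are the samples from the task description (which only lists North America and Europe), and `PREPEND_3X` and `communities_for` are hypothetical names.

```python
# Sample "prepend 3x" communities copied from the task description.
# Only North America (NA) and Europe (EU) are listed there, so other
# regions are deliberately left out of this sketch.
PREPEND_3X = {
    "NTT": {
        "NA": ["2914:4013", "2914:4023"],  # customers, peers
        "EU": ["2914:4213", "2914:4223"],  # customers, peers
    },
    "Telia": {
        "NA": ["1299:5003"],  # peers
        "EU": ["1299:2003"],  # peers
    },
}


def communities_for(provider, site_region):
    """Communities to attach at a POP in site_region (hypothetical helper).

    Asks the provider to prepend in every listed region except our own.
    """
    selected = []
    for region, communities in sorted(PREPEND_3X[provider].items()):
        if region != site_region:
            selected.extend(communities)
    return selected


# A North American POP asks Telia to prepend towards European peers.
print(communities_for("Telia", "NA"))
# A European POP asks NTT to prepend towards North American customers and peers.
print(communities_for("NTT", "EU"))
```

Keeping the mapping per provider and per region means adding a region or a provider is a data change rather than a logic change.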

Taking the example in the task description, the AS path for that user in Finland will be 3 hops longer to eqiad than it is now, which is usually sufficient (based on average AS-path length in the DFZ) for their ISP to consider a different path (unless they force that path with local-preference, but there is nothing we can do about that).
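To make the reasoning concrete, here is a simplified model of the two BGP decision steps that matter here: higher local-preference first, then shorter AS path. It is a sketch under assumptions: only AS3356 (Lumen) comes from the task; the other AS numbers are placeholders from the documentation range (64496-64511), and the alternative three-hop path to esams is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    label: str            # which of our sites the path leads to
    as_path: tuple        # AS path as seen by the user's ISP
    local_pref: int = 100


def best(routes):
    """Pick the preferred route: highest local-pref, then shortest AS path."""
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path)))


def prepend(route, asn, times):
    """Return a copy of route with asn prepended `times` extra times."""
    return Route(route.label, (asn,) * times + route.as_path, route.local_pref)


LUMEN = 3356        # from the task description
US = 64496          # placeholder for our own AS
OTHER_A = 64497     # placeholder transit ASes on the (assumed) path to esams
OTHER_B = 64498

via_lumen = Route("eqiad", (LUMEN, US))            # two hops, as in the traceroute
via_other = Route("esams", (OTHER_A, OTHER_B, US))  # assumed three-hop alternative

print(best([via_lumen, via_other]).label)           # eqiad: shorter path wins
tuned = prepend(via_lumen, LUMEN, 3)
print(best([tuned, via_other]).label)                # esams: eqiad path is now 5 hops
forced = Route("eqiad", tuned.as_path, local_pref=200)
print(best([forced, via_other]).label)               # eqiad: local-pref overrides path length
```

The last case is the one the comment above describes as out of our control: local-preference is evaluated before AS-path length, so no amount of prepending changes the outcome.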

lgtm

using the NO-EXPORT BGP community (most likely not supported by many peers)

FYI i have had a good experience using no-export at IX's, i.e. it is mostly honoured and when it wasn't it was generally a mistake.

IX peers are mostly local, so no special care needs to happen to them

  • If this happens to be incorrect we could investigate not sending the anycast prefixes over IXPs (which could be a large drawback) or using the NO-EXPORT BGP community (most likely not supported by many peers)

FYI i have had a good experience using no-export at IX's, i.e. it is mostly honoured and when it wasn't it was generally a mistake.

No-export is built into most BGP implementations, and has been for decades, so it should work almost universally. You'd need to strip the community on ingress to force the propagation of the prefix on most gear.

I do agree that we should maybe hold off on doing so though, see how it goes. Most IXP members are not transit ISPs and won't have downstreams. They *may* have internal networks with eBGP, where the no-export community won't help us.
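As a rough illustration of the no-export behaviour discussed above: by default a route carrying the well-known NO_EXPORT community (65535:65281, RFC 1997) is not re-advertised over eBGP, and a peer would have to strip it on ingress to propagate the prefix. The function names are hypothetical and the model ignores confederations.

```python
NO_EXPORT = "65535:65281"  # well-known NO_EXPORT community (RFC 1997)


def receive(communities, strip_no_export=False):
    """Ingress handling: optionally strip NO_EXPORT from a received route."""
    if strip_no_export:
        return [c for c in communities if c != NO_EXPORT]
    return list(communities)


def may_advertise(communities, session):
    """Default behaviour: a NO_EXPORT route is not sent over eBGP sessions."""
    return not (session == "ebgp" and NO_EXPORT in communities)


received = receive([NO_EXPORT])
print(may_advertise(received, "ibgp"))   # True: stays inside the peer's AS
print(may_advertise(received, "ebgp"))   # False: not passed on to other ASes

stripped = receive([NO_EXPORT], strip_no_export=True)
print(may_advertise(stripped, "ebgp"))   # True: only after explicit stripping
```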

Ask the provider to prepend 3x their AS to our prefixes when advertised in our other regions

Agree this is probably the right number of prepends.

Mentioned in SAL (#wikimedia-operations) [2021-10-13T13:59:41Z] <XioNoX> push prep-work for anycast tuning in ulsfo - T288843

Mentioned in SAL (#wikimedia-operations) [2021-10-19T08:03:03Z] <XioNoX> push prep-work for anycast tuning in ulsfo (try 2) - T288843

Change 728255 merged by jenkins-bot:

[operations/homer/public@master] Add transit BGP communities for anycast traffic engineering

https://gerrit.wikimedia.org/r/728255

Mentioned in SAL (#wikimedia-operations) [2021-10-19T08:40:57Z] <XioNoX> push prep-work for anycast tuning to all sites - T288843

Mentioned in SAL (#wikimedia-operations) [2021-10-19T09:03:04Z] <XioNoX> push anycast tuning to all Telia transit links - T288843

Example before/after for Telia in eqiad (the first extensive output for 185.71.138.0/24 was taken before the change, the second after):

ayounsi@re0.cr2-eqiad> show route advertising-protocol bgp 80.239.132.225 

inet.0: 852341 destinations, 2745718 routes (847953 active, 0 holddown, 5930 hidden)
Restart Complete
  Prefix		  Nexthop	       MED     Lclpref    AS path
* 185.15.56.0/24          Self                                    I
* 185.71.138.0/24         Self                                    I
* 198.35.27.0/24          Self                                    I
* 198.73.209.0/24         Self                                    11820 ?
* 208.80.154.0/23         Self                                    I

{master}
ayounsi@re0.cr2-eqiad> show route advertising-protocol bgp 2001:2000:3080:a98::1 

inet6.0: 134059 destinations, 506804 routes (133430 active, 0 holddown, 1830 hidden)
Restart Complete
  Prefix		  Nexthop	       MED     Lclpref    AS path
* 2620:0:861::/48         Self                                    I
* 2620:62:c000::/48       Self                                    11820 ?

{master}
ayounsi@re0.cr2-eqiad> show route advertising-protocol bgp 80.239.132.225 185.71.138.0/24 extensive 

inet.0: 852377 destinations, 2745793 routes (847988 active, 1 holddown, 5927 hidden)
Restart Complete
* 185.71.138.0/24 (2 entries, 1 announced)
 BGP group Transit4 type External
     Nexthop: Self
     AS path: [65001] I  (LocalAgg)
     Communities: 14907:13
{master}
ayounsi@re0.cr2-eqiad> show route advertising-protocol bgp 80.239.132.225 185.71.138.0/24 extensive    

inet.0: 852517 destinations, 2746055 routes (848123 active, 2 holddown, 5931 hidden)
Restart Complete
* 185.71.138.0/24 (2 entries, 1 announced)
 BGP group Transit4 type External
     Nexthop: Self
     AS path: [65001] I  (LocalAgg)
     Communities: 1299:2003 1299:7003 14907:13
{master}
ayounsi@re0.cr2-eqiad> show route advertising-protocol bgp 2001:2000:3080:a98::1                         

inet6.0: 134062 destinations, 506739 routes (133391 active, 41 holddown, 1794 hidden)
Restart Complete
  Prefix		  Nexthop	       MED     Lclpref    AS path
* 2620:0:861::/48         Self                                    I
* 2620:62:c000::/48       Self                                    11820 ?

{master}
ayounsi@re0.cr2-eqiad> show route advertising-protocol bgp 80.239.132.225                              

inet.0: 852519 destinations, 2746068 routes (848104 active, 13 holddown, 5958 hidden)
Restart Complete
  Prefix		  Nexthop	       MED     Lclpref    AS path
* 185.15.56.0/24          Self                                    I
* 185.71.138.0/24         Self                                    I
* 198.35.27.0/24          Self                                    I
* 198.73.209.0/24         Self                                    11820 ?
* 208.80.154.0/23         Self                                    I

{master}
ayounsi@re0.cr2-eqiad> show route advertising-protocol bgp 80.239.132.225 208.80.154.0/23 extensive 

inet.0: 852517 destinations, 2746050 routes (848102 active, 2 holddown, 5959 hidden)
Restart Complete
* 208.80.154.0/23 (2 entries, 1 announced)
 BGP group Transit4 type External
     Nexthop: Self
     AS path: [65001] I  (LocalAgg)

Anycast prefixes have the extra communities, while the other prefixes are still there without extra communities.
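A check like this could be automated by parsing the extensive output. The following is a minimal sketch, assuming the output format shown above; `SAMPLE` stitches together two of the outputs for illustration, and the function names are hypothetical.

```python
import re

# Two entries copied from the "extensive" outputs above, joined for illustration.
SAMPLE = """\
* 185.71.138.0/24 (2 entries, 1 announced)
 BGP group Transit4 type External
     Nexthop: Self
     AS path: [65001] I  (LocalAgg)
     Communities: 1299:2003 1299:7003 14907:13
* 208.80.154.0/23 (2 entries, 1 announced)
 BGP group Transit4 type External
     Nexthop: Self
     AS path: [65001] I  (LocalAgg)
"""

PREFIX_RE = re.compile(r"^\*\s+(\S+/\d+)\s+\(")
COMMUNITIES_RE = re.compile(r"^\s*Communities:\s*(.*)$")


def communities_by_prefix(output):
    """Map each advertised prefix to its list of communities (empty if none)."""
    result = {}
    current = None
    for line in output.splitlines():
        match = PREFIX_RE.match(line)
        if match:
            current = match.group(1)
            result[current] = []
            continue
        match = COMMUNITIES_RE.match(line)
        if match and current is not None:
            result[current] = match.group(1).split()
    return result


def has_transit_community(communities, transit_asn):
    """True if any community is in the transit provider's namespace."""
    return any(c.startswith(f"{transit_asn}:") for c in communities)


parsed = communities_by_prefix(SAMPLE)
for prefix, communities in parsed.items():
    print(prefix, has_transit_community(communities, 1299))
```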

Mentioned in SAL (#wikimedia-operations) [2021-10-19T11:56:13Z] <XioNoX> push anycast tuning to Tele2, Init7, DT transit links - T288843

Mentioned in SAL (#wikimedia-operations) [2021-10-19T12:12:49Z] <XioNoX> push anycast tuning to all Lumen and NTT transit links - T288843

Change 728256 merged by jenkins-bot:

[operations/homer/public@master] Configure transit specific outbound BGP communities

https://gerrit.wikimedia.org/r/728256

ayounsi claimed this task.

A good baseline has now been applied across most of our transits.
Further tuning will happen when sub-optimal routing is exposed through T283614 or similar.