Merge AS14907 with AS43821
Closed, Resolved, Public

Description

Currently and for historical reasons, we operate two ASNs: 14907 (all US) and 43821 (Europe). These used to be disjoint networks, but for a while now they have been redundantly connected via private transport links.

The distinction doesn't make sense anymore and will become even more confusing with Singapore, which we plan to put in 14907.

I think we should just merge the two, deprecate 43821 and use 65003 as esams' confederation ASN. Another option would be to use 43821 for caching PoPs (ulsfo included) and 14907 for our core sites alone. In any case, we should definitely migrate our communities to the 14907:* space so that we can carry BGP routes across our transatlantic links.

Timeline

Monday 25th

  • Standardize communities
[edit policy-options policy-statement BGP_sanitize_in then]
+     community delete AS14907:ALL;
-     community delete AS43821:ALL;
[edit policy-options]
+   community AS14907:ALL members "^14907:[0-9]+$";
-   community AS43821:ALL members "^43821:[0-9]+$";
[edit policy-options community AVOIDED_PATH]
-   members 43821:0;
+   members 14907:0;
[edit policy-options community PARTIAL_TRANSIT_ROUTE]
-   members 43821:5;
+   members 14907:5;
[edit policy-options community PEERING_ROUTE]
-   members 43821:3;
+   members 14907:3;
[edit policy-options community PEER_CUSTOMER]
-   members 43821:7;
+   members 14907:7;
[edit policy-options community PEER_INTERNAL]
-   members 43821:6;
+   members 14907:6;
[edit policy-options community PEER_PRIVATE_PEER]
-   members 43821:8;
+   members 14907:8;
[edit policy-options community PEER_PUBLIC_PEER]
-   members 43821:9;
+   members 14907:9;
[edit policy-options community PREFERRED_TRANSIT]
-   members 43821:10;
+   members 14907:10;
[edit policy-options community SELECTED_PATH]
-   members 43821:11;
+   members 14907:11;
[edit policy-options community TRANSIT_ROUTE]
-   members 43821:4;
+   members 14907:4;
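For readers less familiar with Junos policy syntax, here is a minimal Python sketch of what the diff above does: every `43821:N` community becomes `14907:N`, and the inbound sanitize policy strips anything matching `^14907:[0-9]+$` so that neighbors cannot inject our own communities. Names and values are copied from the diff; the `renumber` and `sanitize_in` helpers are hypothetical, and Python's `re` only approximates Junos community regex matching.

```python
from __future__ import annotations

import re

# Named communities from the diff above: name -> value part (the ASN is applied separately).
COMMUNITY_VALUES = {
    "AVOIDED_PATH": 0,
    "PEERING_ROUTE": 3,
    "TRANSIT_ROUTE": 4,
    "PARTIAL_TRANSIT_ROUTE": 5,
    "PEER_INTERNAL": 6,
    "PEER_CUSTOMER": 7,
    "PEER_PRIVATE_PEER": 8,
    "PEER_PUBLIC_PEER": 9,
    "PREFERRED_TRANSIT": 10,
    "SELECTED_PATH": 11,
}

OLD_ASN = 43821
NEW_ASN = 14907

# Same pattern as the "community AS14907:ALL" definition.
AS14907_ALL = re.compile(r"^14907:[0-9]+$")


def renumber(community: str) -> str:
    """Rewrite an old-ASN community (43821:N) into the new space (14907:N)."""
    asn, _, value = community.partition(":")
    if asn == str(OLD_ASN):
        return f"{NEW_ASN}:{value}"
    return community


def sanitize_in(communities: list[str]) -> list[str]:
    """Mimic 'community delete AS14907:ALL': drop our own communities on input."""
    return [c for c in communities if not AS14907_ALL.match(c)]


if __name__ == "__main__":
    for name, value in COMMUNITY_VALUES.items():
        print(f"{name}: {OLD_ASN}:{value} -> {renumber(f'{OLD_ASN}:{value}')}")
    print(sanitize_in(["14907:3", "1299:20000", "14907:11"]))
```

Communities from other ASNs pass through `renumber` untouched, which mirrors the diff only changing our own `members` lines.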

Tuesday 26th to Oct 10th

  • Set "local-as 14907 alias" for sessions with Datahop, Tele2 and TeliaEU (causes a brief BGP flap).
  • Verify sessions are properly established
  • Ask selected providers to reconfigure their BGP session using AS14907
    • Datahop
    • Tele2
    • TeliaEU
  • Verify sessions are properly established. This also tests that local-as works as expected before we use it with more peers (EDIT: see update in comments)
  • Notify transits and AMS-IX peers of the upcoming maintenance
  • Pre-provision new AMS-IX IPv6 address on cr2-esams

set interfaces ae2 unit 0 family inet6 address 2001:7f8:1::a501:4907:1/64 <-- To be obtained from AMS-IX
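The AMS-IX IPv6 address appears to embed the ASN in decimal: compare 2001:7f8:1::a501:4907:1 (AS14907) with the old 2001:7f8:1::a504:3821:1 (AS43821). A sketch of that scheme, inferred from those two addresses only (the authoritative address still has to come from AMS-IX); `amsix_v6_address` is a hypothetical helper:

```python
import ipaddress

AMSIX_V6_PREFIX = "2001:7f8:1::"  # peering LAN prefix as it appears in this task


def amsix_v6_address(asn: int, index: int = 1) -> ipaddress.IPv6Address:
    """Sketch: build an AMS-IX style IPv6 peering address from an ASN.

    Inferred from the two addresses in this task:
      AS14907 -> 2001:7f8:1::a501:4907:1
      AS43821 -> 2001:7f8:1::a504:3821:1
    The ASN is written in *decimal*, zero-padded to 6 digits and placed after
    an "a5" marker; 'index' becomes the last group.
    """
    if not 0 < asn <= 999_999:
        raise ValueError("this sketch only handles ASNs of up to 6 decimal digits")
    digits = f"{asn:06d}"
    text = f"{AMSIX_V6_PREFIX}a5{digits[:2]}:{digits[2:]}:{index:x}"
    return ipaddress.IPv6Address(text)


if __name__ == "__main__":
    print(amsix_v6_address(14907))  # new address to pre-provision
    print(amsix_v6_address(43821))  # old address, to be deleted later
```

The decimal encoding keeps the ASN readable in the address, which is why the old and new addresses differ only in the a5xx:xxxx groups.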

Tuesday Oct 10th - Noon PDT - 7pm UTC - 5h

  • Depool esams
  • Mute monitoring for 5h
  • Set "local-as 43821 alias" for IXP peers and transits (causes a brief BGP flap)
cr*ams# set protocols bgp group IX4 local-as 43821 alias
cr*ams# set protocols bgp group IX6 local-as 43821 alias
cr*ams# set protocols bgp group Transit4 local-as 43821 alias
cr*ams# set protocols bgp group Transit6 local-as 43821 alias
  • Reconfigure BGP session between cr1-eqiad and esams (currently cr2-knams):
cr1-eqiad# set protocols bgp group Confed_esams neighbor 91.198.174.251 peer-as 65003
(Make sure to use the interface IPs and not the routers' loopbacks.)
cr1-eqiad# set protocols bgp group Confed_esams local-address 91.198.174.250
(Will only establish once the confederation is configured)
cr2-knams# set protocols bgp group Confed_eqiad neighbor 91.198.174.250 peer-as 65001
cr1-eqiad# set protocols bgp group Confed_esams bfd-liveness-detection minimum-interval 300
cr2-knams# set protocols bgp group Confed_eqiad bfd-liveness-detection minimum-interval 300
  • Remove the OSPF adjacency between cr1-eqiad and esams (currently cr2-knams)

(This keeps each OSPF area confined to its confederation member and reduces table size; with only 2 routers, it also removes the need for OSPF link-protection.)

cr1-eqiad# delete protocols ospf area 0.0.0.0 interface xe-4/2/2.0
cr1-eqiad# set protocols ospf area 0.0.0.0 interface xe-4/2/2.0 passive
(Keep the interface passive so that there are no gaps in, for example, traceroutes.)
cr2-knams# delete protocols ospf area 0.0.0.0 interface xe-1/1/0.0
cr2-knams# set protocols ospf area 0.0.0.0 interface xe-1/1/0.0 passive
  • Set local-as on sessions to pybal servers
cr*ams# set protocols bgp group PyBal local-as 43821 alias
  • Configure confederation, use sub-AS 65003 (unused but already configured on all the routers)
cr*ams# set routing-options autonomous-system 65003
cr*ams# set routing-options confederation 14907
cr*ams# set routing-options confederation members 65001
cr*ams# set routing-options confederation members 65002
cr*ams# set routing-options confederation members 65003
cr*ams# set routing-options confederation members 65004
cr*ams# set protocols bgp group iBGP peer-as 65003
  • Renumber the ASN in the last (unused) policy-statement
cr*ams# set policy-options policy-statement BGP_prepend1_out term prepend then as-path-prepend 14907
  • Configure pybal to peer with AS14907
  • Remove local-as for sessions to pybal
cr*ams# delete protocols bgp group PyBal local-as
  • Verify pybal sessions UP
  • Remove local-as for Datahop, Tele2, TeliaEU
  • Verify that the BGP sessions between esams and the reconfigured transits, as well as cr1-eqiad, are up.
  • Verify the routes advertised/received to/from Transits/IX/cr1-eqiad
cr2-knams> show route advertising-protocol bgp <peerIP>
cr2-knams> show route receive-protocol bgp  91.198.174.250
cr1-eqiad> show route receive-protocol bgp  91.198.174.251
  • Re-configure: cr2-eqiad<->esams OSPF to passive
cr2-eqiad# delete protocols ospf area 0.0.0.0 interface xe-4/1/3.0
cr2-eqiad# set protocols ospf area 0.0.0.0 interface xe-4/1/3.0 passive
(Keep the interface passive so that there are no gaps in, for example, traceroutes.)
cr2-esams# delete protocols ospf area 0.0.0.0 interface xe-0/1/3.0
cr2-esams# set protocols ospf area 0.0.0.0 interface xe-0/1/3.0 passive
  • Re-configure: cr2-eqiad<->esams BGP
cr2-eqiad# set protocols bgp group Confed_esams neighbor 91.198.174.249 peer-as 65003
cr2-eqiad# set protocols bgp group Confed_esams local-address 91.198.174.248
cr2-eqiad# set protocols bgp group Confed_esams bfd-liveness-detection minimum-interval 300
cr2-esams# set protocols bgp group Confed_eqiad neighbor 91.198.174.248 peer-as 65001
cr2-esams# set protocols bgp group Confed_eqiad bfd-liveness-detection minimum-interval 300
  • Related to T83037: limit the propagation of internal routes to within the AS14907 confederation
set routing-options aggregate route 10.2.3.0/24 community no-export
  • Initiate the AS# change on the AMS-IX portal (should be effective within a few minutes)
  • Update/run jnt/network-automation
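A simplified Python model of the AS-path handling the confederation steps above rely on, based on my reading of RFC 5065 (confederations) and RFC 1997 (NO_EXPORT): between sub-ASes only the AS_CONFED_SEQUENCE grows, while a route leaving the confederation has its confederation segments stripped and 14907 prepended, once more if a prepend policy like BGP_prepend1_out applies. AS_SETs, loop detection and import/export policy are ignored; `advertise` and its `prepend` parameter are hypothetical, 192.0.2.0/24 is a documentation prefix, and AS1299 simply stands in for an external peer.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional

CONFED_ID = 14907                              # routing-options confederation 14907
CONFED_MEMBERS = {65001, 65002, 65003, 65004}  # confederation members
NO_EXPORT = "no-export"                        # RFC 1997 well-known community


@dataclass(frozen=True)
class Route:
    prefix: str
    confed_path: tuple[int, ...] = ()  # AS_CONFED_SEQUENCE (sub-ASes only)
    as_path: tuple[int, ...] = ()      # AS_SEQUENCE visible outside the confederation
    communities: frozenset[str] = frozenset()


def advertise(route: Route, local_sub_as: int, peer_as: int, prepend: int = 0) -> Optional[Route]:
    """Return the route as received by 'peer_as', or None if it must not be sent."""
    if peer_as == local_sub_as:
        # iBGP inside one sub-AS: the path is left untouched.
        return route
    if peer_as in CONFED_MEMBERS:
        # eBGP between sub-ASes (e.g. 65001 <-> 65003): only the confed
        # sequence grows, and NO_EXPORT routes are still allowed through.
        return replace(route, confed_path=(local_sub_as,) + route.confed_path)
    # Leaving the confederation: NO_EXPORT routes stop here.
    if NO_EXPORT in route.communities:
        return None
    # Strip the confederation segments and prepend the confederation ID,
    # plus 'prepend' extra copies (e.g. for BGP_prepend1_out).
    return replace(
        route,
        confed_path=(),
        as_path=(CONFED_ID,) * (1 + prepend) + route.as_path,
    )


# A route originated in eqiad (sub-AS 65001) is sent to esams (65003),
# then from esams to an external peer with one extra prepend.
public = Route("192.0.2.0/24")
at_esams = advertise(public, local_sub_as=65001, peer_as=65003)
to_transit = advertise(at_esams, local_sub_as=65003, peer_as=1299, prepend=1)

# An internal aggregate tagged no-export (cf. 10.2.3.0/24 above).
internal = Route("10.2.3.0/24", communities=frozenset({NO_EXPORT}))
internal_at_esams = advertise(internal, local_sub_as=65001, peer_as=65003)
internal_to_transit = advertise(internal_at_esams, local_sub_as=65003, peer_as=1299)
```

In this model the external peer sees `14907 14907` for the public route and never sees the no-export aggregate, while esams still receives both.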

Verifications

  • Verify the routes advertised/received to/from Transits/IX/transport/pybal
cr*> show route advertising-protocol bgp <peerIP>
cr*> show route receive-protocol bgp  <peerIP>
  • Verify connectivity between ulsfo/codfw (not direct neighbors) and esams (traceroute)
  • Verify no alarms, happy monitoring. Use looking glass to verify the correctness of the routes advertised to the DFZ.
  • Perform more intrusive testing: verify proper failover, document failover/outage time.
    • Restart each router, one after the other
    • Disable links to eqiad (one after the other)
    • Disable links between esams routers (one after the other)
  • Final "do we need to rollback?" point.
  • Repool esams (can be done later, depending on how long we're okay with esams being inactive).

End of maintenance window

Follow up work

  • Ask the remaining transits to change their BGP config to AS14907.
  • Coordinate the change of AS# with AMS-IX peers.
  • Progressively remove "local-as" statements.
  • Update documentation.
  • Update the PeeringDB entry.
  • Delete the old AMS-IX IPv6 address once it no longer carries traffic:
delete interfaces ae2 unit 0 family inet6 address 2001:7f8:1::a504:3821:1/64

Event Timeline

faidon created this task. Jun 13 2017, 10:32 PM
Restricted Application added a project: Operations. Jun 13 2017, 10:32 PM
Restricted Application added a subscriber: Aklapper.
faidon updated the task description. Jun 13 2017, 10:33 PM
BBlack added a subscriber: BBlack. Jun 14 2017, 1:22 AM

What are the real pros and cons of this? We could even go in the other direction and have a unique ASN per region/continent. How does this impact future anycasting? Note https://tools.ietf.org/html/rfc6382 talks about best practice for anycast being to have distinct ASNs per region, but I don't pretend to understand all the finer details and arguments in that RFC.

ema added a subscriber: ema. Jun 14 2017, 7:06 AM
elukey added a subscriber: elukey. Jun 14 2017, 7:16 AM
akosiaris updated the task description. Jun 14 2017, 11:41 AM

I only skimmed over that RFC, but I suspect it's talking about disjoint islands of anycasted services (e.g. DNS root servers). In our case, it's all one single network, with backhauling/transport links and an IGP running all across it, so it may not be very applicable to us.

In practical terms, merging our ASNs simplifies our network a little, and opens the door to routing European-directed traffic from and to the core datacenters over our private datacenter links.

More broadly, we really operate two different networks with different requirements:

  • a "core" network that hosts misc services, fundraising, WMCS etc., and that spans the globe and all of our regional PoPs. The routes of the core network could be announced by every PoP and, in the other direction, the core network would be able to leverage the global network to route traffic to destinations over its private links.
  • a CDN network, in which every PoP is really supposed to serve local, regional traffic. It has separate address spaces per region, announced only regionally and never supposed to cross regional boundaries, but it needs to leverage the core network for backhauling.

Building those two networks segregated like that would be very complicated (e.g. renumbering eqiad/codfw's IP space into separate regional address spaces, VRFs or separate virtual routers to do the same for outgoing traffic, etc.) and, I think, a waste of time and resources for us at this point.

The next best scenario IMHO is to just merge our communities and our ASNs for simplicity, and announce "customer" routes (e.g. WMF HQ, soon WMCS, possibly fundraising) across regional boundaries, while keeping 208.80.152.0/22 announced mostly in US East, or potentially in other regions but only over transits that support BGP MED.

ayounsi added a comment. (Edited) Jun 15 2017, 4:53 PM

As I understand it, the increased complexity of running two "networks" outweighs the advantages, and our customers are networks we manage and have control over.
A single ASN (using confederations), vs. one ASN per region, also makes peering relationships easier.
The RFC recommends using individual ASNs for anycast to be able to make more fine-grained routing decisions, which we can also do using communities.
We can also look at similar networks, CDNs and such, which all operate a single ASN.

To keep the conversation moving, here is a coarse plan to move ESAMS to AS 14907 with little downtime, not tested in a lab, only thoughts. It also includes changes related to T167841 and T83037 to have esams as a model for the other pops, and then core confederations.
Routing policies have been reviewed and should be compatible with those changes.

Ideally, to be done after cr2-knams is decommissioned to reduce the number of moving parts but this is not a blocker.

EDIT: Action plan moved to description

ayounsi moved this task from Backlog to Configuration on the netops board. Jun 27 2017, 2:38 PM
mark added a subscriber: mark. Jul 24 2017, 2:31 PM

I guess there's something to be said for using different ASNs for core vs CDN in the case of losing our transport connectivity to (one of) the CDN sites. We could then still tunnel this over the Internet (IP transits on both sides), which would be much harder (but not impossible) to do with one global ASN.

ayounsi renamed this task from Merge AS14907 with AS43281 to Merge AS14907 with AS43821. Sep 20 2017, 6:14 PM
ayounsi updated the task description. Sep 21 2017, 10:57 PM

Mentioned in SAL (#wikimedia-operations) [2017-09-25T19:58:14Z] <XioNoX> renumbering ams BGP communities - T167840

Mentioned in SAL (#wikimedia-operations) [2017-09-27T16:27:25Z] <XioNoX> setting local-as to selected transit BGP sessions - T167840

It seems like Junos' local-as feature isn't working as expected.
Global AS of 43821, remote side with peer-as 43821, and the local side with:
local-as 14907 -> BGP session doesn't establish
local-as 14907 private -> BGP session doesn't establish
local-as 14907 alias -> BGP session establishes, AS path is 43821 I (as expected per the doc)

Global AS of 43821, remote side with peer-as 14907, and the local side with:
local-as 14907 -> BGP session establishes, AS path is 14907 I, should be 43821 14907 I
local-as 14907 private -> BGP session establishes, AS path is 14907 I, should be 43821 I
local-as 14907 alias -> BGP session establishes, AS path is 14907 I (as expected per the doc)

The main advantage this feature would have given us is the ability to advertise prefixes from a single AS# during the transition period, and to flip all the advertisements to 14907 at once.
Instead, ams prefixes will originate from both AS#s until all the providers have switched over to peering with 14907.
This makes things a tad more confusing, but doesn't impact connectivity.
On the upside it allows us to progressively test that all providers allow our ams prefixes to originate from the new AS#.

ayounsi updated the task description. Sep 28 2017, 9:53 PM
ayounsi updated the task description. Sep 29 2017, 4:02 PM
ayounsi updated the task description. Sep 29 2017, 4:29 PM

Change 383382 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/dns@master] Depool esams for expected blips during ASN renumbering

https://gerrit.wikimedia.org/r/383382

Change 383382 merged by Ayounsi:
[operations/dns@master] Depool esams for expected blips during ASN renumbering

https://gerrit.wikimedia.org/r/383382

Mentioned in SAL (#wikimedia-operations) [2017-10-10T17:32:37Z] <XioNoX> depooling esams from DNS for T167840

ayounsi added a comment. (Edited) Oct 10 2017, 6:28 PM

After depooling esams, the Telia link in eqiad started to saturate, so I added the following terms to temporarily shift traffic away from that link.

[edit policy-options as-path-group AVOID-PATHS]
+    as-path FT-Telia "1299 5511 .*";
+    as-path TelItalia-Telia "1299 6762 .*";
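Junos as-path regular expressions operate on whole AS numbers rather than characters and, as I understand the Junos documentation, must match the entire path, so "1299 5511 .*" selects routes learned from 1299 whose next AS in the path is 5511. A hypothetical matcher for that small subset only (literal ASNs, `.` for exactly one AS, `.*` for zero or more), not a full implementation of the Junos syntax:

```python
from __future__ import annotations

from functools import lru_cache


def as_path_matches(pattern: str, path: list[int]) -> bool:
    """Match a simplified Junos-style AS-path regex against a whole AS path.

    Supported terms (whitespace separated): a literal ASN, "." (exactly one
    AS) and ".*" (zero or more ASes). Other regex operators are not handled.
    The match is anchored at both ends of the path.
    """
    terms = pattern.split()
    asns = tuple(path)

    @lru_cache(maxsize=None)
    def match(t: int, a: int) -> bool:
        if t == len(terms):
            return a == len(asns)
        term = terms[t]
        if term == ".*":
            # Either consume nothing, or consume one AS and stay on ".*".
            return match(t + 1, a) or (a < len(asns) and match(t, a + 1))
        if a == len(asns):
            return False
        if term == "." or term == str(asns[a]):
            return match(t + 1, a + 1)
        return False

    return match(0, 0)


# The temporary as-path-group from the change above.
AVOID_PATHS = {
    "FT-Telia": "1299 5511 .*",
    "TelItalia-Telia": "1299 6762 .*",
}


def avoided(path: list[int]) -> list[str]:
    """Names of the AVOID-PATHS entries that match a received AS path."""
    return [name for name, pat in AVOID_PATHS.items() if as_path_matches(pat, path)]
```

Because the whole path must match, a path where 5511 appears later (e.g. `1299 3356 5511`) is not caught by "1299 5511 .*".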

Confirmed working based on interface graph.

Mentioned in SAL (#wikimedia-operations) [2017-10-10T19:00:28Z] <XioNoX> starting the work for T167840

Mentioned in SAL (#wikimedia-operations) [2017-10-10T21:14:59Z] <bblack> esams repooling - T167840

Temporary AVOID-PATHS removed on cr2-eqiad.

The maintenance is now completed, some notes:

  • It was not clear that the plan included removing OSPF on the trans-atlantic link in favor of BGP

We need to discuss it further and familiarize the team with it.

  • The BGP filters were not allowing internal routes (10/8) to propagate to AMS, which caused a barrage of alerts.

This could have been avoided by comparing the routes learned via OSPF and the ones learned by BGP after migrating the 1st link, but before migrating the 2nd link

  • For the two reasons above, OSPF has been re-established
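The check suggested above (compare the routes learned via OSPF with those learned via BGP after migrating the first link) boils down to a set difference. A sketch assuming the prefixes have already been extracted from `show route` output; parsing is left out, `missing_from_bgp` is a hypothetical helper and the example prefixes are illustrative only. Exact matching is deliberate, so that an aggregate or default route cannot hide a missing more-specific.

```python
from __future__ import annotations

import ipaddress
from typing import Iterable


def missing_from_bgp(ospf_prefixes: Iterable[str], bgp_prefixes: Iterable[str]) -> list[str]:
    """Prefixes learned via OSPF that have no exact match among BGP-learned ones."""
    # ip_network() normalizes notation, e.g. 2001:0db8::/32 == 2001:db8::/32.
    bgp = {ipaddress.ip_network(p) for p in bgp_prefixes}
    missing = []
    for p in ospf_prefixes:
        net = ipaddress.ip_network(p)
        if net not in bgp:
            missing.append(str(net))
    return missing


# Illustrative data only: in practice these lists would come from
# "show route protocol ospf" and "show route receive-protocol bgp <peer>".
ospf_learned = ["10.2.3.0/24", "10.64.0.0/22", "208.80.154.0/26"]
bgp_learned = ["208.80.154.0/26"]

if __name__ == "__main__":
    for prefix in missing_from_bgp(ospf_learned, bgp_learned):
        print("not in BGP:", prefix)
```

A non-empty result before migrating the second link would have flagged the 10/8 filtering issue described above.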

Communication has been sent to our peers and transits so they update their peer AS#. We can monitor and decide later on when to remove the local-as as well as old v6 IP.

ayounsi claimed this task. Oct 10 2017, 9:58 PM

It's not often that one of our primary cache PoPs ends up depooled for multiple hours. While obviously unintended, this was an interesting opportunity to measure the difference Esams makes for web performance.

https://grafana.wikimedia.org/dashboard/db/navigation-timing?from=1507482000000&to=1507680000000
https://grafana.wikimedia.org/dashboard/db/navigation-timing-by-platform?from=1507482000000&to=1507680000000
https://grafana.wikimedia.org/dashboard/db/navigation-timing-by-country?from=1507626000000&to=1507680000000

| Navigation Timing: responseStart | Navigation Timing: loadEvent |
| --- | --- |
| Time to first byte on page views. Basically one round of DNS+TCP+SSL and HTTP response headers. | Time for the page to completely finish loading (HTML, CSS, images). |
| +100ms (median of all) | +0.3s (median of all) |
| +115ms (median of mobile) | +0.2s (median of mobile) |
| +70ms – +100ms (median of United_Kingdom) | |
| +90ms – +170ms (median of Germany) | |
| +68ms – +190ms (median of Italy) | |
| +60ms – +208ms (median of Japan) | |
| +83ms – +222ms (median of Russia) | |
Krinkle moved this task from Inbox to Radar on the Performance-Team board.
Krinkle edited projects, added Performance-Team (Radar); removed Performance-Team.
Krinkle moved this task from Limbo to Watching on the Performance-Team (Radar) board.

As of today, 180 BGP sessions use the old AS# and 216 use the new one.

Timeline for decommissioning the old AS# (dates are flexible):

  • October 24th: send another round of emails to peers still peering with our old ASN, announcing the following steps:
  • October 31st (1 week later): send another round of emails and delete the old IPv6 address.

This will cause the peers using our old v6 IP to drop without impacting v4 peering; hopefully they will notice it and update their configuration.

  • November 7th (1 week later): send another round of emails and remove the "local-as" statement.

Will cause all the v4 peers using our old AS# to drop
This can be postponed if the number of peers is still too high, but if they haven't reconfigured their session after 3 weeks, 4 direct emails and a down v6 session, I don't think more time will help.

faidon added a comment. (Edited) Oct 20 2017, 1:20 PM

Sounds fine to me. Before we resolve this task, let's not forget that we'll need to clean up a) our RIPE objects, by removing the old route(6) ones, and b) our RPKI ROAs.

Did AS1126:

maartend@vancis-asd01-r01> show bgp summary | match 14907
80.249.209.176 14907 6 9 0 0 1:49 Establ
2001:7f8:1::a501:4907:1 14907 6 6 0 0 1:45 Establ

More info at http://www.as1126.net/ . Thanks for peering with us.

I was updating the RIPE DB and I noticed some of the records are still lagging.

For bonus points you can also add the route-servers, see https://apps.db.ripe.net/search/query.html?searchtext=AS6777&bflag=true&source=RIPE#resultsAnchor

Mentioned in SAL (#wikimedia-operations) [2017-10-31T21:00:22Z] <XioNoX> removing old AMS-IX IPv6 - T167840

Mentioned in SAL (#wikimedia-operations) [2017-12-19T21:51:18Z] <XioNoX> removing local-as AS43821 from ams transits - T167840

ayounsi closed this task as Resolved. Mar 12 2018, 2:35 PM

This is done, all peers are up with proper new ASN.
AS43821 is not in use anywhere in esams.