For historical reasons, we currently operate two ASNs: 14907 (all US sites) and 43821 (Europe). These used to be disjoint networks, but they have since been redundantly connected via private transport links.
The distinction no longer makes sense, and it will become even more confusing with Singapore, which we plan to number in 14907.
I think we should just merge the two, deprecate 43821 and use 65003 as esams' confederation ASN. Another option would be to use 43821 for caching PoPs (ulsfo included) and 14907 for our core sites alone. In any case, we should definitely migrate our communities to the 14907:* space so that we can carry BGP routes across our transatlantic links.
Timeline
Monday 25th
- Standardize communities
[edit policy-options policy-statement BGP_sanitize_in then]
+ community delete AS14907:ALL;
- community delete AS43821:ALL;
[edit policy-options]
+ community AS14907:ALL members "^14907:[0-9]+$";
- community AS43821:ALL members "^43821:[0-9]+$";
[edit policy-options community AVOIDED_PATH]
- members 43821:0;
+ members 14907:0;
[edit policy-options community PARTIAL_TRANSIT_ROUTE]
- members 43821:5;
+ members 14907:5;
[edit policy-options community PEERING_ROUTE]
- members 43821:3;
+ members 14907:3;
[edit policy-options community PEER_CUSTOMER]
- members 43821:7;
+ members 14907:7;
[edit policy-options community PEER_INTERNAL]
- members 43821:6;
+ members 14907:6;
[edit policy-options community PEER_PRIVATE_PEER]
- members 43821:8;
+ members 14907:8;
[edit policy-options community PEER_PUBLIC_PEER]
- members 43821:9;
+ members 14907:9;
[edit policy-options community PREFERRED_TRANSIT]
- members 43821:10;
+ members 14907:10;
[edit policy-options community SELECTED_PATH]
- members 43821:11;
+ members 14907:11;
[edit policy-options community TRANSIT_ROUTE]
- members 43821:4;
+ members 14907:4;
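The community change above is a mechanical renumbering: every 43821:N value becomes 14907:N and the value after the colon keeps its meaning. A minimal Python sketch of that mapping, which could be used to sanity-check a list of community values before and after the change, follows. The function names are hypothetical and this is not part of any existing tooling; only the regex shape is taken from the AS43821:ALL definition.

```python
import re

OLD_ASN = 43821
NEW_ASN = 14907

# Same shape as the AS43821:ALL regex in the diff above.
OLD_COMMUNITY = re.compile(rf"^{OLD_ASN}:([0-9]+)$")


def renumber_community(community: str) -> str:
    """Map 43821:N to 14907:N; return any other community unchanged."""
    match = OLD_COMMUNITY.match(community)
    if match is None:
        return community
    return f"{NEW_ASN}:{match.group(1)}"


def renumber_all(communities):
    """Apply renumber_community to every entry, preserving order."""
    return [renumber_community(c) for c in communities]
```

For example, `renumber_community("43821:10")` gives `"14907:10"` (PREFERRED_TRANSIT), while a community from another AS is left alone.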
Tuesday 26th to Oct 10th
- Set "local-as 14907 alias" for sessions with Datahop, Tele2 and TeliaEU (causes a brief BGP flap).
- Verify sessions are properly established
- Ask selected providers to reconfigure their BGP session using AS14907
- Datahop
- Tele2
- TeliaEU
- Verify sessions are properly established
(This is also to test that local-as works as expected before we use it with more peers. EDIT: see update in comments)
- Notify transits and AMS-IX peers of the upcoming maintenance
- Pre-provision new AMS-IX IPv6 address on cr2-esams
set interfaces ae2 unit 0 family inet6 address 2001:7f8:1::a501:4907:1/64 <-- To be obtained from AMS-IX
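Both AMS-IX IPv6 addresses in this task (2001:7f8:1::a504:3821:1 for AS43821 and 2001:7f8:1::a501:4907:1 for AS14907) appear to embed the ASN as six zero-padded decimal digits after an "a5" marker. The Python sketch below reproduces that pattern so the expected address can be cross-checked. The pattern is inferred from these two addresses only, the function name is hypothetical, and the real address still has to be obtained from AMS-IX as noted above.

```python
import ipaddress

# Prefix as it appears in the two AMS-IX addresses in this task.
AMSIX_PREFIX = "2001:7f8:1::"


def amsix_ipv6(asn: int, index: int = 1) -> ipaddress.IPv6Interface:
    """Build the address that the observed pattern suggests for an ASN.

    Assumption: the ASN is written as 6 zero-padded decimal digits,
    split 2+4 across two hextets, the first one prefixed with "a5".
    """
    if not 0 < asn <= 999999:
        raise ValueError("pattern only covers ASNs of up to 6 digits")
    digits = f"{asn:06d}"
    text = f"{AMSIX_PREFIX}a5{digits[:2]}:{digits[2:]}:{index:x}/64"
    return ipaddress.IPv6Interface(text)
```

Called with 14907 this yields the address pre-provisioned on cr2-esams above, and with 43821 the old address that is removed in the follow-up work.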
Tuesday Oct 10th - Noon PDT - 7pm UTC - 5h
- Depool esams
- Mute monitoring for 5h
- set "local-as 43821" for IXP peers and transits (causes brief BGP flap)
cr*ams# set protocols bgp group IX4 local-as 43821 alias
cr*ams# set protocols bgp group IX6 local-as 43821 alias
cr*ams# set protocols bgp group Transit4 local-as 43821 alias
cr*ams# set protocols bgp group Transit6 local-as 43821 alias
- Reconfigure BGP session between cr1-eqiad and esams (currently cr2-knams):
cr1-eqiad# set protocols bgp group Confed_esams neighbor 91.198.174.251 peer-as 65003
(make sure to use the interface IPs and not the routers' loopbacks)
cr1-eqiad# set protocols bgp group Confed_esams local-address 91.198.174.250
(will only establish once the confederation is configured)
cr2-knams# set protocols bgp group Confed_eqiad neighbor 91.198.174.250 peer-as 65001
cr1-eqiad# set protocols bgp group Confed_esams bfd-liveness-detection minimum-interval 300
cr2-knams# set protocols bgp group Confed_eqiad bfd-liveness-detection minimum-interval 300
- Remove OSPF session to esams between cr1-eqiad and esams (currently cr2-knams)
(keeps each OSPF area confined to its confed and reduces table size; this also removes the need for OSPF link-protection when there are only 2 routers)
cr1-eqiad# delete protocols ospf area 0.0.0.0 interface xe-4/2/2.0
cr1-eqiad# set protocols ospf area 0.0.0.0 interface xe-4/2/2.0 passive
(keep the interface passive so that there are no gaps in, for example, traceroutes)
cr2-knams# delete protocols ospf area 0.0.0.0 interface xe-1/1/0.0
cr2-knams# set protocols ospf area 0.0.0.0 interface xe-1/1/0.0 passive
- Set local-as on sessions to pybal servers
cr*ams# set protocols bgp group PyBal local-as 43821 alias
- Configure confederation, use sub-AS 65003 (unused but already configured on all the routers)
cr*ams# set routing-options autonomous-system 65003
cr*ams# set routing-options confederation 14907
cr*ams# set routing-options confederation members 65001
cr*ams# set routing-options confederation members 65002
cr*ams# set routing-options confederation members 65003
cr*ams# set routing-options confederation members 65004
cr*ams# set protocols bgp group iBGP peer-as 65003
- renumber the last (unused) policy-option
cr*ams# set policy-options policy-statement BGP_prepend1_out term prepend then as-path-prepend 14907
- Configure pybal to peer with AS14907
- Remove local-as for sessions to pybal
cr*ams# delete protocols bgp group PyBal local-as
- Verify pybal sessions UP
- Remove local-as for Datahop, Tele2, TeliaEU
- Verify that BGP sessions esams <-> reconfigured transit as well as cr1-eqiad are up.
- Verify the routes advertised/received to/from Transits/IX/cr1-eqiad
cr2-knams> show route advertising-protocol bgp <peerIP>
cr2-knams> show route receive-protocol bgp 91.198.174.250
cr1-eqiad> show route receive-protocol bgp 91.198.174.251
- Re-configure: cr2-eqiad<->esams OSPF to passive
cr2-eqiad# delete protocols ospf area 0.0.0.0 interface xe-4/1/3.0
cr2-eqiad# set protocols ospf area 0.0.0.0 interface xe-4/1/3.0 passive
(keep the interface passive so that there are no gaps in, for example, traceroutes)
cr2-esams# delete protocols ospf area 0.0.0.0 interface xe-0/1/3.0
cr2-esams# set protocols ospf area 0.0.0.0 interface xe-0/1/3.0 passive
- Re-configure: cr2-eqiad<->esams BGP
cr2-eqiad# set protocols bgp group Confed_esams neighbor 91.198.174.249 peer-as 65003
cr2-eqiad# set protocols bgp group Confed_esams local-address 91.198.174.248
cr2-eqiad# set protocols bgp group Confed_esams bfd-liveness-detection minimum-interval 300
cr2-esams# set protocols bgp group Confed_eqiad neighbor 91.198.174.248 peer-as 65001
cr2-esams# set protocols bgp group Confed_eqiad bfd-liveness-detection minimum-interval 300
- Related to T83037: limit the propagation of internal routes within the AS14907 confederation
set routing-options aggregate route 10.2.3.0/24 community no-export
- Initiate AS# change on AMS-IX portal (should be effective within a few minutes)
- Update/run jnt/network-automation
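The confederation step in the list above is what lets esams run as sub-AS 65003 internally while external peers only ever see AS14907. As a rough illustration of the standard confederation behaviour (RFC 5065), the Python sketch below models what happens to an AS path when a route leaves the confederation: the member sub-ASNs are dropped and the confederation identifier is prepended. This is a simplification for reasoning only. Real BGP strips typed AS_CONFED segments and does not filter by ASN value, and the function name is hypothetical.

```python
# Values taken from the confederation configuration in this task.
CONFED_ID = 14907
MEMBER_ASNS = {65001, 65002, 65003, 65004}


def path_seen_outside(as_path):
    """Return the AS path an external eBGP peer would receive.

    as_path lists ASNs nearest-first, as held inside the confederation.
    Simplification: confederation segments are identified by membership
    in MEMBER_ASNS instead of by AS_PATH segment type.
    """
    external = [asn for asn in as_path if asn not in MEMBER_ASNS]
    return [CONFED_ID] + external
```

A prefix originated in esams and carried to eqiad has the internal path `[65003]`; an external peer of eqiad would receive `[14907]`, which is the behaviour the "Verify the routes advertised" steps are meant to confirm.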
Verifications
- Verify the routes advertised/received to/from Transits/IX/transport/pybal
cr*> show route advertising-protocol bgp <peerIP>
cr*> show route receive-protocol bgp <peerIP>
- Verify connectivity between ulsfo/codfw (not direct neighbors) and esams (traceroute)
- Verify no alarms, happy monitoring. Use looking glass to verify the correctness of the routes advertised to the DFZ.
- Perform more intrusive testing: verify proper failover, document failover/outage time.
- Restart each router (one after the other)
- Disable links to eqiad (one after the other)
- Disable links between esams routers (one after the other)
- Final "do we need to rollback?" point.
- Repool esams. (can be done later, depending on how long we're okay to not have esams active).
End of maintenance window
Follow up work
- Ask the remaining transits to change their BGP config to AS14907.
- Coordinate change of AS# with AMS-IX peers.
- Progressively remove "local-as" statements.
- Update documentation.
- Update peeringdb entry
- Delete old AMS-IX IPv6 address once it carries no more traffic
delete interfaces ae2 unit 0 family inet6 address 2001:7f8:1::a504:3821:1/64