cr1-eqiad -> Charter/AS7843 connectivity is broken
Closed, ResolvedPublic

Description

I was notified that a user in #debian-mirrors reported a connectivity issue to our ftp.us.debian.org mirror (2620:0:861:1:208:80:154:15 aka sodium), for "about a week now".

They provided an mtr but not their IP address. I did not catch them in time to point them to the "reporting a connectivity issue" page.

However, the information that we have is already enough to pinpoint at least one issue:

The route for the first hop is 2603:6080::/28 and the route for the subsequent four is 2606:a000::/32. Both are fairly broad, so that customer of theirs is probably covered by them as well.

Both of those routes have 2001:504:0:2::7843:1 as the next-hop, i.e. Charter's router on the Equinix IXP. The routes are learned through the peering that cr2-eqiad (and only cr2-eqiad) has with that IP. So for cr1-eqiad, the source of the route is cr2-eqiad; the 2001:504:0:2::/64 destination, however, is direct, through its own IXP port, xe-3/0/6.

But:

faidon@re0.cr2-eqiad> show ipv6 neighbors |match 2001:504:0:2::7843:1                         
2001:504:0:2::7843:1         2e:21:31:00:2f:9c  reachable   4   yes no      xe-3/3/3.0  
faidon@re0.cr1-eqiad> show ipv6 neighbors |match 2001:504:0:2::7843:1                  
2001:504:0:2::7843:1         none               unreachable 1   no  no      xe-3/0/6.0

sodium's active VRRP gateway is cr1-eqiad.
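
As a rough illustration of the failure described above, the following Python sketch models how cr1-eqiad resolves the BGP next-hop through its own directly connected IXP prefix and then depends on neighbor discovery succeeding on that port. The prefixes, interface names and neighbor states are taken from the output in this task; the function names (`resolve_next_hop`, `can_forward`) and the table layout are hypothetical and only approximate what a real router does.

```python
import ipaddress

# Directly connected routes on cr1-eqiad. Only the IXP LAN entry comes from
# this task; a real router has many more entries.
DIRECT_ROUTES = {
    ipaddress.ip_network("2001:504:0:2::/64"): "xe-3/0/6.0",
}

# Neighbor (NDP) state for Charter's IXP address, as seen from each router
# in the "show ipv6 neighbors" output above.
NEIGHBOR_STATE = {
    ("cr1-eqiad", "2001:504:0:2::7843:1"): "unreachable",
    ("cr2-eqiad", "2001:504:0:2::7843:1"): "reachable",
}


def resolve_next_hop(next_hop, direct_routes):
    """Return the egress interface of the longest direct prefix covering next_hop."""
    addr = ipaddress.ip_address(next_hop)
    matches = [net for net in direct_routes if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return direct_routes[best]


def can_forward(router, next_hop, direct_routes, neighbor_state):
    """Return (egress interface, whether the neighbor is usable from this router)."""
    iface = resolve_next_hop(next_hop, direct_routes)
    state = neighbor_state.get((router, next_hop), "unknown")
    return iface, state == "reachable"


# cr1-eqiad picks its own IXP port for the next-hop, but the neighbor never resolves.
iface, ok = can_forward("cr1-eqiad", "2001:504:0:2::7843:1", DIRECT_ROUTES, NEIGHBOR_STATE)
print(iface, ok)
```

In this simplified model the route itself is valid on cr1-eqiad; what fails is the last step, reaching the next-hop over the shared LAN from a router Charter does not peer with.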

The report was IPv6-specific and did not mention IPv4. However:

faidon@re0.cr1-eqiad> ping count 2 206.126.238.34 
PING 206.126.238.34 (206.126.238.34): 56 data bytes

--- 206.126.238.34 ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss

faidon@re0.cr2-eqiad> ping count 2 206.126.238.34    
PING 206.126.238.34 (206.126.238.34): 56 data bytes
64 bytes from 206.126.238.34: icmp_seq=0 ttl=64 time=1.308 ms
64 bytes from 206.126.238.34: icmp_seq=1 ttl=64 time=0.828 ms

--- 206.126.238.34 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.828/1.068/1.308/0.240 ms

(206.126.238.34 being 7843's IPv4 on the IXP)

My guess would be that this is Charter filtering traffic on their IXP port to only routers they have peerings with, for security/anti-DDoS reasons.

I'm not sure if this is because we gave them our router's MAC address when we peered, or if they're doing that by means of ARP/NDP with the IP of the router they peer with. More broadly, our setup right now is "cr2-eqiad has the peering but cr1-eqiad can and will send you traffic", which is probably unusual and breaks network ingress assumptions that exist out there.

Event Timeline

faidon triaged this task as High priority.Sun, Nov 14, 11:35 AM
faidon created this task.

Mentioned in SAL (#wikimedia-operations) [2021-11-14T11:48:42Z] <paravoid> disable cr1-eqiad:xe-3/0/6 (IXP port) to mitigate T295650

I disabled the Equinix IXP port on cr1-eqiad, xe-3/0/6, just a few moments ago, in order to mitigate this issue. Checked with @ayounsi on IRC first, who is now aware of this task.

Connectivity from sodium to hops 1-5 of their mtr seems to have been restored (previously "address unreachable").

Update 13:50 UTC: the reporting user confirmed over IRC that connectivity has been restored and that they can now access our Debian mirror.

There is definitely a noticeable difference in traffic patterns from Nov 4th or so:

Screenshot 2021-11-14 at 13-55-31 Turnilo (1 29 0).png (684×992 px, 57 KB)

IPv4 seems to have the opposite pattern (ramping up on Nov 4th), so the combined IPv4+IPv6 traffic looks unaffected/"normal". This is probably Happy Eyeballs, i.e. users falling back to IPv4 when IPv6 was unreachable (at a performance penalty). So IPv4 traffic was likely unaffected, despite the ping evidence in the task above, for reasons that are not entirely clear to me yet.
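
For readers unfamiliar with the fallback behaviour mentioned above, here is a minimal simulation of the Happy Eyeballs idea in Python. It is a sketch only: no real sockets are opened, `fake_connect` and `happy_eyeballs` are hypothetical names, and the timings are arbitrary. It also simplifies the real algorithm, which would start the IPv4 attempt immediately once the IPv6 attempt fails rather than waiting out the delay.

```python
import asyncio


async def fake_connect(family, delay, succeeds):
    """Stand-in for a TCP connect: sleeps, then succeeds or raises."""
    await asyncio.sleep(delay)
    if not succeeds:
        raise ConnectionError(f"{family} unreachable")
    return family


async def happy_eyeballs(v6_ok, v4_ok, attempt_delay=0.05):
    """Try IPv6 first, start IPv4 after attempt_delay, return the first success."""

    async def v4_after_delay():
        await asyncio.sleep(attempt_delay)
        return await fake_connect("IPv4", 0.01, v4_ok)

    tasks = {
        asyncio.create_task(fake_connect("IPv6", 0.01, v6_ok)),
        asyncio.create_task(v4_after_delay()),
    }
    try:
        while tasks:
            done, tasks = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                if task.exception() is None:
                    return task.result()
        return None  # every attempt failed
    finally:
        # Cancel whichever attempt is still pending once a winner is known.
        for task in tasks:
            task.cancel()


# With IPv6 broken, the client still connects, but over IPv4 and slightly later.
print(asyncio.run(happy_eyeballs(v6_ok=False, v4_ok=True)))
```

This would be consistent with the graphs: the same users keep reaching the service, but their traffic shifts from the IPv6 to the IPv4 series.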

ayounsi added a subscriber: cmooney.

Thanks for taking care of it. Proper fix is most likely T295672.

Change 738873 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/puppet@production] Updating MAC address for install server DHCP config for rpki1001 as it is being rebuilt to provide more disk space and has a new MAC.

https://gerrit.wikimedia.org/r/738873

Change 738873 merged by Cathal Mooney:

[operations/puppet@production] Change MAC address in DHCP config for rpki1001

https://gerrit.wikimedia.org/r/738873

Please ignore the above, unrelated CRs. I pasted the wrong task ID when doing the commit.

My guess would be that this is Charter filtering traffic on their IXP port to only routers they have peerings with, for security/anti-DDoS reasons.

I'm not sure if this is because we gave them our router's MAC address when we peered, or if they're doing that by means of ARP/NDP with the IP of the router they peer with. More broadly, our setup right now is "cr2-eqiad has the peering but cr1-eqiad can and will send you traffic", which is probably unusual and breaks network ingress assumptions that exist out there.

100% right I think. I did not expect to see that. As you allude to, configuring this filtering based on the MAC address learnt via ARP/ND would not be trivial, but perhaps some vendor does this automatically, or with uRPF enabled or something.

Either way, I think the best solution is to enable the "next-hop self" option as @ayounsi says. It should be relatively straightforward to implement, and it avoids any such edge cases caused by both CRs being connected to the same peering LAN in future. I will get that rolled out as part of the dedicated task; then we can revisit this one to re-enable the second IXP port and test the status of connectivity to Charter.

Mentioned in SAL (#wikimedia-operations) [2021-11-18T10:00:06Z] <topranks> Re-enabling Equinix IXP port on cr1-eqiad following iBGP changes to address T295650

The next-hop self policy has been applied on cr1-eqiad and cr2-eqiad, in the Confed_eqiad group, to address this issue.

cr2-eqiad is now announcing all routes learnt on its Equinix IXP port to cr1-eqiad with its own loopback IP as next-hop. For instance:

cmooney@re0.cr1-eqiad> show route table inet6.0 receive-protocol bgp 2620:0:861:ffff::2 aspath-regex ".* 11426$" 

inet6.0: 138571 destinations, 751635 routes (138238 active, 1 holddown, 2076 hidden)
Restart Complete
  Prefix		  Nexthop	       MED     Lclpref    AS path
* 2600:5800::/32          2620:0:861:ffff::2   0       250        7843 11426 ?
* 2603:6080::/28          2620:0:861:ffff::2   0       250        7843 11426 ?
* 2603:60bd::/32          2620:0:861:ffff::2   0       250        7843 11426 ?
* 2603:90bb::/32          2620:0:861:ffff::2   0       250        7843 11426 ?
* 2606:a000::/32          2620:0:861:ffff::2   0       250        7843 11426 ?

This ensures that, even now that cr1-eqiad's local port on the Equinix IXP LAN is up, the route for these prefixes on cr1-eqiad still goes via cr2:

cmooney@re0.cr1-eqiad> show route protocol bgp 2603:6080::/28 

inet6.0: 138569 destinations, 636889 routes (138253 active, 0 holddown, 318 hidden)
Restart Complete
+ = Active Route, - = Last Active, * = Both

2603:6080::/28     *[BGP/170] 01:09:59, MED 0, localpref 250, from 2620:0:861:ffff::2
                      AS path: 7843 11426 ?, validation-state: valid
                    > to fe80::8618:88ff:fe0d:dfc5 via ae0.0
                      to fe80::ee38:7300:ce8:9c56 via xe-4/2/2.12
cmooney@re0.cr1-eqiad> show interfaces descriptions | match ae0      
ae0             up    up   cr2-eqiad:ae0
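
The effect of the change can be sketched in Python as follows. This is a simplified model, not the actual Junos policy: the `Route` class, `export_ibgp` and `egress` are hypothetical names, the /128 length for cr2-eqiad's loopback is an assumption, and only one of the two paths shown in the output above (ae0.0) is modelled.

```python
import ipaddress
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    as_path: tuple


CR2_LOOPBACK = "2620:0:861:ffff::2"  # cr2-eqiad's loopback, per the output above


def export_ibgp(route, next_hop_self, loopback):
    """Model cr2-eqiad advertising a route to cr1-eqiad over iBGP."""
    return replace(route, next_hop=loopback) if next_hop_self else route


# How cr1-eqiad resolves next-hops to an egress interface (simplified).
CR1_RESOLUTION = {
    ipaddress.ip_network("2001:504:0:2::/64"): "xe-3/0/6.0",  # cr1's own IXP port
    ipaddress.ip_network("2620:0:861:ffff::2/128"): "ae0.0",  # towards cr2 (/128 assumed)
}


def egress(next_hop, table):
    """Longest-prefix match of the next-hop against the resolution table."""
    addr = ipaddress.ip_address(next_hop)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]


learned = Route("2603:6080::/28", "2001:504:0:2::7843:1", (7843, 11426))

before = export_ibgp(learned, next_hop_self=False, loopback=CR2_LOOPBACK)
after = export_ibgp(learned, next_hop_self=True, loopback=CR2_LOOPBACK)

print(egress(before.next_hop, CR1_RESOLUTION))  # cr1 tries its own IXP port
print(egress(after.next_hop, CR1_RESOLUTION))   # cr1 hands traffic to cr2
```

With the next-hop rewritten, cr1-eqiad no longer needs to reach Charter's IXP address itself, so whatever filtering Charter applies to non-peering routers stops mattering for these prefixes.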

Picking a random Charter IPv4 address that I know responds to ping, you can see the traffic routing out via cr2, and the ping is successful:

cmooney@re0.cr1-eqiad> ping 24.25.12.99 source 208.80.154.196 
PING 24.25.12.99 (24.25.12.99): 56 data bytes
64 bytes from 24.25.12.99: icmp_seq=0 ttl=248 time=8.048 ms
64 bytes from 24.25.12.99: icmp_seq=1 ttl=248 time=7.496 ms
64 bytes from 24.25.12.99: icmp_seq=2 ttl=248 time=7.675 ms
cmooney@re0.cr1-eqiad> traceroute 24.25.12.99 source 208.80.154.196 no-resolve wait 1
traceroute to 24.25.12.99 (24.25.12.99) from 208.80.154.196, 30 hops max, 52 byte packets
 1  208.80.154.194  0.440 ms  0.505 ms  0.397 ms
 2  206.126.238.34  0.881 ms  0.720 ms  1.159 ms
 3  209.18.43.58  1.184 ms 66.109.5.116  1.181 ms  1.103 ms
 4  66.109.6.225  7.832 ms  9.061 ms 66.109.6.81  8.243 ms
 5  24.93.64.51  8.391 ms  7.708 ms  7.626 ms
 6  * * *
 7  * * *
 8  * * *
cmooney@re0.cr1-eqiad> show route 208.80.154.194 

inet.0: 860569 destinations, 3781957 routes (860097 active, 0 holddown, 2585 hidden)
Restart Complete
+ = Active Route, - = Last Active, * = Both

208.80.154.192/30  *[Direct/0] 63w0d 23:53:47
                    > via ae0.0