
High number of failed inbound TFO connections in esams Mon-Fri
Closed, Declined (Public)

Description

We deployed TCP Fast Open across our tlsproxies on Jun 24 2016.

The number of failed inbound TFO connections seems fairly low in all DCs except esams. Interestingly, the esams failures happen mostly from Monday to Friday, not during the weekend.

[Image: failed-inbound-tfo-esams.png]

I used systemtap on cp3043 and cp3042 to look into which IPs are causing TCPFastOpenPassiveFail to grow. On both machines, a single IP was the biggest offender, causing the vast majority of the failures.
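For a quick look at the aggregate counters (without the per-IP attribution that systemtap gives), the TcpExt values can be read from /proc/net/netstat. The following is a minimal Python sketch, not the systemtap script used here. It assumes the usual layout of that file, where each header line is followed by a value line with the same prefix. The helper names and the sample text (including its numbers) are made up for illustration.

```python
def parse_netstat(text):
    """Parse /proc/net/netstat-style text into {prefix: {counter: value}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    stats = {}
    # The file alternates a header line with a value line sharing one prefix.
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        value_prefix, numbers = values.split(":", 1)
        if prefix != value_prefix:
            raise ValueError(f"mismatched lines: {prefix!r} vs {value_prefix!r}")
        stats[prefix] = dict(zip(names.split(), map(int, numbers.split())))
    return stats


def tfo_passive_counters(text):
    """Return only the server-side (passive) TFO counters from TcpExt."""
    tcpext = parse_netstat(text).get("TcpExt", {})
    return {k: v for k, v in tcpext.items() if k.startswith("TCPFastOpenPassive")}


# Made-up sample in the same layout as /proc/net/netstat (values are not real).
SAMPLE = """\
TcpExt: SyncookiesSent TCPFastOpenPassive TCPFastOpenPassiveFail
TcpExt: 0 10 3
IpExt: InNoRoutes InTruncatedPkts
IpExt: 0 0
"""

print(tfo_passive_counters(SAMPLE))
```

On a Linux host, passing the contents of /proc/net/netstat instead of SAMPLE should give the live values; sampling twice and subtracting gives the growth over an interval.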

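| host | TFO attempts | successful | failed | failures caused by one IP |
| ---- | ------------ | ---------- | ------ | ------------------------- |
| cp3043 | 2029 | 756 | 1273 | 1045 |
| cp3042 | 1497 | 166 | 1331 | 1058 |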
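To put numbers on "vast majority", here is a small Python sketch that derives the failure rate and the share of failures caused by the top IP on each host. It reads the table as 2029 attempts / 756 successful / 1273 failed / 1045 from one IP on cp3043, and 1497 / 166 / 1331 / 1058 on cp3042, a split that is consistent with successful + failed = attempts. The dictionary keys and the function name are just illustrative.

```python
# Figures from the table above; key names are illustrative, not from any tool.
ROWS = {
    "cp3043": {"attempts": 2029, "successful": 756, "failed": 1273, "failed_top_ip": 1045},
    "cp3042": {"attempts": 1497, "successful": 166, "failed": 1331, "failed_top_ip": 1058},
}


def summarize(row):
    """Return the TFO failure rate and the share of failures from the top IP."""
    # Sanity check: every attempt is counted as either successful or failed.
    if row["successful"] + row["failed"] != row["attempts"]:
        raise ValueError("successful + failed does not add up to attempts")
    return {
        "failure_rate": row["failed"] / row["attempts"],
        "top_ip_share": row["failed_top_ip"] / row["failed"],
    }


for host, row in ROWS.items():
    s = summarize(row)
    print(f"{host}: {s['failure_rate']:.1%} of TFO attempts failed, "
          f"{s['top_ip_share']:.1%} of those failures came from one IP")
```

On both hosts the single top IP accounts for roughly 80% of the failures.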

The two IPs causing the vast majority of TCPFastOpenPassiveFails are both from AS3215 (France Telecom/Orange).

We want to find out whether the issue is AS3215-specific or not, and possibly fix it.

Event Timeline

ema triaged this task as Medium priority. Aug 22 2016, 3:45 PM

Perhaps this is a mobile carrier doing CGNAT that constantly flips source IPs for TCP traffic from the same phones, thus constantly invalidating the otherwise-valid TFO cookies those clients have cached?

From https://www1.icsi.berkeley.edu/~barath/papers/tfo-conext11.pdf section 4.3:

> some carrier-grade NAT configurations use different public IP addresses for new TCP connections from the same client. In such cases, the TFO cookies cached by the client would not be valid and the server would fall back on a regular 3WHS and reject any data in the SYN packet

So yeah, CGNAT would be an explanation.
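To illustrate why a changing public IP breaks the cookie, here is a toy Python model of a server that binds the TFO cookie to the client's source IP. This is only a sketch of the idea: HMAC-SHA256 truncated to 8 bytes is a stand-in chosen for readability, not the algorithm the Linux kernel uses, and the secret, the function names and the 192.0.2.x documentation addresses are all hypothetical.

```python
import hashlib
import hmac
import os

# Hypothetical per-server secret; a real server manages its own key.
SERVER_SECRET = os.urandom(16)


def make_cookie(client_ip, secret=SERVER_SECRET):
    """Toy cookie: keyed hash of the client's source IP, truncated to 8 bytes."""
    return hmac.new(secret, client_ip.encode(), hashlib.sha256).digest()[:8]


def accept_syn_data(source_ip, cookie, secret=SERVER_SECRET):
    """Accept data in the SYN only if the cookie matches the current source IP."""
    return hmac.compare_digest(cookie, make_cookie(source_ip, secret))


# First connection: the client obtains a cookie while seen as 192.0.2.10.
cookie = make_cookie("192.0.2.10")

# Later connection from the same public IP: the cached cookie is still valid.
print(accept_syn_data("192.0.2.10", cookie))

# Later connection after the CGNAT picked another public IP: the cookie no
# longer matches, so the server would fall back to a regular 3WHS and count
# a passive TFO failure.
print(accept_syn_data("192.0.2.77", cookie))
```

If the carrier rotates public IPs on every new connection, nearly every SYN carrying a cached cookie would hit the second case, which would fit the high failure counts attributed to a single IP.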

No movement in 4 years. If there are new/ongoing TFO issues, someone should make a new ticket about them!