Lately there's an extra RTT on our initial TLS handshakes on the cache terminators. This is likely the result of a change to OpenSSL (1.0 -> 1.1), to nginx, and/or to our configuration. It was originally reported by @Gwicke as a regression in WebPagetest results comparing production to the labs proxy and to production in the past. What I know from my own testing is this:
1. I can reproduce it both with Chrome and curl on Linux, but the easiest thing to test with is `openssl s_client`.
2. It's sensitive to the data size of the server's handshake response (which is largely determined by certificate chain and OCSP staple sizes), and the critical value is somewhere in the 4K-ish ballpark as reported by `openssl s_client`'s `SSL handshake has read N bytes`.
3. You can manipulate the handshake size with curl (for testing purposes) by changing the client's `-cipher` (ECDHE vs DHE vs non-FS, ECDSA vs RSA for the auth alg, etc) and by asking for stapling with `-status` (or not).
4. Adding 1000ms of artificial network delay on your local machine makes it far easier to test for the extra RTT, as an extra second doesn't get lost in various other noise. I've been adding it to my wifi with: `tc qdisc add dev wlp2s0 root netem delay 1000ms` (and then s/add/del/ to revert it).
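The steps above can be scripted roughly like this (a sketch; the host and interface names are placeholders for whatever you're testing against, and the `tc` lines need root):

```shell
#!/bin/sh
# Sketch of the test procedure above; adjust host/iface for your setup.
host="cp1008"        # test host (use whatever FQDN resolves for you)
iface="wlp2s0"       # local interface to add artificial delay on

# Optional: add 1000ms of delay so the extra RTT stands out (needs root):
#   tc qdisc add dev "$iface" root netem delay 1000ms
# ...and remove it afterwards with:
#   tc qdisc del dev "$iface" root netem delay 1000ms

# Pull N out of s_client's "SSL handshake has read N bytes" line.
handshake_bytes() {
    sed -n 's/^SSL handshake has read \([0-9]*\) bytes.*/\1/p'
}

# One measurement: cipher as $1; pass -status as $2 to request stapling.
measure() {
    echo Q | openssl s_client -connect "$host:443" -cipher "$1" $2 \
        2>/dev/null | handshake_bytes
}

measure ECDHE-ECDSA-AES128-GCM-SHA256          # no stapling
measure ECDHE-ECDSA-AES128-GCM-SHA256 -status  # with stapling
```

With the 1000ms delay in place, the stapled case taking a visibly extra second (vs. the unstapled one) is the signal.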
These are some test results from my first round of testing, against a "normal" software stack and config on cp1008:
| Server Bytes | Extra RTT | Cipher | Stapling
| ---|---|---|---
| 3238 | No | ECDHE-ECDSA-AES128-GCM-SHA256 | No
| 3270 | No | ECDHE-ECDSA-AES128-SHA | No
| 3344 | No | AES128-SHA | No
| 3625 | No | ECDHE-RSA-AES128-GCM-SHA256 | No
| 3657 | No | ECDHE-RSA-AES128-SHA | No
| 4132 | No | DHE-RSA-AES128-SHA | No
| 4867 | Yes | ECDHE-ECDSA-AES128-GCM-SHA256 | Yes
| 4899 | Yes | ECDHE-ECDSA-AES128-SHA | Yes
| 4974 | Yes | AES128-SHA | Yes
| 5255 | Yes | ECDHE-RSA-AES128-GCM-SHA256 | Yes
| 5287 | Yes | ECDHE-RSA-AES128-SHA | Yes
| 5762 | Yes | DHE-RSA-AES128-SHA | Yes
Basically the extra RTT always appears when stapling is added, but there's also a 735-byte gap in the sizes I can test (stapling is relatively large!), so it could also be that the critical size just happens to land in that 4132-4867 window (again, as reported by s_client). The extra-RTT boundary falling in that range also sounds an awful lot like a lack of IW10: with a 3-segment initial congestion window and a 1460-byte MSS, the first flight would be capped at 4380 bytes.
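For reference, the initial-window arithmetic works out like this (assuming the common 1460-byte MSS):

```shell
# Rough first-flight math, assuming a 1460-byte MSS:
mss=1460
echo "IW3 first flight:  $((3 * mss)) bytes"    # 4380 -- lands inside the 4132-4867 gap
echo "IW10 first flight: $((10 * mss)) bytes"   # 14600 -- comfortably above every row tested
```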
In any case, the first thing to see is whether we can artificially make a smaller stapled response to prove that the stapling feature doesn't trigger the issue regardless of size. So I re-tested the smallest result above but without its intermediate certificate sent to reduce the size:
| Server Bytes | Extra RTT | Cipher | Stapling | Notes
| ---|---|---|---|---
| 3737 | No | ECDHE-ECDSA-AES128-GCM-SHA256 | Yes | intermediate cert not sent, to prove we can staple without the extra RTT at all
The problem looks a lot like https://trac.nginx.org/nginx/ticket/413 , which was fixed years ago. It's possible the fix doesn't work with OpenSSL 1.1 (it does look pretty hacky). After going down a bunch of other dead ends, I recompiled libssl itself to change the default BIO buffer size from 4K to 8K, and that fixed the extra RTT in all cases. I don't think that's the correct or ideal solution here, but it may be what we have to do for now just to get our handshake times back down.
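For the record, the libssl change was essentially a one-line bump of the default buffer-BIO size. In the OpenSSL 1.1 tree that constant lives in `crypto/bio/bf_buff.c` (from memory, so treat the exact file/macro as an assumption to verify against your checkout):

```
--- a/crypto/bio/bf_buff.c
+++ b/crypto/bio/bf_buff.c
-#define DEFAULT_BUFFER_SIZE 4096
+#define DEFAULT_BUFFER_SIZE 8192
```

With the buffer at 8K, the entire stapled handshake response fits in one buffered write instead of being split at the 4K boundary, which is presumably why the extra RTT disappears.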