Paste P769

Summary of TLS work

Authored by BBlack on Jun 11 2015, 12:53 PM.
TLS termination on-cache:
* Previously, we terminated TLS on a small, separate TLS cluster at each PoP before forwarding to our frontend cache clusters.
* We switched to running TLS termination software directly on every frontend cache node, since the cache clusters were already much larger than the TLS clusters and had spare CPU cycles for the task.
Cache Hardware expansion:
* Expanded/replaced hardware as appropriate
* Some of our oldest-generation caches still in service were too old to have AES-NI instructions, which are critical for TLS crypto performance
* Independently of that, some of our cache clusters were simply too small for the estimated CPU load of 100% HTTPS
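On Linux x86 hosts, one quick way to spot nodes lacking AES-NI is to look for the `aes` flag in `/proc/cpuinfo`. The sketch below is illustrative only: the function name is hypothetical and it simply parses cpuinfo-formatted text, requiring every listed CPU to advertise the flag.

```python
def has_aes_ni(cpuinfo_text: str) -> bool:
    """Return True if every CPU in /proc/cpuinfo-style text lists the 'aes' flag."""
    flag_sets = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 kernels print one "flags" line per logical CPU.
        if sep and key.strip() == "flags":
            flag_sets.append(set(value.split()))
    # No "flags" lines at all means support cannot be confirmed.
    return bool(flag_sets) and all("aes" in flags for flags in flag_sets)
```

On a live node this would be called as `has_aes_ni(open("/proc/cpuinfo").read())`.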
Software upgrades:
* Updated the Linux distribution to Debian Jessie (ahead of the official Jessie release)
* Newer kernels
* Newer versions of critical packages such as OpenSSL and Nginx, to support the features and optimizations described below
Low-level tuning:
* bnx2x hardware receive queue tuning
* Linux RSS/XPS-level tuning of network traffic (https://www.kernel.org/doc/Documentation/networking/scaling.txt)
* Thread counts, CPU pinning
* TCP stack tuning
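XPS is configured per transmit queue by writing a CPU bitmap to `/sys/class/net/<dev>/queues/tx-<n>/xps_cpus`, as described in the kernel's scaling.txt. The sketch below only computes those bitmaps; the device name `eth0` and the round-robin CPU-to-queue assignment are assumptions for illustration, not the settings we actually deployed.

```python
from typing import Dict, Iterable


def cpu_bitmap(cpus: Iterable[int], ncpus: int) -> str:
    """Render a CPU set as a sysfs bitmap: hex, comma-separated 32-bit words, most significant first."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < ncpus:
            raise ValueError(f"CPU {cpu} out of range for {ncpus} CPUs")
        mask |= 1 << cpu
    nwords = max(1, (ncpus + 31) // 32)
    words = [(mask >> (32 * i)) & 0xFFFFFFFF for i in reversed(range(nwords))]
    return ",".join(f"{w:08x}" for w in words)


def xps_plan(dev: str, ntx_queues: int, ncpus: int) -> Dict[str, str]:
    """Hypothetical policy: CPU c transmits on queue c % ntx_queues.

    Returns {sysfs path: bitmap}; writing each value to its path would apply it.
    """
    plan = {}
    for q in range(ntx_queues):
        cpus = [c for c in range(ncpus) if c % ntx_queues == q]
        plan[f"/sys/class/net/{dev}/queues/tx-{q}/xps_cpus"] = cpu_bitmap(cpus, ncpus)
    return plan
```

With 4 transmit queues and 8 CPUs, queue `tx-1` gets CPUs 1 and 5, i.e. bitmap `00000022`. The same bitmap format applies to `rps_cpus` on the receive side.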
SNI Certs:
* We host many domains and previously used a single large unified certificate containing 26 wildcard names
* Split this into several smaller SNI certificates to reduce certificate transfer size for SNI-capable clients
* Kept the unified certificate for compatibility with older, non-SNI clients
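The selection logic amounts to: match the client's SNI hostname against each smaller certificate's names, and fall back to the unified certificate when no SNI is sent or nothing matches. A minimal sketch follows; the certificate labels and `example.org`/`example.net` hostnames are hypothetical, and real TLS servers do this matching internally.

```python
from typing import Dict, List, Optional

# Hypothetical inventory: certificate labels and hostnames are illustrative only.
SNI_CERTS: Dict[str, List[str]] = {
    "cert-org": ["example.org", "*.example.org"],
    "cert-net": ["example.net", "*.example.net"],
}
# Hypothetical label for the large unified cert covering every hosted name.
UNIFIED_CERT = "cert-unified"


def name_matches(pattern: str, host: str) -> bool:
    """Match a hostname against a certificate name, allowing a left-most '*.' wildcard."""
    pattern = pattern.lower()
    host = host.lower().rstrip(".")
    if pattern.startswith("*."):
        # A wildcard covers exactly one label: "*.example.org" matches
        # "en.example.org" but neither "example.org" nor "a.b.example.org".
        label, dot, rest = host.partition(".")
        return bool(label) and dot == "." and rest == pattern[2:]
    return host == pattern


def select_cert(sni: Optional[str]) -> str:
    """Return the first SNI cert whose names match, else the unified cert."""
    if sni:
        for cert, names in SNI_CERTS.items():
            if any(name_matches(p, sni) for p in names):
                return cert
    # Non-SNI clients (and unmatched names) get the unified certificate.
    return UNIFIED_CERT
```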
Ciphersuite/Protocol Updates:
* Eliminated low/zero-security ciphers
* Negotiate PFS (Perfect Forward Secrecy) ciphers with most clients that are capable of them
* SSLv3 disabled (back at the time of the POODLE incident; this ended IE6/XP compatibility)
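As an illustration of these policies expressed in OpenSSL terms, the sketch below uses Python's `ssl` module. The cipher string is an assumption, not our production list: forward-secret ECDHE/DHE suites first, one non-PFS AES suite kept as a fallback for older clients, and explicit exclusions for null, export, DES/3DES, RC4 and MD5 suites.

```python
import ssl

# Illustrative OpenSSL cipher string (hypothetical, not our deployed list).
CIPHERS = (
    "ECDHE+AESGCM:ECDHE+AES:DHE+AESGCM:AES128-SHA:"
    "!aNULL:!eNULL:!EXPORT:!DES:!3DES:!RC4:!MD5"
)


def make_server_context() -> ssl.SSLContext:
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Never negotiate SSLv3 (POODLE). Current CPython already sets this
    # option on new contexts; OR-ing it in keeps the intent explicit.
    ctx.options |= ssl.OP_NO_SSLv3
    # Raises ssl.SSLError only if the local OpenSSL build matches no suite.
    ctx.set_ciphers(CIPHERS)
    return ctx
```

Because OpenSSL keeps suites in the order the string adds them, PFS-capable clients get an ECDHE or DHE suite, while clients lacking those still have the non-PFS fallback.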
TLS record size:
* For our current stack, this is a fixed size
* For now, set small enough that each record always fits in a single packet, which optimizes for latency
* Ideally the size should be raised dynamically for larger transfers; future work remains here
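The "fits in one packet" bound is simple arithmetic: start from the MTU, subtract IP and TCP headers to get the per-segment payload, then subtract the TLS record overhead. The sketch below assumes a 1500-byte MTU, TCP timestamps (12 option bytes) and a TLS 1.2 AES-GCM suite (5-byte record header, 8-byte explicit nonce, 16-byte tag). Other paths or suites change the numbers.

```python
def max_record_payload(mtu: int = 1500, ipv6: bool = False, tcp_options: int = 12,
                       record_header: int = 5, explicit_nonce: int = 8,
                       aead_tag: int = 16) -> int:
    """Largest TLS plaintext chunk whose record still fits in one TCP segment.

    Defaults assume TCP timestamps and a TLS 1.2 AES-GCM suite. CBC suites
    add MAC and padding overhead instead, so they need a somewhat smaller value.
    """
    ip_header = 40 if ipv6 else 20
    mss = mtu - ip_header - (20 + tcp_options)  # payload bytes per TCP segment
    return mss - record_header - explicit_nonce - aead_tag
```

Under these assumptions the limit is 1419 bytes over IPv4 and 1399 bytes over IPv6, so a single fixed value has to sit below the IPv6 figure. In nginx the fixed size is presumably controlled by the `ssl_buffer_size` directive, whose default of 16k is far larger than one packet.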
TLS session resumption:
* Iterated on several ideas; much future work remains here
* For now, our interim simple-but-workable solution is to keep RFC 5077 session tickets disabled and use a local session-ID shared-memory (shm) cache on each frontend cache node
* We rely on our LVS routers hashing on client IP to keep each client sticky to the machine where its session ID is cached
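The scheme works only because the two pieces line up: source-IP hashing sends a returning client to the same frontend, and only that frontend's local cache holds the session. The sketch below models both under stated assumptions: `SessionCache` is a simplified in-process stand-in for an shm cache (in nginx, presumably `ssl_session_cache shared:...` with `ssl_session_tickets off`), and `pick_node` uses SHA-256 modulo the node count rather than LVS's own source-hashing function. Node names are hypothetical.

```python
import hashlib
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, List, Optional


class SessionCache:
    """Per-node session-ID cache with LRU eviction and a fixed lifetime."""

    def __init__(self, capacity: int = 1000, lifetime: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.lifetime = lifetime
        self.clock = clock
        self._entries = OrderedDict()  # session_id -> (expiry time, state)

    def store(self, session_id: Hashable, state: Any) -> None:
        self._entries[session_id] = (self.clock() + self.lifetime, state)
        self._entries.move_to_end(session_id)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry

    def lookup(self, session_id: Hashable) -> Optional[Any]:
        entry = self._entries.get(session_id)
        if entry is None:
            return None  # miss: the client must do a full handshake
        expiry, state = entry
        if self.clock() >= expiry:
            del self._entries[session_id]
            return None
        self._entries.move_to_end(session_id)
        return state


def pick_node(client_ip: str, nodes: List[str]) -> str:
    """Source-IP hashing: an IP maps to the same node while the node list is unchanged."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]
```

One consequence of plain modulo hashing is that adding or removing a node remaps many clients, each of whom then misses the cache and pays for a full handshake once.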
NPN + SPDY:
* NPN is necessary for the False Start latency optimization with some clients
* The SPDY protocol brings better parallelism when loading several resources from the same domain
* Partial progress on updating the common URI patterns within a page load to take better advantage of SPDY coalescing; more to do here
OCSP Stapling:
* Bundles proof that the certificate is not revoked into the main transaction between our server and the client
* Saves browsers that perform OCSP validation a separate fetch from a third-party server
* Helps avoid reliability issues with upstream third-party OCSP servers
HTTP Strict Transport Security:
* Prevents proxied downgrade attacks against clients that have contacted us securely in the past
* By implication, all access to a given domain is HTTPS-only once HSTS is in effect
* Part of the current rollout plan, but we will raise the max-age values conservatively as we go
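HSTS is delivered as a `Strict-Transport-Security` response header whose `max-age` is given in seconds, so a conservative rollout is just a sequence of increasing values. The sketch below builds header values in RFC 6797 syntax; the ramp in `RAMP_DAYS` is a hypothetical example, not our actual schedule.

```python
from typing import List

DAY = 86400  # seconds


def hsts_header(max_age: int, include_subdomains: bool = False, preload: bool = False) -> str:
    """Build a Strict-Transport-Security header value.

    max_age is in seconds; max-age=0 tells browsers to forget the policy.
    "preload" is not defined by RFC 6797; it is a convention of browser preload lists.
    """
    if max_age < 0:
        raise ValueError("max-age must be non-negative")
    directives = [f"max-age={max_age}"]
    if include_subdomains:
        directives.append("includeSubDomains")
    if preload:
        directives.append("preload")
    return "; ".join(directives)


# Hypothetical conservative ramp, in days.
RAMP_DAYS: List[int] = [1, 7, 30, 365]


def ramp_headers() -> List[str]:
    """Header values for each step of the hypothetical ramp."""
    return [hsts_header(days * DAY) for days in RAMP_DAYS]
```

Short early values limit the damage if some hostname turns out not to work over HTTPS, since browsers forget the policy once `max-age` lapses.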
Preparations for transition:
* Load testing / estimation process
* Analytics work on HTTPS latency/failure differentials, sampled from live users
* ...
Future Directions:
* ECDSA Certificates
* HSTS Preloading
* HPKP
* HTTP/2 + ALPN

Event Timeline

Comment from greg, Jun 12 2015, 11:06 PM:

Re SPDY, it might also be worth mentioning the big benefit of actually enabling low-overhead pipelining, which is a boon for APIs in particular. It lets us move from custom bundled API endpoints, which fragment the cache, to finer-grained but well-cacheable requests, simplifying and speeding things up in the process.