Summary of TLS work
BBlack, Jun 11 2015

TLS termination on-cache:
* Previously, we terminated TLS on a small, separate TLS cluster at each PoP before forwarding to our frontend cache clusters.
* This was switched to running TLS termination software directly on every frontend cache node, since the cache clusters were already much larger than the TLS clusters and had spare CPU cycles for the task (a toy sketch of the concept follows below)
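
A toy sketch of what on-cache termination means (our production stack uses Nginx, as noted under software upgrades; this is only a conceptual illustration, and the cert paths and backend port are hypothetical): accept TLS on :443, decrypt, and forward plaintext HTTP to a local frontend cache.

    import asyncio, ssl

    BACKEND_HOST, BACKEND_PORT = "127.0.0.1", 3127  # hypothetical local cache frontend port

    async def pipe(reader, writer):
        # Copy bytes in one direction until EOF, then close the write side.
        try:
            while True:
                data = await reader.read(65536)
                if not data:
                    break
                writer.write(data)
                await writer.drain()
        finally:
            writer.close()

    async def handle(client_reader, client_writer):
        # By this point asyncio/ssl has already decrypted the client's TLS;
        # the hop to the local cache is plain HTTP on loopback.
        backend_reader, backend_writer = await asyncio.open_connection(BACKEND_HOST, BACKEND_PORT)
        await asyncio.gather(pipe(client_reader, backend_writer),
                             pipe(backend_reader, client_writer))

    async def main():
        ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
        ctx.load_cert_chain("/etc/ssl/unified.crt", "/etc/ssl/unified.key")  # hypothetical paths
        server = await asyncio.start_server(handle, "0.0.0.0", 443, ssl=ctx)
        async with server:
            await server.serve_forever()

    if __name__ == "__main__":
        asyncio.run(main())
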
Cache hardware expansion:
* Expanded/replaced hardware as appropriate
* Some of our oldest-generation caches still in service were too old to have the AES-NI instructions, which are critical for TLS crypto performance (a quick check is sketched after this list)
* Regardless of AES-NI, some of our cache clusters were simply too small for the estimated CPU load of 100% HTTPS
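
A quick node check for the AES-NI point above, as a sketch: the kernel exposes CPU instruction-set flags in /proc/cpuinfo, and "aes" among them indicates AES-NI.

    # Sketch: report whether this host's CPUs advertise AES-NI.
    with open("/proc/cpuinfo") as f:
        flag_lines = [line for line in f if line.startswith("flags")]
    has_aesni = bool(flag_lines) and all("aes" in line.split(":", 1)[1].split() for line in flag_lines)
    print("AES-NI present" if has_aesni else "AES-NI missing: expect poor TLS crypto performance")
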
Software upgrades:
* Updated the Linux distro to Debian Jessie (ahead of the official Jessie release)
* Newer, better kernels
* Newer versions of critical packages like OpenSSL and Nginx, to support the features and optimizations described below
Low-level tuning:
* bnx2x hardware receive queue tuning
* Linux RSS/XPS-level tuning of network traffic (https://www.kernel.org/doc/Documentation/networking/scaling.txt)
* Thread counts and CPU pinning
* TCP stack tuning (the mechanics are sketched after this list)
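
The XPS and TCP pieces above, as a sketch with illustrative (not production) values: per-queue CPU bitmasks live under /sys/class/net/<dev>/queues/ (see the scaling.txt doc linked above), and TCP sysctls live under /proc/sys/net/. Both require root; the interface name and numbers here are assumptions.

    import glob, os

    DEV = "eth0"  # hypothetical interface name

    def set_xps(dev):
        # Pin each TX queue to a single CPU by writing a hex CPU bitmask to xps_cpus.
        for i, path in enumerate(sorted(glob.glob(f"/sys/class/net/{dev}/queues/tx-*/xps_cpus"))):
            with open(path, "w") as f:
                f.write(format(1 << (i % os.cpu_count()), "x"))

    def set_sysctl(name, value):
        # net.ipv4.tcp_slow_start_after_idle -> /proc/sys/net/ipv4/tcp_slow_start_after_idle
        with open("/proc/sys/" + name.replace(".", "/"), "w") as f:
            f.write(str(value))

    set_xps(DEV)
    set_sysctl("net.ipv4.tcp_slow_start_after_idle", 0)  # illustrative tuning only
    set_sysctl("net.core.netdev_max_backlog", 30000)     # illustrative value
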
SNI Certs:
* We host many domains; previously we had a single large unified certificate containing 26 wildcard entries
* Split this into several smaller SNI certificates to reduce certificate transfer size for SNI-capable clients
* We still serve the unified cert as a fallback for older clients without SNI support (selection logic sketched below)
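
A minimal sketch of the selection logic, assuming hypothetical cert filenames and one example cert group: SNI-capable clients asking for a recognized hostname get the small per-group cert, everyone else gets the large unified fallback.

    import ssl

    # Fallback context: the large unified cert, for clients that send no SNI.
    unified = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    unified.load_cert_chain("/etc/ssl/unified.crt", "/etc/ssl/unified.key")  # hypothetical paths

    # Smaller cert covering one group of domains, for SNI-capable clients.
    wikipedia = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    wikipedia.load_cert_chain("/etc/ssl/wikipedia-sni.crt", "/etc/ssl/wikipedia-sni.key")

    def choose_cert(ssl_sock, server_name, initial_context):
        # Runs mid-handshake with the client's SNI value (None if the client sent none).
        if server_name and server_name.endswith("wikipedia.org"):
            ssl_sock.context = wikipedia
        # Otherwise the unified context already on the socket is used.

    unified.set_servername_callback(choose_cert)
    # `unified` is then used as the SSL context for the listening server.
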
Ciphersuite/Protocol Updates:
* Eliminated low/zero-security ciphers
* Support PFS (forward secrecy) ciphers for most clients that are capable of them
* SSLv3 disabled (back at the time of the POODLE incident; this killed IE6-on-XP compatibility) - a policy sketch follows below
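
The same policy expressed against Python's ssl module, as a sketch; the real cipher list lives in our Nginx config and the string below is only an illustrative PFS-first example.

    import ssl

    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.options |= ssl.OP_NO_SSLv3                   # POODLE: refuse SSLv3 outright
    ctx.options |= ssl.OP_CIPHER_SERVER_PREFERENCE   # server picks the strongest mutual suite
    # Prefer forward-secret (ECDHE/DHE) suites; drop export/NULL/RC4/MD5-grade ciphers.
    ctx.set_ciphers("ECDHE+AESGCM:ECDHE+AES:DHE+AES:!aNULL:!eNULL:!EXPORT:!RC4:!MD5:!DES")
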
TLS record size:
* For our current stack, this is a fixed size
* Currently set small enough that a record always fits in a single packet, optimizing for latency
* Ideally the record size would be dynamically increased for larger transfers; future work to do here (rough arithmetic below)
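
Rough arithmetic behind "fits in one packet", with assumed overhead numbers (Ethernet MTU, IPv4, TCP with timestamps, TLS record framing); the exact figures vary by ciphersuite and path MTU.

    # All numbers are illustrative assumptions, just to show the reasoning.
    mtu          = 1500        # typical Ethernet MTU
    ip_header    = 20          # IPv4, no options
    tcp_header   = 20 + 12     # base TCP header + timestamp option
    tls_overhead = 5 + 16 + 16 # record header + AEAD tag/MAC + padding allowance
    max_record = mtu - ip_header - tcp_header - tls_overhead
    print(max_record)          # ~1411 bytes: cap records near this to avoid a second packet
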
TLS session resumption:
* Iterated on several ideas, much future work to do here
* For now, our interim, simple-but-workable solution has been to keep RFC 5077 session tickets disabled and use a local shared-memory session-ID cache on each cache frontend
* We rely on our LVS routers' client-IP hashing to keep each client sticky to the machine its session ID is cached on (a client-side probe is sketched below)
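
A client-side probe sketch to confirm resumption is working from a given vantage point (target hostname is an example; needs Python 3.6+ for the session/session_reused API):

    import socket, ssl

    HOST = "en.wikipedia.org"  # example target
    ctx = ssl.create_default_context()

    # First handshake: full; keep the resulting session object.
    with socket.create_connection((HOST, 443)) as sock:
        with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
            session = tls.session

    # Second handshake: offer the saved session. If the frontend still holds the
    # session ID in its local shm cache (and LVS hashed us to the same machine),
    # this resumes without a full key exchange.
    with socket.create_connection((HOST, 443)) as sock:
        with ctx.wrap_socket(sock, server_hostname=HOST, session=session) as tls:
            print("resumed:", tls.session_reused)
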
NPN + SPDY:
* NPN is necessary for the False Start latency optimization with some clients (a negotiation probe is sketched after this list)
* The SPDY protocol brings better parallelism when loading several resources from the same domain
* Partial progress on updating our common URI patterns within a page load to take better advantage of SPDY coalescing; more to do here
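
A client probe sketch of the NPN negotiation (requires a Python/OpenSSL build with NPN support, which era-appropriate versions have; the hostname is an example):

    import socket, ssl

    HOST = "en.wikipedia.org"  # example target
    if not ssl.HAS_NPN:
        raise SystemExit("local OpenSSL build lacks NPN support")

    ctx = ssl.create_default_context()
    ctx.set_npn_protocols(["spdy/3.1", "http/1.1"])  # advertise SPDY first, HTTP/1.1 fallback

    with socket.create_connection((HOST, 443)) as sock:
        with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
            print("negotiated:", tls.selected_npn_protocol())  # e.g. "spdy/3.1" or "http/1.1"
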
OCSP Stapling:
* Bundles proof that our certificate has not been revoked into the main TLS transaction between our server and the client
* Saves browsers that perform OCSP validation from having to fetch revocation status separately from a third party
* Helps avoid reliability issues with upstream third-party OCSP servers
HTTP Strict Transport Security:
* Avoids proxied downgrade attacks for clients that have contacted us securely in the past
* By implication, all access to a given domain is HTTPS-only when using HSTS
* Part of the current rollout plan; we will raise the max-age values conservatively as we go (the header mechanics are sketched below)
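
Mechanically, HSTS is just a response header set on HTTPS responses; a sketch with an illustrative, deliberately small starting max-age (the real rollout values will be chosen as we go):

    from wsgiref.simple_server import make_server

    def app(environ, start_response):
        # In practice this is only served over HTTPS; the header tells the browser
        # to refuse plain-HTTP access to this domain for max-age seconds.
        start_response("200 OK", [
            ("Content-Type", "text/plain"),
            ("Strict-Transport-Security", "max-age=86400"),  # illustrative: 1 day to start
        ])
        return [b"hello over HTTPS\n"]

    make_server("", 8080, app).serve_forever()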
