
Strong cipher preference ordering for cache terminators
Closed, Resolved · Public

Description

To be clear on scope: this is about the relative ordering of the three AEAD options AES128-GCM, AES256-GCM, and CHACHA20-POLY1305 when all are used with ECDHE-based forward secrecy. These are our top 3 ciphers in terms of modern strength, but a lot of subtlety goes into choices between them, and there's a lot of confusion out there on the Internet about this issue.

I'll start by covering all of the out-of-scope ciphers: We generally order FS ciphers before non-FS ciphers, and ECDHE before DHE within FS sets. We also always prefer the 3 strong ciphers under discussion here over any other alternatives. We only use 128-bit AES in these lesser options, as any client negotiating them has far bigger problems than AES bit-strength issues, and is likely to be a legacy/older device where crypto CPU utilization matters the most. All of these out-of-scope ciphers are considered legacy and fundamentally broken, and we only continue to support them because to do otherwise would cut off significant fractions of our real-world users, and these ciphers *do* provide some level of protection against lesser adversaries.

For the 3 in-scope strong ciphers: The great AES128-vs-AES256 debate was covered in depth in a semi-related ticket earlier (see here and beyond: T131908#2545144 ). The bottom line is that in purely security terms, our preference should probably be [chapoly|aes256] ahead of aes128, with chapoly and aes256 treated as equally strong (and yes I know you've read differently, and I've even said differently in the past, but please read the earlier ticket link in detail).

On the performance front: clients implementing the strong suites do not universally support AES acceleration at the hardware level. This is especially true for modern browsers running on ancient hardware. For non-AES-accelerating clients, chapoly is both the strongest and the fastest choice, but it's also the newest and least-deployed of the three. For AES-accelerating clients, aes128 is faster than 256 and chapoly is significantly slower than both, but the performance differential for aes256-vs-aes128 is small and we really should just prefer aes256 on them. The real problem is we can't easily identify all AES-accelerating clients at ClientHello time.

We do have one signal about that: Chrome detects AES acceleration and re-orders AES/ChaPoly in the client's preference list based on that.

In an ideal world, we'd basically like to prefer AES256 to AES128 for AES-accelerating clients (or possibly all clients), and prefer ChaPoly to AES for clients that implement it but lack AES-NI. Chrom(e|ium) makes this easy for us with their client preference re-ordering. Firefox is currently a problem, though. FF47's current state is that it supports ChaPoly, but has a static preference list which orders AES128 ahead of ChaPoly and doesn't even implement AES256. There are open tickets to address both issues in NSS/Firefox, and it seems at least initially that AES256 will be behind ChaPoly when that lands in FF49 (aes128->chapoly->aes256). I have no idea how their AES acceleration re-ordering will affect their future cipher lists after that.

I think even for non-accelerated clients, though, the AES256 (vs 128) perf hit is relatively-minor. It's there, but it may not be worth significant effort to pursue it.

We have a few basic options at this point in time:

  1. Stick with the ciphersuites and prefhacks we have. These choose ChaPoly iff it's ahead of the relevant AESGCM options in the client's list, and prefer AES128 to AES256 always, which is what most other major TLS sites are doing now.
  2. Stick with the ChaPoly prefhack we have now, and universally put AES256 ahead of AES128. This probably slightly slows down some clients (those new enough to support strong ciphers at all, but old enough to have a very slow CPU that lacks AES instructions), but it's probably a negligible slowdown in the net of our perf graphs.
  3. Use a more-complex prefhack similar to https://gerrit.wikimedia.org/r/#/c/306659/2/debian/patches/chapoly_aes256gcm_prefhacks.patch . This allows us to leave AES128 preferred in the general case, but bump the AES256 priority when the client cipher list indicates AES acceleration by including ChaPoly but de-prioritizing it. If all ChaPoly-capable clients acted the way Chrome does, this would be an ideal solution for the medium-term, looking towards eventually preferring AES256 outright as in (2) later.
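The three options can be compared with a small selection sketch. This only illustrates the behavior described in the list above and is not the actual OpenSSL patch: the short cipher names, the helper functions, the fallback orders, and the two Chrome client lists are assumptions made for the example. The two Firefox lists follow the orderings described earlier in this description.

```python
# Hypothetical short names for the three in-scope AEAD ciphers.
CHAPOLY, AES256, AES128 = "chapoly", "aes256gcm", "aes128gcm"
STRONG = (CHAPOLY, AES256, AES128)


def chapoly_first(client_prefs):
    """True if ChaPoly is ahead of every AES-GCM option in the client's list."""
    strong = [c for c in client_prefs if c in STRONG]
    return bool(strong) and strong[0] == CHAPOLY


def first_offered(client_prefs, server_order):
    """Return the first cipher in server_order that the client offers, else None."""
    return next((c for c in server_order if c in client_prefs), None)


def option1(client_prefs):
    # Current prefhack: ChaPoly only if the client ranks it first, else AES128 first.
    if chapoly_first(client_prefs):
        return CHAPOLY
    return first_offered(client_prefs, (AES128, AES256, CHAPOLY))


def option2(client_prefs):
    # Same ChaPoly prefhack, but AES256 universally ahead of AES128.
    if chapoly_first(client_prefs):
        return CHAPOLY
    return first_offered(client_prefs, (AES256, AES128, CHAPOLY))


def option3(client_prefs):
    # More-complex prefhack: ChaPoly offered but de-prioritized is read as
    # a sign of AES acceleration, so AES256 gets bumped above AES128.
    if chapoly_first(client_prefs):
        return CHAPOLY
    if CHAPOLY in client_prefs:
        return first_offered(client_prefs, (AES256, AES128))
    return first_offered(client_prefs, (AES128, AES256))


clients = {
    "Chrome, AES accel (assumed list)": (AES128, AES256, CHAPOLY),
    "Chrome, no AES accel (assumed list)": (CHAPOLY, AES128, AES256),
    "FF47": (AES128, CHAPOLY),
    "FF49 (expected)": (AES128, CHAPOLY, AES256),
}

for name, prefs in clients.items():
    print(f"{name}: 1={option1(prefs)} 2={option2(prefs)} 3={option3(prefs)}")
```

Under these assumptions, option 3 is the only one whose result depends on whether the client advertises ChaPoly at all, which is why its usefulness hinges on clients re-ordering their lists the way Chrome does.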

Firefox doesn't really work right with any approach we've got, and their future directions are unknowns. I think we can safely assume they'll bump ChaPoly above both AES options for unaccelerated clients when/if NSS starts re-ordering based on acceleration. We probably can't assume they'll rank AES256 above ChaPoly for accelerating clients, though, as they still seem to be in an anti-AES256 mindset over there. The current complex pref hacks patch linked above would choose AES128 for all relevant Firefoxen today, and would choose AES256 for them all when that lands without acceleration re-ordering in FF49.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Link to mozilla NSS bug on doing AES-NI pref hacks on the client side (no real info there yet): https://bugzilla.mozilla.org/show_bug.cgi?id=1279584

My current thinking is that it's probably not worth the more-complex prefhacks. We should probably hold off until we see somewhat broader adoption of RFC-ChaPoly combined with AES-NI prioritization on the client side, which means waiting longer on the Chrome front (for more outdated Chrome installs to update), and waiting for FF to ship a release with the above NSS issue resolved and reach at least moderate deployment. After that point, put AES256 above AES128 in our strong suite for ECDHE-(ECDSA|RSA). Truly-legacy clients that don't implement strong ciphers would still be on 128 in the mid/compat lists for perf where it matters most.

I've found one counterpoint recently, making a mathematically-backed-up claim that we don't have to worry about AES-128 batch attacks so much in the specific case of GCM: https://www.ietf.org/mail-archive/web/tls/current/msg20102.html . I don't find any other sources on this, though.

Circling back to the broader plans and ideas here:

FF/NSS ChaPoly/AES pref hacks

Firefox/NSS don't seem to be making any quick progress on ChaPoly prioritization based on CPU capabilities. Therefore, having our plans contingent on this appearing is probably a poor idea. ChaPoly is there in recent FF releases, but always behind AES128-GCM. I've searched other related tickets in their bugzilla and bumped/voted the most-relevant ticket, but nothing has come of it. Given their processes, I don't think an AES-NI-sensitive NSS will make the FF 52 ESR release, which will be the final release supporting WinXP. They tried a quick hack earlier on to re-order ChaPoly statically at compile time just on ARM targets, but ended up reverting even that because of some bug.

General Cipher Perf

Really, AES-256 isn't all that much slower than AES-128. More to the point, in the case of older or cheaper CPUs that don't support AES-NI, our best perf option is ChaPoly anyways, not the bitness of AES. For modern higher-end hardware the AES128/256 distinction is pretty much moot.

An interesting link is https://calomel.org/aesni_ssl_performance.html , which compares the 8K-block-size performance of AES128, AES256, and ChaPoly on a variety of CPUs, using AES-NI where available.

The worst-performing CPU tested (an ARM Cortex A9) gets ~3x faster with ChaPoly than AES-128-GCM. Its ChaPoly speed is a little over 600Mbps, which is plenty fast enough for real-world client connections.

Another interesting datapoint is the Intel Core2 Q9300. This is a 2.5GHz desktop processor that was released ~8.5 years ago and lacks AES-NI. It can do the 128 and 256 AES-GCMs at ~1Gbps (difference is minimal), and ChaPoly at ~1.8Gbps.

Up at the higher end of the results, we have the AES-NI-capable Intel i7-6700 released barely a year ago. Its AES speeds are in the ~20Gbps range, and its ChaPoly speed is a little under 5Gbps. At those kinds of speeds, the differential is irrelevant. It's all too-fast-to-matter.

Of the AES-NI capable processors tested, the worst and slowest ones (at doing ChaPoly) are the Intel i7-2635QM and the Intel Xeon L5630, clocking in at 1.2Gbps and 1.8Gbps respectively, which is still plenty-fast. AES is considerably faster on both, but the speeds are so high it hardly matters.

Even if you look at this from an efficiency standpoint on the AES-NI platforms where ChaPoly is less-efficient, the symmetric crypto costs should be a small fraction of the power wasted on things like displays, html rendering engines, image rendering/scaling, javascript execution, network transfer, TLS key exchanges, etc. In latency terms many of the same factors would dominate as well.
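To put the throughput figures above in per-page terms, here is a rough back-of-the-envelope sketch. The 1 MB payload is an assumed page size rather than a number from the benchmark, and the rates are the rounded figures quoted above (for example, "a little over 600Mbps" is taken as 600).

```python
def crypto_ms(payload_bytes, throughput_mbps):
    """Milliseconds of bulk-cipher work for payload_bytes at throughput_mbps."""
    return payload_bytes * 8 / (throughput_mbps * 1e6) * 1000


PAYLOAD = 1_000_000  # assumed 1 MB page load; not a figure from the benchmark

# (cpu, cipher) -> approximate Mbps, rounded from the numbers quoted above
RATES = {
    ("Cortex A9", "chapoly"): 600,
    ("Core2 Q9300", "aes-gcm"): 1000,
    ("Core2 Q9300", "chapoly"): 1800,
    ("i7-6700", "aes-gcm"): 20000,
    ("i7-6700", "chapoly"): 5000,
}

for (cpu, cipher), mbps in RATES.items():
    print(f"{cpu:12s} {cipher:8s} {crypto_ms(PAYLOAD, mbps):6.2f} ms")
```

Even on the slowest CPU listed, the symmetric crypto for a payload of that size comes out in the low tens of milliseconds at most, and on the AES-NI hardware the ChaPoly-vs-AES gap is on the order of a millisecond.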

Given the other benefits of ChaPoly (lack of NIST origin, non-batchable 256-bit crypto, designed for bulletproof implementation without timing leaks, etc), and that at least some clients may implement ChaPoly and get deployed on older/non-AES hardware without detecting it for client-preference purposes, perhaps we should simply prefer ChaPoly over AES-GCM outright, and then leave AES256-GCM ahead of AES128-GCM beneath it over the questionable batch-attack stuff. It also has the benefit of removing a local hack to our OpenSSL build (the one that's allowing client preference for ChaPoly).

Also relevant to the above is Mozilla's current recommendations at https://wiki.mozilla.org/Security/Server_Side_TLS .

In a nutshell:

  • Their Modern-only config's high-end ciphers are ordered AES256-GCM, ChaPoly, AES128-GCM, and the rationale given is: "AES256-GCM is prioritized above its 128 bits variant, and ChaCha20 because we assume that most modern devices support AESNI instructions and thus benefit from fast and constant time AES."
  • Their Intermediate (which is closer in spirit to what we're looking for) and Old-Compat configs both order them as ChaPoly, AES128-GCM, AES256-GCM, giving the rationale: "ChaCha20 is prefered as the fastest and safest in-software cipher, followed but AES128. Unlike the modern configuration, we do not assume clients support AESNI and thus do not prioritize AES256 above 128 and ChaCha20. There has been discussions (1, 2) on whether AES256 extra security was worth its computing cost in software (without AESNI), and the results are far from obvious. At the moment, AES128 is preferred, because it provides good security, is really fast, and seems to be more resistant to timing attacks."

However, I'll note here that I disagree with them about 128 being more-resistant than 256 to timing attacks (which is only really relevant when the client has relatively-bad software and no AES accel): when looking at this before, the only hard reference I could find on the relative timing attack resistance was this paper: https://eprint.iacr.org/2007/318.pdf , which indicates 256 is more resistant to timing attacks than 128, counter-intuitively.

I think all of this further reinforces the idea that we should aim towards a static server side preference (in our strongest ciphers) of: ChaPoly, AES256-GCM, AES128-GCM.

Change 316889 had a related patch set uploaded (by BBlack):
ssl_ciphersuite: re-order AES vs ECDSA priorities

https://gerrit.wikimedia.org/r/316889

Change 316890 had a related patch set uploaded (by BBlack):
ssl_ciphersuite: commentary update re: chapoly

https://gerrit.wikimedia.org/r/316890

Change 316891 had a related patch set uploaded (by BBlack):
ssl_ciphersuite: switch AES bits order for GCM

https://gerrit.wikimedia.org/r/316891

Change 316889 merged by BBlack:
ssl_ciphersuite: re-order AES vs ECDSA priorities

https://gerrit.wikimedia.org/r/316889

Change 319532 had a related patch set uploaded (by Muehlenhoff):
Fix read_ahead handling, drop chapoly preference patch

https://gerrit.wikimedia.org/r/319532

Change 319532 merged by Muehlenhoff:
Fix read_ahead handling, drop chapoly preference patch

https://gerrit.wikimedia.org/r/319532

Mentioned in SAL (#wikimedia-operations) [2016-11-03T13:35:59Z] <bblack> cp1065: upgrade libssl1.1 to 1.1.0b-1+wmf2 - T144626 - T148917

Mentioned in SAL (#wikimedia-operations) [2016-11-03T13:42:16Z] <bblack> cp*: upgrade libssl1.1 to 1.1.0b-1+wmf2 (but no nginx restart yet) - T144626 - T148917

Mentioned in SAL (#wikimedia-operations) [2016-11-03T13:49:25Z] <bblack> cache_maps + cache_misc: nginx lossless restarts for libssl update - T144626 - T148917

Mentioned in SAL (#wikimedia-operations) [2016-11-03T13:56:14Z] <bblack> cache_upload: nginx lossless restarts for libssl update - T144626 - T148917

Mentioned in SAL (#wikimedia-operations) [2016-11-03T14:08:14Z] <bblack> cache_text: nginx lossless restarts for libssl update - T144626 - T148917

Change 316890 merged by BBlack:
ssl_ciphersuite: commentary update re: chapoly

https://gerrit.wikimedia.org/r/316890

Update:

We're now preferring chapoly to other symmetric algorithms outright in our strongest cipher suites at the top of the list, without preference hacks that try to pay attention to signals about client support for AES acceleration.

The movements on the cipher graphs at https://grafana.wikimedia.org/dashboard/db/tls-ciphers roughly matched expectations, and there doesn't appear to be any user breakage or other negative fallout.

The point-in-time shift between our two most-popular symmetrics among the strong ciphers was roughly:

% of all ciphers    fs+aes128gcm    fs+chapoly
Before              70%             20%
After               40%             50%

We'll see after a few days how the longer-term averages look. Remaining TODO in here is deciding whether to flip aes256gcm/aes128gcm ordering within the strong suite as well.

Had a chance to dig through our NavTiming performance metrics. There are some slight hints of improvement here and there, but that's probably just wishful thinking while staring at the tea leaves. Certainly no obvious negative trends. I suspect symmetric crypto perf is such a relatively-small fraction of navtiming that a small change can't really bump things around much in the big picture.

The more I dig and study the relevant information that's out there, and especially in light of the invisible impact of the chapoly change on our navtiming, I think I'm convinced about the AES128/256 re-order in the strong suite.

Change 316891 merged by BBlack:
ssl_ciphersuite: switch AES bits order for GCM

https://gerrit.wikimedia.org/r/316891

The AES flip doesn't seem to have had any notable influence on performance metrics, either. Now after all the above merges, our server-side cipher preference list starts out with:

ECDHE-ECDSA-CHACHA20-POLY1305
ECDHE-RSA-CHACHA20-POLY1305
ECDHE-ECDSA-AES256-GCM-SHA384
ECDHE-RSA-AES256-GCM-SHA384
ECDHE-ECDSA-AES128-GCM-SHA256
ECDHE-RSA-AES128-GCM-SHA256
... [lesser options] ...

We still don't even have a full day of data under this scheme, but very roughly speaking the stats now work out as (for only the top ciphers above, and for not quite a full day of data, right near the week/weekend boundary!):
ChaPoly: 48%
AES256-GCM: 34%
AES128-GCM: 8%

Another notable effect was that there's apparently a small subset of clients out there in the world (somewhere in the 0.1% -> 1% ballpark) who implement both ECDHE-ECDSA-AES128-GCM-SHA256 and ECDHE-RSA-AES256-GCM-SHA384, but not ECDHE-ECDSA-AES256-GCM-SHA384. Therefore, our overall ECDSA-vs-RSA stats have dropped off very slightly back in the RSA direction, and can no longer be considered an accurate reflection of overall ECDSA support (as some browsers that are ECDSA-capable in general are now choosing RSA with us). I'm not too worried about digging into this.
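A minimal sketch of why that subset of clients lands on RSA now, assuming plain server-preference negotiation over the list above. The `negotiate` helper, the example client's offer list, and the `PRE_FLIP` order (the same list with the AES128 pair ahead of the AES256 pair) are assumptions for illustration, not a record of the actual earlier config.

```python
# Top of the server-side preference list after the merges above.
SERVER_ORDER = [
    "ECDHE-ECDSA-CHACHA20-POLY1305",
    "ECDHE-RSA-CHACHA20-POLY1305",
    "ECDHE-ECDSA-AES256-GCM-SHA384",
    "ECDHE-RSA-AES256-GCM-SHA384",
    "ECDHE-ECDSA-AES128-GCM-SHA256",
    "ECDHE-RSA-AES128-GCM-SHA256",
]

# Assumed earlier order: AES128 pair ahead of the AES256 pair.
PRE_FLIP = SERVER_ORDER[:2] + SERVER_ORDER[4:] + SERVER_ORDER[2:4]


def negotiate(client_offers, server_order=SERVER_ORDER):
    """Pick the first suite in the server's order that the client offers."""
    offered = set(client_offers)
    for suite in server_order:
        if suite in offered:
            return suite
    return None


# Hypothetical client: ECDSA-capable, but no ECDHE-ECDSA-AES256-GCM-SHA384.
odd_client = [
    "ECDHE-ECDSA-AES128-GCM-SHA256",
    "ECDHE-RSA-AES256-GCM-SHA384",
    "ECDHE-RSA-AES128-GCM-SHA256",
]

print("pre-flip: ", negotiate(odd_client, PRE_FLIP))
print("post-flip:", negotiate(odd_client))
```

With AES256 ranked above AES128, the only AES256 suite such a client offers is the RSA one, so it wins over the ECDSA AES128 suite the client would otherwise have negotiated.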

BBlack claimed this task.