
Drop AES-256 mid/compat lists.
Closed, Declined · Public

Description

The change is simple, but I figured a phab discussion might be warranted, or might make a good reference link later.

Basic Info

AES256 options comprise roughly half of our server-side ciphersuite list in: https://github.com/wikimedia/operations-puppet/blob/production/modules/wmflib/lib/puppet/parser/functions/ssl_ciphersuite.rb .

For all other dimensions (e.g. Kx, Auth, Mac), we always offer an AES128 equivalent for every AES256 option, and we always prefer the AES128 variant. Therefore AES128 is always selected by clients unless they've explicitly disabled all AES128 options, which is not the default for any known client implementation - it's something the user/administrator would have to force/customize. Stats for the past week show that AES256-forcing clients comprise approximately 0.0013% of all requests. I've actually watched realtime logs to try to pin down exactly what's generating these requests. I've only ever caught two distinct users of AES256, and their traffic rates seem to comprise the overwhelming majority of all AES256 hits we get:

  1. afnoc.af.mil outbound proxies - These are all IPs of the form 138.3.x.x, with hostnames of the form SITE-pxyw[0-9]e.afnoc.af.mil, where SITE is one of several known names of US Air Force bases. This traffic only seems to happen on our upload (as opposed to text or mobile) clusters, and always chooses a cipher of AES256-SHA256. I can't find any whois information specific enough to reach out to whoever runs these, and I don't see a public whois server for subdomains of .mil. This traffic is on the order of ~0.5 reqs/sec in global daily averages. It is fairly benign and looks like the normal, expected pattern for a general-usage outbound proxy, other than the fact that it's only hitting the upload cluster and not text. It could be a case of re-using upload.wm.o commons images on an internal .mil wiki. They may have filtered out AES-128 choices due to a blanket application of some military "256-bit only because it's bigger" sort of rule that comes out of some NIST/DOD standard. They'll be affected by the change.
  2. Some guy in France - 90.54.216.xx / xx.abo.wanadoo.fr - Seems to be using IE11 with all 128-bit options manually disabled, choosing ECDHE-ECDSA-AES256-GCM-SHA384 with us. This guy (and others like him, I think) accounts for most of the rest of the AES256 traffic, at a somewhat lower average rate than afnoc.af.mil, and burstier. He wouldn't be affected by the change, as he's using a PFS+AEAD+AES256 choice. There are probably others like this guy on different days and time-windows, but average rates indicate there are very few of them (few enough that they don't overlap much).
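To make the selection behavior described under Basic Info concrete, here is a minimal sketch. It assumes the server enforces its own preference order and lists each AES128 suite ahead of its AES256 twin. The four-entry `SERVER_PREFS` list and the `negotiate` helper are illustrative only, not the real contents of ssl_ciphersuite.rb.

```python
# Illustrative subset only; NOT the real list from ssl_ciphersuite.rb.
# Each AES128 suite is listed ahead of its AES256 equivalent.
SERVER_PREFS = [
    "ECDHE-ECDSA-AES128-GCM-SHA256",
    "ECDHE-ECDSA-AES256-GCM-SHA384",
    "AES128-SHA256",
    "AES256-SHA256",
]


def negotiate(server_prefs, client_offers):
    """Return the first server-preferred suite the client also offers, else None."""
    offered = set(client_offers)
    for suite in server_prefs:
        if suite in offered:
            return suite
    return None


# A client that offers everything, even if it lists AES256 first itself:
# the server's order wins, so AES128 is picked.
default_client = list(reversed(SERVER_PREFS))

# A client whose admin has disabled every AES128 option.
aes256_only = [s for s in SERVER_PREFS if "AES256" in s]

# A legacy proxy that only offers the non-PFS AES256 suite.
legacy_proxy = ["AES256-SHA256"]

for name, offers in [("default", default_client),
                     ("aes256_only", aes256_only),
                     ("legacy_proxy", legacy_proxy)]:
    print(name, "->", negotiate(SERVER_PREFS, offers))
```

Under these assumptions, only a client that has removed every AES128 option from its own offer ever ends up on an AES256 suite.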

Why do we care?

Cutting out a large chunk of our active cipher list simplifies it for analysis and further updates, since we're preferring AES-128 on the balance of perf/security anyway and no clients should be selecting these options by default.

Also, generally speaking the choice of AES256 over AES128 is probably more misguided than informed in most cases. This Crypto.StackExchange discussion covers a lot of the debate, especially if you read more than just the top response, and the various links within the responses. The net of it in my mind is this: AES-128 is more than secure enough into the foreseeable future in brute force terms, and is better-studied in cryptanalytic terms and has held up fairly well so far. AES-256 has some suspicions around its key schedule in general, and there is a known related-key attack already which, counter-intuitively, makes AES-256 slightly weaker than AES-128 (that specific attack may not apply to common TLS use-cases, though!).

It seems silly that someone would go through the trouble of disabling AES-128 and not go through the trouble of upgrading the software to support PFS+AEAD, which is far, far more important than the AES key length regardless of your stance on the 128-vs-256 debate. Therefore, the proposal here is that we drop all AES256 options which are in the mid and compat lists, but leave AES256 as a less-preferred option in the strong list of PFS+AEAD options for clients that feel AES256 is stronger, are being smart about other ciphersuite options, and want to make that manual choice.
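A rough sketch of what the proposal would do, assuming a three-tier layout (strong, mid, compat) concatenated in that order and a server-enforced preference order. The suite names are real OpenSSL names, but the tier contents here are a made-up subset and `drop_aes256`, `flatten` and `negotiate` are hypothetical helpers, not code from the puppet module.

```python
# Made-up subset for illustration; NOT the real strong/mid/compat lists.
LISTS = {
    "strong": ["ECDHE-ECDSA-AES128-GCM-SHA256", "ECDHE-ECDSA-AES256-GCM-SHA384"],
    "mid": ["ECDHE-RSA-AES128-SHA256", "ECDHE-RSA-AES256-SHA384"],
    "compat": ["AES128-SHA256", "AES256-SHA256"],
}


def drop_aes256(lists, tiers=("mid", "compat")):
    """Remove AES256 suites from the named tiers and leave the other tiers alone."""
    return {
        tier: [s for s in suites if not (tier in tiers and "AES256" in s)]
        for tier, suites in lists.items()
    }


def flatten(lists):
    """Concatenate the tiers into one server preference list, strongest first."""
    return lists["strong"] + lists["mid"] + lists["compat"]


def negotiate(server_prefs, client_offers):
    """Return the first server-preferred suite the client also offers, else None."""
    offered = set(client_offers)
    return next((s for s in server_prefs if s in offered), None)


proposed = drop_aes256(LISTS)

# Client limited to the non-PFS AES256 suite (like the proxy traffic above).
print(negotiate(flatten(proposed), ["AES256-SHA256"]))
# Client limited to AES256 but capable of PFS+AEAD (like the IE11 user above).
print(negotiate(flatten(proposed), ["ECDHE-ECDSA-AES256-GCM-SHA384"]))
```

In this sketch the first client no longer finds a common suite, while the second still negotiates its PFS+AEAD choice from the strong list.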

Event Timeline

BBlack created this task. Jul 13 2015, 5:55 PM
BBlack raised the priority of this task from to Needs Triage.
BBlack updated the task description.
BBlack added projects: acl*sre-team, HTTPS, Traffic.
BBlack added a subscriber: BBlack.
Restricted Application added subscribers: Matanya, Aklapper. Jul 13 2015, 5:55 PM

Change 224445 had a related patch set uploaded (by BBlack):
Drop AES256 from mid/compat lists

https://gerrit.wikimedia.org/r/224445

BBlack triaged this task as Normal priority. Jul 13 2015, 6:43 PM
BBlack moved this task from Triage to Up Next on the Traffic board. Jul 13 2015, 7:03 PM
BBlack updated the task description. Jul 13 2015, 8:38 PM
BBlack set Security to None.

To go a bit further on what's questionable about this: It's questionable whether we should even be trying to do enforcement against bad choices like this. Ignoring anything about simplifying our own work: the positive view is "the very very few people who misconfigure in ways that get impacted by this change may wonder why Wikipedia doesn't work anymore, and come look at this, and do something better". The negative view is "We still support lots of less-secure options than this, such as 3DES for IE8/XP - why should we impact anyone else at all? Let them make a questionable choice and ignore it."

My thought is that we should keep supporting a cipher suite as long as someone is actively using it and it is not close to broken (the way RC4 is). So how about keeping AES256-SHA256 and cutting out the other AES256 ciphers in the mid and compat lists? Also, why not remove dhe-rsa-camellia256-sha too? It has not been negotiated in the past 3 weeks.[1]

[1] https://tessera.wikimedia.org/dashboards/6/tls?from=-3w

re: Camellia: we've only got stats for 1 week so far, but yeah, I'm not fond of keeping either of the Camellia options in the long run. They don't appear to be in enough use that they're defaults anywhere, and there is no AEAD version, so that means whoever's picking these manually is making a bad decision (whatever your view on AES vs Camellia as algorithms, AEAD is more important). I just haven't done the legwork to really vet who's using them and why yet, and at least the remaining ones in the list are PFS.

As far as AES256 goes, I tend to think if we're going to disable AES256 variants, we have to start from the bottom up. Otherwise people who have filtered their client cipher choices to pick only AES256-based suites may get forced into a lower category than necessary (e.g. down from PFS to non-PFS). So, if anything, I'd rather just kill one or more of the compat AES256 options first and leave mid alone until compat is clear of them. The only compat AES256 option with significant traffic is the AES256-SHA256 choice, but AFAICS that particular suite is only coming from that one afnoc.af.mil source and nowhere else. I imagine they can figure out how to deal with it.
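The bottom-up argument can be sketched as follows, assuming tiers are tried in order strong, mid, compat and the server's order wins. The tier contents are a made-up subset and `without_aes256` and `negotiate` are hypothetical helpers. The client modeled here is one that has filtered itself down to AES256-only suites and lacks AEAD support.

```python
# Made-up subset for illustration; NOT the real strong/mid/compat lists.
TIERS = [
    ("strong", ["ECDHE-ECDSA-AES128-GCM-SHA256", "ECDHE-ECDSA-AES256-GCM-SHA384"]),
    ("mid", ["ECDHE-RSA-AES128-SHA256", "ECDHE-RSA-AES256-SHA384"]),  # PFS, no AEAD
    ("compat", ["AES128-SHA256", "AES256-SHA256"]),  # no PFS
]


def without_aes256(tiers, drop_from):
    """Return a copy of tiers with AES256 suites removed from the named tiers."""
    return [
        (name, [s for s in suites if not (name in drop_from and "AES256" in s)])
        for name, suites in tiers
    ]


def negotiate(tiers, client_offers):
    """Return (tier, suite) for the first server-preferred match, else None."""
    offered = set(client_offers)
    for name, suites in tiers:
        for suite in suites:
            if suite in offered:
                return name, suite
    return None


# Hypothetical AES256-only client without AEAD support.
client = ["ECDHE-RSA-AES256-SHA384", "AES256-SHA256"]

print("today:          ", negotiate(TIERS, client))
print("drop mid only:   ", negotiate(without_aes256(TIERS, {"mid"}), client))
print("drop compat only:", negotiate(without_aes256(TIERS, {"compat"}), client))
```

Removing AES256 from mid while compat still carries it pushes this client from a PFS suite down to a non-PFS one, whereas removing it from compat first leaves the client where it was.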

In any case, this can all wait a little while to gather more stats and data and do more thinking, it's not urgent. We're in a lull week or two here in general: lots of ops out for Wikimania (so avoiding risk/change where reasonable), and I'm leaving for a week's vacation on Friday as well.

fgiunchedi added a subscriber: fgiunchedi.

Change 224445 abandoned by BBlack:
Drop AES256 from mid/compat lists

Reason:
let's revisit this at a much later date, and maybe do it differently, too...

https://gerrit.wikimedia.org/r/224445

BBlack closed this task as Declined. Jul 27 2015, 3:57 PM

On further reflection, now's not really the time to disable AES256 like this. If anything we could start with just the ones that are non-PFS, but:

  1. there are many other priorities ahead of this for improving our cipher negotiations.
  2. the AES256 stats may evolve further as we continue to switch more long-tail client accesses (e.g. POST traffic, parsoid/restbase-related things, etc.), and as we disable other far less secure choices down the road.
  3. by the time we get down to looking at this again, it's likely that the whole picture will have evolved, as TLS often does...
BBlack moved this task from Up Next to Done on the Traffic board. Jul 27 2015, 3:57 PM