Some of the information out there about where we are and where we're going is scattered and lost in the noise of many separate Tasks and pending gerrit commits. This is an attempt to bring together a coherent view of the current state of things and the next upcoming steps. If you know of information that isn't here (missing Task refs, etc.), please feel free to correct it! The Description here will evolve as we go; use comments to discuss.
- **Definitions**
- Primary Wiki Domains - These are the canonical site/domain names that are both (a) contained in the SAN wildcard list of our primary unified production certificate, and (b) served by the primary production traffic clusters (text, mobile, and upload). Neither of those lists is a superset of the other today, so we're looking at an intersection.
- The SAN wildcard list, keeping in mind that wildcards only cover one level of hostname depth, is: *.X.org + *.m.X.org, where X is any of: wikipedia, wikimedia, wiktionary, wikiquote, wikibooks, wikisource, wikinews, wikiversity, wikidata, wikivoyage, wikimediafoundation, or mediawiki, as well as the special case *.zero.wikipedia.org.
- Which DNS domainnames map to the text, mobile, and upload clusters is determined by looking in our DNS repo at which things are mapped to the geoip upload-addrs, text-addrs, or mobile-addrs endpoints. Only upload.wikimedia.org maps to the upload cluster. Most other primary wikis you would think of in the SAN domains map to text-addrs, and the .m. variants map to mobile-addrs.
- Misc-Web Cluster Domains - Everything mapped via DNS to misc-web-lb.eqiad.wikimedia.org today. Almost all of these services are within the SAN element *.wikimedia.org, with the exceptions of *.planet.wikimedia.org and *.wmfusercontent.org, both of which we have separate certificates for. These all terminate through the "misc-web" nginx/varnish infrastructure, and in that sense are very similar in technical setup to (and share some code with) the Primary Wiki Domains above. Examples include: policy.wikimedia.org, phabricator.wikimedia.org, graphite.wikimedia.org, git.wikimedia.org, etc...
- Other Domains - Random one-offs which (sometimes for good reasons!) we have not placed behind the Misc-Web cluster, so they run their own independently-managed apache or nginx instance facing the world directly. This currently includes examples like: gerrit.wikimedia.org, icinga.wikimedia.org, etc...
- Insecure Redirect Domains - This class deserves special mention: we have **many** domainnames (and sub-hostnames of otherwise-legit Primary Wiki Domains) which we own and serve legitimately from DNS and map to the text and/or mobile clusters for HTTP+HTTPS service, but which do not match our SAN list. These are mostly just universal redirects to canonical Primary Wiki Domains. This includes examples like `www.en.wikipedia.org` (one too many levels of hostname depth), `wikimedia.ee`, `wikizpravy.cz`, `wikimediacommons.jp.net`, etc. We cannot redirect these directly to HTTPS at the varnish layer and lock them on with HSTS/301/rel=canonical like we do for the Primary Wiki Domains, primarily because we don't have certificate SANs that match them. Setting that up for all of them would be prohibitive on multiple levels.
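To make the "one level of hostname depth" rule concrete, here is a minimal Python sketch of how a hostname could be checked against the SAN wildcard list described above. The function names (`san_patterns`, `wildcard_matches`, `covered_by_san`) are hypothetical and exist only for illustration; the list is rebuilt from the description in this task, not read from the actual certificate, so treat the results as approximate.

```python
# Base names copied from the SAN description above. Everything else in this
# block is a hypothetical helper, not code from our repos.
SAN_BASE_NAMES = [
    "wikipedia", "wikimedia", "wiktionary", "wikiquote", "wikibooks",
    "wikisource", "wikinews", "wikiversity", "wikidata", "wikivoyage",
    "wikimediafoundation", "mediawiki",
]


def san_patterns():
    """Build *.X.org and *.m.X.org for each X, plus the zero special case."""
    patterns = []
    for name in SAN_BASE_NAMES:
        patterns.append(f"*.{name}.org")
        patterns.append(f"*.m.{name}.org")
    patterns.append("*.zero.wikipedia.org")
    return patterns


def wildcard_matches(pattern, hostname):
    """Return True if the leading '*' can stand for exactly one label."""
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    # A wildcard covers one label only, so the label counts must be equal.
    if len(p_labels) != len(h_labels) or not h_labels[0]:
        return False
    return p_labels[0] == "*" and p_labels[1:] == h_labels[1:]


def covered_by_san(hostname):
    """Check a hostname against the wildcard list as described in this task."""
    return any(wildcard_matches(p, hostname) for p in san_patterns())


if __name__ == "__main__":
    for host in ("en.wikipedia.org", "en.m.wikipedia.org",
                 "www.en.wikipedia.org", "wikimedia.ee"):
        print(host, covered_by_san(host))
```

Under these assumptions `www.en.wikipedia.org` is rejected because it has one label too many for any pattern, which is the situation the Insecure Redirect Domains definition describes. The sketch only models the wildcard entries listed here, so bare names such as `wikidata.org` come back as not covered by it.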
- **Current State of Affairs - Redirects and HSTS**
- Primary Wiki Domains - All are unconditionally forced as 301 redirects to HTTPS, where they are served with a 1-year HSTS header on all requests, with the following notable exceptions:
- We only 301-redirect `GET` and `HEAD` requests, not `POST`.
- We have exceptions carved out for the MediaWiki User-Agent for the commons and meta (.wikimedia.org) domains temporarily, to avoid breakage for InstantCommons and possible similar internal cases for meta. These are intended to be very temporary.
- On the positive side, wikidata.org is now also sending `includeSubdomains; preload` in its HSTS header, and has been submitted for browser pre-loading. We'd like to do that for more Primary Wiki Domains in the future, but all of our other domains have issues that prevent it currently.
- Misc-Web - Mixed: some are enforcing HTTPS w/ HSTS, some are not, currently configured by-service. Could/should eventually be forced at the Misc-Web outer layer, like the Primary Wiki Domains.
- Other - Mixed: some are enforcing HTTPS w/ HSTS, some are not, configured by-service.
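The redirect and HSTS rules for the Primary Wiki Domains can be summarized in a few lines of Python. This is only a sketch of the policy as described above, not the actual varnish/nginx configuration; `plan_response` and its parameters are hypothetical names, and the temporary MediaWiki User-Agent exceptions for commons and meta are deliberately left out.

```python
HSTS_MAX_AGE = 365 * 24 * 60 * 60  # one year, in seconds


def plan_response(scheme, method, host, path, preload=False):
    """Return (status, headers) for the edge to apply.

    A status of None means "no redirect, pass the request through".
    Hypothetical helper illustrating the policy described in this task.
    """
    if scheme == "http":
        # Only GET and HEAD are redirected today; POST is passed through.
        if method in ("GET", "HEAD"):
            return 301, {"Location": f"https://{host}{path}"}
        return None, {}

    # HTTPS responses carry a one-year HSTS header on all requests.
    value = f"max-age={HSTS_MAX_AGE}"
    if preload:
        # Currently only wikidata.org sends the extra directives.
        value += "; includeSubDomains; preload"
    return None, {"Strict-Transport-Security": value}


if __name__ == "__main__":
    print(plan_response("http", "GET", "en.wikipedia.org", "/wiki/Foo"))
    print(plan_response("http", "POST", "en.wikipedia.org", "/w/api.php"))
    print(plan_response("https", "GET", "www.wikidata.org", "/", preload=True))
```

The `preload` flag is an assumption made for the sketch; it stands in for whatever per-domain switch decides which domains send the stronger header.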
- **Current State of Affairs - Crypto Compatibility vs Security Tradeoffs**
- As a general rule, none of our HTTPS endpoints should support SSLv2 or SSLv3. I don't believe there are any exceptions to this today. The only client browser anyone cares much about which lacks TLSv1.0 (or higher) support and is blocked by this is IE6 on Windows XP (which, due to our redirects and lack of SSLv3 support, cannot access our HTTPS-redirected sites at all anymore).
- We maintain [[ https://github.com/wikimedia/operations-puppet/blob/production/modules/wmflib/lib/puppet/parser/functions/ssl_ciphersuite.rb | explicit ciphersuite lists in our puppet repo ]] that are intended to be shared by all TLS termination software for all cases. There are four choices a site/service can choose from at this time: strong, mid, compat, and compat-dhe. Descriptions straight from ssl_ciphersuite.rb:
- strong: Only TLSv1.2 with ECDHE-based AEAD ciphers. In practice this is a very short list, and requires a very modern client. No tradeoff is made for compatibility. Only known to work with: Modern FF/Chrome, IE11, Java8, Android 4.4+, OpenSSL 1.0.x. Definitely broken with: All Safari (OSX/iOS). IE11 support requires an ECDSA key as well, whereas others can work with RSA.
- mid: Supports TLSv1.0 and higher, and adds several forward-secret options which are not AEAD. This is compatible with many more clients than "strong", but still not compatible with: Android 2.x, IE8/XP, OpenSSL 0.9.8, Java6.
- compat: Supports most legacy clients, PFS optional, TLSv1.0+ only.
- compat-dhe: Upgrades 'compat' to use DHE for PFS with certain older clients. Breaks some older/commercial Java6 clients, but makes things more secure for Android 2 and OpenSSL 0.9.8. *Critical* - Requires the use of good (non-default, not weak, 2048-bit+) DH parameters elsewhere in the server config, which is not enforced by this code.
- The specific contents of the cipher lists above will evolve over time. compat-dhe in particular may eventually lose most or all of its non-PFS options, and compat may eventually become synonymous with compat-dhe.
- Primary Wiki Domains - Currently on `compat`
- Misc-Web Cluster - Currently on `compat`
- Other - Should be using whichever of the 4 options is deemed appropriate for the service (with a preference towards mid where possible at present, I think) and should be running nginx or apache-2.4 as endpoint software with OpenSSL > 1.0. I've said Should because we know there are exceptions to this that need to be addressed. Nothing Should be using custom or default ciphersuite lists outside of the puppet-defined options above.
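As a rough illustration of the compatibility tradeoff between the four options, the Python sketch below picks the strongest option that does not break a given set of clients. The client labels and the "broken with" sets are paraphrased from the descriptions above (with the `strong` set partly inferred from the `mid` description), and `pick_ciphersuite` is a hypothetical helper rather than anything in ssl_ciphersuite.rb. Real decisions should be based on sampled client data, not on this table.

```python
# Options ordered from strongest to most compatible. Each set lists client
# families the text above describes as broken with that option. These are
# simplifications: e.g. compat-dhe only breaks *some* Java6 clients, and
# 'compat' is listed as breaking nothing only because the description does
# not name any specific client.
OPTIONS_STRONGEST_FIRST = [
    ("strong", {"Safari", "Android 2.x", "IE8/XP", "OpenSSL 0.9.8", "Java6"}),
    ("mid", {"Android 2.x", "IE8/XP", "OpenSSL 0.9.8", "Java6"}),
    ("compat-dhe", {"Java6"}),
    ("compat", set()),
]


def pick_ciphersuite(required_clients):
    """Return the strongest option that keeps every required client working."""
    required = set(required_clients)
    for name, broken in OPTIONS_STRONGEST_FIRST:
        if not (required & broken):
            return name
    # Unreachable with the table above, since 'compat' breaks nothing listed.
    raise ValueError("no option satisfies the required clients")


if __name__ == "__main__":
    print(pick_ciphersuite({"IE11", "Android 4.4+"}))
    print(pick_ciphersuite({"Safari"}))
    print(pick_ciphersuite({"IE8/XP", "Android 2.x"}))
    print(pick_ciphersuite({"Java6"}))
```

Read this way, a service whose users include Safari lands on `mid`, while anything that must keep Java6 clients working is pushed down to `compat`, which matches the Java6 concern raised for the `compat-dhe` switch.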
- **Short-to-Medium Term Ongoing Work**
- Primary Wiki Domains
- Switch to `compat-dhe`: T104281 - pending gerrit commit to switch: https://gerrit.wikimedia.org/r/#/c/222023/ . Needs a little more research on the severity of the Java6 issue, and probably needs some community pre-announcement about the Java6 issue as well.
- Add ECDSA keys - T86654 - Most of the technical work is done here and ready to go. Our implementation here would offer ECDSA and RSA in parallel, so there are no compatibility concerns here that we're aware of. We haven't merged some of the puppetization yet ( https://gerrit.wikimedia.org/r/#/c/222067/ ), and even once that's in place, we're holding on a few extra days over a semi-related Security issue that's out of scope for this ticket/discussion.
- Redirect POST traffic - Held commit @ https://gerrit.wikimedia.org/r/#/c/221974/ . This probably needs a Task, more impact investigation on bots and such, and a community pre-announcement about the possible breakage.
- Get rid of the InstantCommons-related GET/HEAD redirect exceptions: I think we're still waiting on various investigations and resolutions in T102566, but we're not going to hold out indefinitely if the software is simply incapable of following HTTPS redirects...
- Misc/Other
- @Chmarkine has been maintaining [[ https://wikitech.wikimedia.org/wiki/HTTPS/domains | a table on wikitech to track progress on these ]].
- Task tracking the Misc-Web cluster services specifically: T103919 - https://wikitech.wikimedia.org/wiki/User:Dzahn/misc-web-https
- Make `compat-dhe` easier to use for Other Domains - no task yet. This probably involves generalizing dhparam generation, adding the directives for it to the shared ssl_ciphersuite code, and enforcing !apache2.2.
- Insecure redirects - Meta-task about the easier cases contained within the Primary Wiki Domains: T102824 . Task about the broader problem of our many many domainnames and how to categorize and deal with them: T101048
- **Followups to the above / Longer-Term**
- After the Primaries' switch to `compat-dhe`, we'll want to sample client cipher negotiations and decide which non-forward-secret options we can eliminate from our ciphersuite lists without any real impact. Again, IE8/XP will likely be the biggest sticking point. We may need to actively campaign and raise awareness a bit about getting our users to move off of IE8/XP before we can eventually, someday, eliminate compatibility with it. I think our long-term goal here is to get the Primaries onto a `compat-dhe` which contains only forward-secret ciphers, and everything else using `strong` or `mid`.
- POST traffic should eventually be rejected/broken rather than redirected. The redirects are a first step, but in the long run we don't want these (usually automated bot/code) clients having to follow these redirects, as they've usually already leaked private data during the initial HTTP request before the redirect. This will involve even more community pre-announcement about switching all URLs in software configurations to HTTPS.
- HSTS Preloading - T104244 - this is done for wikidata.org (pending review by browsers upstream), but all of the others are blocked on various DNS/redirect-level issues noted in the Insecure Redirects tasks above.
- HPKP - T92002 - We've dithered back and forth a lot on exactly how we're going to implement this, as it's tricky and scary and involves doing a better job at key management than we do today. For now, we've blocked this on first getting over the ECDSA hurdle so that we know what certs we're dealing with.