Create a secure redirect service for large count of non-canonical / junk domains
Open, Normal, Public

Description

Given recent progress on production puppetization of LetsEncrypt.org (LE), and LE itself improving in general in recent months (moved from beta to production status, has proved itself a bit, rate limits are reasonable, etc.), I think we can now really contemplate the idea of doing a secure redirector service to cover large counts of junk domains. We talked this out a bit on IRC, and AFAICS there are now no real technical blockers to making this happen; we'll probably be able to handle hundreds of one-off domain names for this through LE mechanisms.

One notable tradeoff is that it will have to be an SNI-dependent service for the bulk of the names. That means many of these secure redirects will not work for certain older browsers (notably IE7/8 on XP, Android 2.x, and some very old feature phones like Symbian and Blackberry). Given that the alternative is to dead-park the bulk of these domains (no browser functionality, or at least no true redirect), the SNI limitation is probably acceptable. We can also arrange the certificate sets so that the highest-value names are on the server's default (non-SNI) certificate, giving them greater compatibility than the rest.
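To make the SNI tradeoff concrete, here's a rough sketch of how the server side picks a certificate for each handshake. It assumes a hypothetical model where certificate 0 holds the primary (non-SNI) SAN list and is the default cert, and the remaining certificates hold SNI-only chunks. `select_cert` is an illustrative helper, not how nginx actually implements this.

```python
from typing import Optional, Sequence

# Hypothetical model: certs[0] is the default cert carrying the primary
# (non-SNI) SAN list; certs[1:] carry the SNI-only chunks.
def select_cert(sni_name: Optional[str], certs: Sequence[Sequence[str]]) -> int:
    """Return the index of the certificate to present for one TLS handshake."""
    if sni_name is None:
        # Non-SNI client (e.g. IE on XP): only the default cert can be chosen,
        # so only names in the primary list will validate for it.
        return 0
    name = sni_name.lower().rstrip(".")
    for index, sans in enumerate(certs):
        if name in sans:
            return index
    # Unknown name: fall back to the default cert; the client will then
    # see a certificate name mismatch.
    return 0
```

So a name in a secondary chunk works only if the client sends SNI. A non-SNI client asking for that name gets cert 0 and a mismatch, which is exactly why the highest-value names belong in the primary list.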

What it basically boils down to now is:

  • Decide on a reasonable SAN list length limit per cert: 100
  • Prioritize which "junk" domains should be in the primary (works for non-SNI) SAN list
  • Puppetize a service role built around modules/nginx + modules/letsencrypt that can securely redirect a large, configured set of domain names.
  • Assign a new public IP for this in eqiad + codfw LVS ranges.
  • Deploy this service in eqiad + codfw (possibly on virtual hosts, as the load should be fairly light). Inter-DC failover will probably be manual via gdnsd, at least initially, until we sort out cross-DC LE cert issues.


BBlack created this task. Apr 25 2016, 3:22 PM
Restricted Application added a project: Operations. Apr 25 2016, 3:22 PM
Restricted Application added a subscriber: Aklapper.
jayvdb added a subscriber: jayvdb. Apr 26 2016, 4:09 AM

According to https://letsencrypt.org/upcoming-features/, they don't yet have IDN support and their ETA is "By August 1, 2016". So if we get this going sooner than that, we may have to skip the (several) IDN names in various lists for the first iteration and come back to them later.

Also, on the SAN list length limits, LE has this to say: https://community.letsencrypt.org/t/sans-per-cert-and-sni-for-hosting-service/5105

Basically they're setting the LE limit to 100/cert for a couple of reasons: over-caution on browser compatibility, and setting some reasonable boundaries on their service's execution (e.g. how many challenges have to be issued in a single API call, and whether that reaches some maximum timeout for the whole operation, etc.). Another interesting datapoint is that our largest commercial cert in use here at the WMF has 38 SANs in a single cert and is known to work well (it isn't large enough to run into TCP window issues).

We'll probably structure our data for this service into two SAN lists: a primary list (works for non-SNI) that's capped at 100 names, and a secondary list (only works with SNI) which is auto-split into 100-name chunks when generating certs. There's no point trying to actually come up with an explicit priority ordering and pick out the top 100 right off the bat to fill the first list. We're probably better off being conservative and only identifying the most-important redirects to place into that list, hopefully leaving room for future expansion without hitting the limit. As the lists grow they'll be serially appended to, as any kind of alphabetization or sorting by TLD within each list would cause a lot of pointless churn with certificate regenerations.
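Here's a minimal sketch of the auto-split for the secondary list. It assumes the list is kept as a plain, append-ordered list of hostnames. `chunk_sans` and `changed_chunks` are hypothetical helpers, not existing puppet code. The sketch also shows why append-only ordering matters: appending a name touches only the last chunk, while re-sorting can shift every chunk and force all of them to be regenerated.

```python
from typing import List

SANS_PER_CERT = 100  # LE's per-certificate SAN limit discussed above

def chunk_sans(names: List[str], size: int = SANS_PER_CERT) -> List[List[str]]:
    """Split an append-only SAN list into consecutive cert-sized chunks."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [names[i:i + size] for i in range(0, len(names), size)]

def changed_chunks(old: List[str], new: List[str],
                   size: int = SANS_PER_CERT) -> List[int]:
    """Indices of chunks (i.e. certs) whose SAN contents differ between versions."""
    old_chunks, new_chunks = chunk_sans(old, size), chunk_sans(new, size)
    return [i for i, chunk in enumerate(new_chunks)
            if i >= len(old_chunks) or old_chunks[i] != chunk]
```

For example, with 150 existing names, appending one more changes only chunk 1. Sorting the same list first, with the new name sorting near the front, changes both chunk 0 and chunk 1.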

We also need to decide on a data model, and especially about what kinds of hostnames we're going to support for the redirect domains. We can't do wildcards with LE, so any domains that require wildcarding to support language-subdomains are at least initially out of scope for this. I would imagine for the rest, we can standardize on just the root of the domain and www., which means each domain will actually occupy 2x SAN slots. I'm not sure if that's a sane assumption for all cases or not, tbh. If a lot of domains don't need www., we could make it optional and save some room in SANs. Of course, the data will also need a redirect target for each domain.
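Here's one possible shape for the per-domain data, as a sketch only. `RedirectDomain`, its fields, and the optional `include_www` flag are hypothetical names that illustrate the "root + optional www." idea, where each entry costs one or two SAN slots.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RedirectDomain:
    """One entry in a hypothetical redirect data file."""
    domain: str               # registered junk domain, e.g. "junk-a.example"
    target: str               # canonical redirect target URL
    include_www: bool = True  # optional www. alias; costs a second SAN slot

    def san_names(self) -> List[str]:
        names = [self.domain]
        if self.include_www:
            names.append("www." + self.domain)
        return names

def san_list(entries: List[RedirectDomain]) -> List[str]:
    """Flatten entries into a SAN list, preserving entry (append) order."""
    return [name for entry in entries for name in entry.san_names()]
```

Making `www.` optional per entry is what would let us save SAN room for domains that don't need it.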

The deployment process will be two-step every time we add a new chunk of domains to the list(s):

  1. Add them in DNS, pointed at the redirect service's IP (probably done by some template system to make it easy)
  2. Configure them in the actual redirector service and run puppet to regenerate the applicable cert(s) (which needs to execute a challenge that depends on correct DNS above).
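Step 1's template could be as simple as emitting the same address records for every new domain. This is a sketch in standard zone-file syntax. The service addresses are documentation-range placeholders, and the TTL and helper name are assumptions. DNS has to be in place first, because the LE challenge in step 2 depends on these names already resolving to the service.

```python
from typing import List

# Placeholder service addresses from documentation ranges, not real assignments.
SERVICE_V4 = "192.0.2.10"
SERVICE_V6 = "2001:db8::10"

def zone_records(domain: str, include_www: bool = True, ttl: int = 600) -> List[str]:
    """Zone-file lines pointing a redirect domain (and optionally www.) at the service."""
    names = [domain] + ([f"www.{domain}"] if include_www else [])
    lines = []
    for name in names:
        # Fully qualified owner names (trailing dot) so the lines work in any zone file.
        lines.append(f"{name}. {ttl} IN A {SERVICE_V4}")
        lines.append(f"{name}. {ttl} IN AAAA {SERVICE_V6}")
    return lines
```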

There's probably a larger process to work out ahead of the technical steps as well: one that involves legal and tech approval of domains, verification that the registrars actually point them at our authdns servers, etc. If we ever drop our registrations of existing names, we'll need to support that process, too. If deletions are common (I don't think they are), we'd probably need data structure support for a deleted flag, so that a deleted name doesn't churn the list right away but is removed at the next necessary renewal.
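If deletions do turn out to matter, the deleted flag could work roughly like this. The sketch assumes the SAN lists are stored per certificate chunk. `SanEntry` and `prune_on_renewal` are hypothetical names. A flagged name only drops out of its chunk when that chunk's cert is being renewed anyway, so the deletion by itself triggers no extra regeneration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SanEntry:
    name: str
    deleted: bool = False  # hypothetical flag: marked for removal, slot kept for now

def prune_on_renewal(chunks: List[List[SanEntry]], renewing: int) -> List[List[SanEntry]]:
    """Drop deleted entries only from the chunk whose cert is being renewed.

    Other chunks are copied unchanged (deleted entries included), so flagging
    a name never forces a certificate regeneration on its own.
    """
    return [
        [e for e in chunk if not e.deleted] if i == renewing else list(chunk)
        for i, chunk in enumerate(chunks)
    ]
```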

BBlack updated the task description. Apr 27 2016, 11:00 AM
BBlack updated the task description.
fgiunchedi triaged this task as Normal priority. Apr 27 2016, 3:06 PM

"Secure redirect service" is grammatically unclear to me, I don't understand what is verb/noun/adjective. Does the summary just mean "Switch to HTTPS all non-canonical / junk redirecting domains"?

BBlack added a comment (edited). May 22 2016, 12:08 PM

It means a distinct service, running separately from our normal infrastructure for the canonical domains, which does nothing but handle the hundreds of non-canonical domains: using/renewing LetsEncrypt certs for them and doing local HTTP->HTTPS upgrade redirects followed by an HTTPS->HTTPS redirect to one of our canonical domains. Separating this avoids a lot of unnecessary complexity in the canonical domains' front edge software.
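Here's the two-step redirect described above, as a rough sketch of the decision logic only. The real service would presumably express this in nginx config. `redirect_location` and the hostname-to-target mapping are hypothetical, and unconfigured hostnames simply raise `KeyError` here.

```python
from typing import Dict
from urllib.parse import urlsplit, urlunsplit

def redirect_location(url: str, targets: Dict[str, str]) -> str:
    """Return the Location for a request to a non-canonical (junk) hostname."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        # Step 1: local HTTP->HTTPS upgrade on the same non-canonical hostname.
        return urlunsplit(("https", parts.netloc, parts.path, parts.query, ""))
    # Step 2: HTTPS->HTTPS redirect to the configured canonical target.
    # urlsplit() lowercases .hostname; KeyError for names not configured.
    return targets[parts.hostname or ""]
```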

Change 292785 had a related patch set uploaded (by BBlack):
redirects.dat - split non-canonical to separate section

https://gerrit.wikimedia.org/r/292785

Change 295249 had a related patch set uploaded (by BBlack):
ncredir hostname and service IP

https://gerrit.wikimedia.org/r/295249

AlexMonk-WMF renamed this task from Secure redirect service for large count of non-canonical / junk domains to Create a secure redirect service for large count of non-canonical / junk domains. Sep 5 2016, 2:11 AM

> According to https://letsencrypt.org/upcoming-features/, they don't yet have IDN support and their ETA is "By August 1, 2016". So if we get this going sooner than that, we may have to skip the (several) IDN names in various lists for the first iteration and come back to them later.

It's now saying "before November 30, 2016"...

> "Secure redirect service" is grammatically unclear to me, I don't understand what is verb/noun/adjective. Does the summary just mean "Switch to HTTPS all non-canonical / junk redirecting domains"?

Fixed, I think.

BBlack moved this task from Triage to TLS on the Traffic board. Sep 30 2016, 1:49 PM
Huji removed a subscriber: Huji. Oct 3 2016, 6:03 PM

Change 317450 had a related patch set uploaded (by Alex Monk):
POC: Secure redirect service

https://gerrit.wikimedia.org/r/317450

ema added a subscriber: ema. Oct 19 2017, 4:09 PM