[This is aggregated up from conversations w/ @Krenair + @faidon at the 2018 hackathon]
This supports several needs for better-automated management and deployment of Let's Encrypt certs, especially when multiple public listener hosts serve the same service. Slightly edited notes from @Krenair:
- Central LE-service hosts (1 per core DC, manual failover) that are responsible for managing the certs: talking to Let's Encrypt, authenticating cert requests, distributing certs/private keys to the service endpoint hosts that consume them, renewing automatically before expiry, etc. These hosts will know which servers should have access to which private keys.
- For key distribution, these would run an HTTPS server that checks that client certificates are valid and signed by the puppetmaster CA, and that the requesting host is authorized for the requested private key.
- For challenge authentication, we can support one or both of:
  - HTTP challenge: consuming service endpoints would proxy /.well-known/acme-challenge/ to the central LE host so it can answer challenges directly.
  - DNS challenge: we'll write a plugin for gdnsd (and possibly some supporting scripts, depending on design) that allows the central LE service to push challenge responses to our authdns servers. The most basic design would be a plugin implementing dynamic TXT records with data pulled from a local file (which it watches via mtime / inotify), plus a script that polls the central LE server for new challenge responses (or something more complex that triggers pushes). We should try to design these pieces for generic reuse and integration.
- The HTTP variant is easier to implement initially, but requires the challenge-routing hacks at every endpoint and can't issue wildcard certs. The DNS variant needs no routing hacks and can issue wildcards, but requires more implementation work.
- The central LE service would probably be a daemon written in Python, which we'll open-source and try to make generic enough to be useful to other organizations. The client-authenticating HTTPS service that distributes keys/certs to endpoints should support two APIs:
  - A generic, standard file-fetching API, e.g. GET https://foo/certs/asdfxyz/{public|private}.pem .
  - An emulation of the puppet fileserver protocol, so that these pulls are easy to puppetize as normal "file" resources, using a distinct server hostname.
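The values the central LE host would serve can be sketched under the two challenge options. Per RFC 8555, the HTTP-01 body is the key authorization (`token.thumbprint`). The DNS-01 TXT value is the unpadded base64url SHA-256 of that same string, published at `_acme-challenge.<domain>`, and wildcards validate on the base name. Several parts of the sketch are assumptions and do not come from these notes:

- The account thumbprint is passed in already encoded.
- The challenge file format (`name value` per line) is hypothetical.
- `write_challenge_file` and `ChallengeFileWatcher` are hypothetical helpers. They illustrate the "local file watched on mtime" idea in Python and are not the gdnsd plugin itself.

```python
from __future__ import annotations

import base64
import hashlib
import os
import tempfile


def b64url(data: bytes) -> str:
    """Unpadded base64url encoding, as used throughout ACME (RFC 8555)."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def key_authorization(token: str, account_thumbprint: str) -> str:
    # RFC 8555 section 8.1: token "." base64url(JWK thumbprint of the account key).
    # Assumption: account_thumbprint is that already-encoded thumbprint.
    return f"{token}.{account_thumbprint}"


def http01_response(token: str, account_thumbprint: str) -> tuple[str, str]:
    """URL path and response body the central LE host would serve for HTTP-01."""
    return f"/.well-known/acme-challenge/{token}", key_authorization(token, account_thumbprint)


def dns01_txt(domain: str, token: str, account_thumbprint: str) -> tuple[str, str]:
    """Owner name and TXT value for a DNS-01 challenge."""
    ka = key_authorization(token, account_thumbprint).encode("ascii")
    base = domain[2:] if domain.startswith("*.") else domain  # wildcards validate on the base name
    return f"_acme-challenge.{base}", b64url(hashlib.sha256(ka).digest())


def write_challenge_file(path: str, records: dict[str, list[str]]) -> None:
    """Hypothetical helper: atomically replace the challenge file ('name value' per line)."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w") as f:
            for name in sorted(records):
                for value in records[name]:
                    f.write(f"{name} {value}\n")
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # the temp file still exists if anything above failed
        raise


class ChallengeFileWatcher:
    """Hypothetical helper: re-reads the challenge file only when it changes (polling, not inotify)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._stamp: tuple[int, int] | None = None
        self.records: dict[str, list[str]] = {}

    def poll(self) -> bool:
        """Return True if the records were reloaded or cleared."""
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            changed = bool(self.records)
            self._stamp, self.records = None, {}
            return changed
        # os.replace() gives each version a new inode, so this catches changes
        # even when two writes land within the same mtime tick.
        stamp = (st.st_ino, st.st_mtime_ns)
        if stamp == self._stamp:
            return False
        records: dict[str, list[str]] = {}
        with open(self.path) as f:
            for line in f:
                if line.strip():
                    name, value = line.split()
                    records.setdefault(name, []).append(value)
        self._stamp, self.records = stamp, records
        return True
```

The sketch writes through a temp file plus `os.replace()`, so a reader never sees a half-written file. The same atomic-replace pattern would apply whichever side ends up writing the file.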
The main configuration of the Python daemon probably looks something like this:
  <certlabel>:
    CN: <name1>
    SNI: [ <name1>, <name2>, ... ]
    AuthorizedClients: [ <hostname1>, <hostname2>, ... ]
Example config for some known cases:
  icinga:
    CN: icinga.wikimedia.org
    SNI:
      - icinga.wikimedia.org
    AuthorizedClients:
      - einsteinium.wikimedia.org
      - tegmen.wikimedia.org
  secredir:
    CN: www.wikipedia.com
    SNI:
      - border-wikipedia.de
      - en-wp.com
      - en-wp.org
      - indiawikipedia.com
      - mediawiki.com
      - wikipedia.com
    AuthorizedClients:
      - secredir1001.eqiad.wmnet
      - secredir2001.codfw.wmnet
  wikibase:
    CN: www.wikiba.se
    SNI:
      - '*.wikiba.se'    # quoted: a bare leading * is a YAML alias
      - wikiba.se
    AuthorizedClients:
      - cp5001.eqsin.wmnet
      - cp5002.eqsin.wmnet
      - cp5003.eqsin.wmnet
      - cp5004.eqsin.wmnet
      - cp5005.eqsin.wmnet
      - cp5006.eqsin.wmnet
      - cp4027.ulsfo.wmnet
      [...]
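The key-distribution server's authorization check could work against this config roughly as follows. This is a minimal sketch with several assumptions:

- The config has already been loaded into a dict, e.g. by a YAML parser. Only the icinga entry is reproduced.
- The client's hostname is the commonName of its Puppet-signed certificate, read from the dict that Python's `ssl.SSLSocket.getpeercert()` returns.
- The URL shape follows the generic `/certs/<label>/{public|private}.pem` API mentioned earlier.
- The 403/404 choices, and guarding public certs the same way as private keys, are sketch decisions rather than part of the notes.

```python
from __future__ import annotations

import re

# Parsed form of the config above; only the icinga entry is reproduced here.
CONFIG = {
    "icinga": {
        "CN": "icinga.wikimedia.org",
        "SNI": ["icinga.wikimedia.org"],
        "AuthorizedClients": ["einsteinium.wikimedia.org", "tegmen.wikimedia.org"],
    },
}

# Assumed route for the generic file-fetching API: /certs/<label>/{public|private}.pem
CERT_PATH = re.compile(r"/certs/(?P<label>[A-Za-z0-9_-]+)/(?P<kind>public|private)\.pem")


def client_hostname(peercert: dict) -> str | None:
    """Return the commonName from a dict shaped like ssl.SSLSocket.getpeercert() output.

    Assumes the Puppet agent certificate's CN is the host's FQDN.
    """
    for rdn in peercert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


def authorize(config: dict, client_host: str | None, path: str) -> int:
    """Return an HTTP status: 200 allowed, 403 not authorized, 404 unknown path or label."""
    match = CERT_PATH.fullmatch(path)  # fullmatch rejects traversal and trailing junk
    if match is None or match["label"] not in config:
        return 404
    entry = config[match["label"]]
    # Sketch choice: public certs are guarded the same way as private keys.
    if client_host is None or client_host not in entry["AuthorizedClients"]:
        return 403
    return 200
```

For example, tegmen fetching `/certs/icinga/private.pem` would get 200, while secredir1001 requesting the same path would get 403.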
In puppet terms, there would be a class/resource which defines such a certificate:
  # <somewhere that gets applied to tegmen and einsteinium>
  letsencrypt::cert { 'icinga':
    cn  => 'icinga.wikimedia.org',
    sni => [ 'icinga.wikimedia.org' ],
  }
The definition of letsencrypt::cert would create file resources that source the private keys and signed public certs from the LE service's puppet fileserver emulation. Separately, a letsencrypt::server class would collect the list of hosts that have applied each defined cert, and use it to generate the central configuration file above with a correct list of authorized clients.
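The collection step could turn per-host data into the central config roughly as follows. This is a sketch only, with these assumptions:

- Each applied letsencrypt::cert arrives as a record of host, label, CN and SNI. `CertUse` is a hypothetical shape, and how the data is pulled out of Puppet is left open.
- A label defined with different CN/SNI on different hosts is treated as an error.
- The output is written as JSON, which YAML 1.2 parsers also accept for a document this simple.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CertUse:
    """Hypothetical record: one letsencrypt::cert resource as applied on one host."""
    host: str
    label: str
    cn: str
    sni: tuple[str, ...]


def build_config(uses: list[CertUse]) -> dict:
    """Group collected cert uses by label into {label: {CN, SNI, AuthorizedClients}}."""
    config: dict = {}
    for use in uses:
        entry = config.setdefault(
            use.label, {"CN": use.cn, "SNI": list(use.sni), "AuthorizedClients": []}
        )
        # Every host must declare the same cert the same way.
        if entry["CN"] != use.cn or entry["SNI"] != list(use.sni):
            raise ValueError(f"conflicting definitions of {use.label!r} (seen on {use.host})")
        if use.host not in entry["AuthorizedClients"]:
            entry["AuthorizedClients"].append(use.host)
    for entry in config.values():
        entry["AuthorizedClients"].sort()  # stable output, so the file only changes when hosts do
    return config


def render(config: dict) -> str:
    """Serialize the config; JSON output is also readable by a YAML 1.2 loader."""
    return json.dumps(config, indent=2, sort_keys=True)
```

Failing hard on conflicting definitions keeps one host's typo from silently changing the names on a cert that other hosts also serve.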