cloud-vps solution for Let's Encrypt
Closed, Resolved, Public

Description

Acme-chief has a ton of overhead -- setting it up in a new project requires additional VMs, and solves problems that cloud-vps mostly doesn't care about.

Let's set up a different puppet solution for automating LE certs within cloud-vps. Let's pick an LE client that's easily installable on our hosts (ideally already packaged/maintained for Debian) and write some puppet classes that can be run locally to request and install a cert.

We'll still need a per-project LE account and a designate service user.

Event Timeline

Krenair subscribed.
  • find ACMEv2 client we can puppetise the installation of
    • ensuring we can get certs by running a local command (for dynamicproxy to call)
    • ensuring we can get certs configured through puppet or similar (for other use cases, e.g. tools, to replace acme-chief and the work covered under T252199)
    • need to be able to get secrets for the ACME account as well as the designate service account through unpuppetised secrets, since there is no expectation of a local puppetmaster

edit: obviously, needs to be able to support openstack designate to write DNS entries - either in an integration we write ourselves or built-in

also, when we say "per-project", we mean "dynamicproxy will need one and might have permissions for all projects, tools will need one to manage its own, toolserver-legacy will need one to manage its own", etc.

figured out roughly how this can work

  • certbot and python3-designateclient packages in buster
  • run certbot as root or something with permissions to write to /var/log/letsencrypt
sync.py
#!/usr/bin/python3
import os
import time

import yaml

from designateclient.v2 import client as designateclient
from keystoneauth1.identity import v3
from keystoneauth1 import session as keystone_session

with open('designate-sync-config.yaml') as f:
    config = yaml.safe_load(f)

client = designateclient.Client(
    session=keystone_session.Session(auth=v3.Password(
        auth_url=config['OS_AUTH_URL'],
        username=config['OS_USERNAME'],
        password=config['OS_PASSWORD'],
        user_domain_name='default',
        project_domain_name='default',
        project_name=config['OS_PROJECT_NAME']
    )),
    region_name=config['OS_REGION_NAME']
)

domain = os.environ['CERTBOT_DOMAIN']
validation_str = os.environ['CERTBOT_VALIDATION']
print('Got domain {} and validation_str {}'.format(domain, validation_str))
if not domain.startswith('_acme-challenge.'):
    domain = '_acme-challenge.' + domain
if domain[-1] != '.':
    domain += '.'

print('creating record {}'.format(domain))
# Find all zones we might want to put this record in
potential_zones = []
print('listing zones, looking for {}'.format(domain))
for zone in client.zones.list():
    if domain.endswith('.' + zone['name']):
        zone['match_specificness'] = len(zone['name'].split('.'))
        potential_zones.append(zone)

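print('potential zones: {}'.format(potential_zones))
if not potential_zones:
    # Fail with a clear message instead of an IndexError below
    raise SystemExit('No Designate zone found for {}'.format(domain))
# Pick the most specific zone to put it in. This means c.b.a.wmflabs.org
# will go under b.a.wmflabs.org rather than a.wmflabs.org.
potential_zones.sort(key=lambda z: z['match_specificness'], reverse=True)
zone = potential_zones[0]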

new_recordset_id = None

# Look for existing records to potentially update
for recordset in client.recordsets.list(zone['id']):
    if recordset['name'] == domain and recordset['type'] == 'TXT':
        if validation_str not in recordset['records']:
            recordset['records'].append(validation_str)
            client.recordsets.update(
                zone['id'],
                recordset['id'],
                {"records": recordset['records']}
            )
            print('Updated recordset {} in zone {}'.format(recordset['id'], zone['id']))
        # Remember the ID even if the value was already present, so the
        # readiness check below always has a recordset to poll.
        new_recordset_id = recordset['id']
        break
else:
    # No matching TXT recordset exists (loop ended without break): create it
    ret = client.recordsets.create(zone['id'], domain, 'TXT', [validation_str])
    new_recordset_id = ret['id']
    print('Created recordset {} in zone {}'.format(ret['id'], zone['id']))

assert new_recordset_id is not None
while True:
    ret = client.recordsets.get(zone['id'], new_recordset_id)
    if ret['status'] == 'ACTIVE':
        print('New recordset is ready')
        break
    print('New recordset is not ready yet')
    time.sleep(2)
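
The wait loop at the end of sync.py polls until the recordset is ACTIVE with no upper bound. A bounded variant could look roughly like the sketch below. The helper name `wait_until_active`, its parameters, and the default timeout are hypothetical, and treating an `ERROR` status as fatal is an assumption about how Designate reports failures rather than something tested in this task.

```python
import time


def wait_until_active(get_status, timeout=120.0, interval=2.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns 'ACTIVE'.

    get_status is any zero-argument callable returning the recordset's
    status string. sleep and clock are injectable so the loop can be
    tested without waiting. Raises RuntimeError on 'ERROR' (assumed to be
    a terminal state) and TimeoutError once the deadline has passed.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == 'ACTIVE':
            return
        if status == 'ERROR':
            raise RuntimeError('recordset entered ERROR state')
        if clock() >= deadline:
            raise TimeoutError(
                'recordset still {} after {}s'.format(status, timeout))
        sleep(interval)
```

In sync.py this could be called as `wait_until_active(lambda: client.recordsets.get(zone['id'], new_recordset_id)['status'])`, so that certbot's auth hook fails instead of hanging if the record never becomes active.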
cleanup.py
#!/usr/bin/python3
import os

import yaml

from designateclient.v2 import client as designateclient
from keystoneauth1.identity import v3
from keystoneauth1 import session as keystone_session

with open('designate-sync-config.yaml') as f:
    config = yaml.safe_load(f)

client = designateclient.Client(
    session=keystone_session.Session(auth=v3.Password(
        auth_url=config['OS_AUTH_URL'],
        username=config['OS_USERNAME'],
        password=config['OS_PASSWORD'],
        user_domain_name='default',
        project_domain_name='default',
        project_name=config['OS_PROJECT_NAME']
    )),
    region_name=config['OS_REGION_NAME']
)

domain = os.environ['CERTBOT_DOMAIN']
validation_str = os.environ['CERTBOT_VALIDATION']
if not domain.startswith('_acme-challenge.'):
    domain = '_acme-challenge.' + domain
if domain[-1] != '.':
    domain += '.'

# Find all zones we might want to find this record in
potential_zones = []
for zone in client.zones.list():
    if domain.endswith('.' + zone['name']):
        zone['match_specificness'] = len(zone['name'].split('.'))
        potential_zones.append(zone)

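if not potential_zones:
    # Fail with a clear message instead of an IndexError below
    raise SystemExit('No Designate zone found for {}'.format(domain))
# Pick the most specific zone to find it in. This means c.b.a.wmflabs.org
# will be looked up under b.a.wmflabs.org rather than a.wmflabs.org.
potential_zones.sort(key=lambda z: z['match_specificness'], reverse=True)
zone = potential_zones[0]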

# Look for existing records that match.
# Note: the update/delete calls below are commented out, so as written this
# script only logs what it would do and leaves the records in place.
for recordset in client.recordsets.list(zone['id']):
    if recordset['name'] == domain and recordset['type'] == 'TXT':
        if validation_str in recordset['records']:
            recordset['records'] = list(set(recordset['records']) - {validation_str})
            if len(recordset['records']) > 0:
#                client.recordsets.update(
#                    zone['id'],
#                    recordset['id'],
#                    {"records": recordset['records']}
#                )
                print('CLEANUP: UPDATING {} RECORD FROM {} TO {}'.format(recordset['id'], zone['id'], recordset['records']))
            else:
#                client.recordsets.delete(zone['id'], recordset['id'])
                print('CLEANUP: DELETING {} RECORD FROM {}'.format(recordset['id'], zone['id']))
        break
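
The update-or-delete decision in cleanup.py can be separated from the Designate calls, which makes it easy to check before enabling the real API calls. This is only a sketch: `plan_cleanup` and its return shape are hypothetical names, not part of acme-chief or designateclient. Unlike the set subtraction above, it keeps the remaining records in their original order.

```python
def plan_cleanup(records, validation_str):
    """Decide what cleanup should do with a TXT recordset.

    Returns a (action, remaining_records) tuple where action is one of:
      'noop'   - the validation string is not in the recordset
      'update' - other values remain, so the recordset should be updated
      'delete' - nothing remains, so the recordset should be deleted
    """
    if validation_str not in records:
        return ('noop', list(records))
    remaining = [r for r in records if r != validation_str]
    if remaining:
        return ('update', remaining)
    return ('delete', [])
```

The caller would then map `'update'` to `client.recordsets.update(...)` and `'delete'` to `client.recordsets.delete(...)`, the same calls that are currently commented out.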
designate-sync-config.yaml
OS_AUTH_URL: http://cloudcontrol1003.wikimedia.org:5000/v3
OS_USERNAME: testlabs-dns-manager
OS_PASSWORD: real password goes here
OS_PROJECT_NAME: testlabs
OS_REGION_NAME: eqiad1-r
  • certbot certonly --manual --preferred-challenges=dns -d '*.testlabs.wmflabs.org' -n --manual-auth-hook ./sync.py --manual-cleanup-hook ./cleanup.py --agree-tos --email krenair+t252721@gmail.com --manual-public-ip-logging-ok
    • (consider running with --dry-run or --test-cert to mess around with it)
# openssl x509 -in /etc/letsencrypt/live/testlabs.wmflabs.org/fullchain.pem -noout -text
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            04:d4:22:cb:3c:a6:2b:4b:9b:a4:d2:f1:37:5d:f1:ff:5f:5e
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C = US, O = Let's Encrypt, CN = Let's Encrypt Authority X3
        Validity
            Not Before: May 15 19:24:25 2020 GMT
            Not After : Aug 13 19:24:25 2020 GMT
        Subject: CN = *.testlabs.wmflabs.org
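
For the "local command for dynamicproxy to call" use case, the certbot invocation above could be assembled programmatically. The sketch below only builds the argument list from the flags already used in this comment. The function name, the defaults, and the example email in the usage note are hypothetical. `--manual-public-ip-logging-ok` is kept because the buster certbot package accepted it here; newer certbot releases may not need or accept it.

```python
def build_certbot_command(domain, email, auth_hook='./sync.py',
                          cleanup_hook='./cleanup.py', staging=False):
    """Build the certbot argv for a DNS-01 request using the manual hooks.

    Mirrors the command line shown above. With staging=True, --test-cert is
    appended so the request goes to the Let's Encrypt staging environment.
    """
    cmd = [
        'certbot', 'certonly',
        '--manual',
        '--preferred-challenges=dns',
        '-d', domain,
        '-n',
        '--manual-auth-hook', auth_hook,
        '--manual-cleanup-hook', cleanup_hook,
        '--agree-tos',
        '--email', email,
        '--manual-public-ip-logging-ok',
    ]
    if staging:
        cmd.append('--test-cert')
    return cmd
```

A caller might run it with `subprocess.run(build_certbot_command('*.testlabs.wmflabs.org', 'admin@example.org', staging=True), check=True)`, as root or another user that can write to /var/log/letsencrypt.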

sync.py and cleanup.py are mostly based on modules/acme_chief/files/designate-sync.py
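
Both scripts repeat the same record-name normalisation and most-specific-zone selection. As a rough sketch, that shared logic could be pulled into two pure helpers that work on zone names only. The function names are hypothetical, and the zone names are assumed to be fully qualified with a trailing dot, as the scripts above expect.

```python
def challenge_record_name(domain):
    """Return the fully qualified _acme-challenge name for a certbot domain."""
    if not domain.startswith('_acme-challenge.'):
        domain = '_acme-challenge.' + domain
    if not domain.endswith('.'):
        domain += '.'
    return domain


def most_specific_zone(record_name, zone_names):
    """Pick the zone with the most labels that contains record_name.

    Returns None when no zone matches, so the caller can fail cleanly
    instead of hitting an IndexError.
    """
    candidates = [z for z in zone_names if record_name.endswith('.' + z)]
    if not candidates:
        return None
    return max(candidates, key=lambda z: len(z.split('.')))
```

For example, `most_specific_zone('_acme-challenge.c.b.a.wmflabs.org.', ['a.wmflabs.org.', 'b.a.wmflabs.org.'])` selects `b.a.wmflabs.org.`, matching the comment in the scripts.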

WIP puppetisation of this on krenair-t252721-test.testlabs.eqiad.wmflabs, has successfully issued a cert

Andrew changed the task status from Open to Stalled. May 26 2020, 5:26 PM

After a recent meeting, we're going to put this project on hold while we give acme-chief another try. Acme-chief is clearly not the ideal solution for cloud-vps, but there are few enough use cases that it might be better to re-use it rather than add new code that we would have to support.

I warned @Vgutierrez that we'll be using acme-chief and will need to know about possible breaking changes. He seemed fine with all that.

We have this problem in the devtools project which keeps us from running a Gerrit instance:

"Account creation on ACMEv1 is disabled.
Please upgrade your ACME client to a version that supports ACMEv2 /

Please could somebody look at Paladox' patch at https://gerrit.wikimedia.org/r/c/operations/puppet/+/602722

Change 604165 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] profile::wmcs::proxy::static: allow hiera to specify an acme-chief cert

https://gerrit.wikimedia.org/r/604165

Change 604165 merged by Andrew Bogott:
[operations/puppet@production] profile::wmcs::proxy::static: allow hiera to specify an acme-chief cert

https://gerrit.wikimedia.org/r/604165

I set up acme-chief in project-proxy using Krenair's guide here:

https://wikitech.wikimedia.org/wiki/Acme-chief/Cloud_VPS_setup

It's a lot of steps, but wasn't terrible and should be adaptable to other projects. So I'm fine with us just having this be the solution until we find a situation where it's not usable.

Thanks! We'll have to try it for devtools.

One issue we will have is that we can't create another instance due to quota though.

You could either try installing it on an existing host (probably anything that's not your puppetmaster should work, ideally something non-public-facing), or requesting a quota increase per the usual process.