Provide an easy way of picking the traffic serving TLS certificate used by ATS
Open, Stalled, Normal, Public

Description

Right now with nginx we can pick the certificate used to serve traffic by simply setting public_tls_unified_cert_vendor to the name of the certificate (digicert-2019 or globalsign-2018). With ATS we currently need to set a whole structure with several paths and filenames, which makes it more prone to errors.
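As a rough illustration of the behaviour being asked for, the sketch below derives every path from the single vendor name instead of spelling each one out. It is not the Puppet implementation: the base directory, the file extensions and the `cert_paths` helper are hypothetical, and only the two vendor names come from this task.

```python
from pathlib import PurePosixPath
from typing import Dict

# Vendor names taken from the task description.
KNOWN_VENDORS = {"digicert-2019", "globalsign-2018"}

# Hypothetical base directory; the real layout is not stated in this task.
BASE_DIR = PurePosixPath("/etc/ssl/example")


def cert_paths(vendor: str) -> Dict[str, str]:
    """Derive the cert, key and OCSP paths from the vendor name alone."""
    if vendor not in KNOWN_VENDORS:
        # Fail early on a typo instead of serving with a broken path.
        raise ValueError(f"unknown certificate vendor: {vendor!r}")
    # Hypothetical naming scheme: one file per artifact, named after the vendor.
    return {
        "cert": str(BASE_DIR / f"{vendor}.crt"),
        "key": str(BASE_DIR / f"{vendor}.key"),
        "ocsp": str(BASE_DIR / f"{vendor}.ocsp"),
    }
```

With something along these lines, switching certificates means changing one value, and a misspelled vendor name is rejected rather than silently producing wrong paths.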

Event Timeline

Restricted Application added a project: Operations. Oct 7 2019, 9:33 AM
Restricted Application added a subscriber: Aklapper.
Vgutierrez triaged this task as Normal priority. Oct 7 2019, 9:47 AM

Change 541220 had a related patch set uploaded (by Vgutierrez; owner: Vgutierrez):
[operations/puppet@production] ATS: Pick the unified cert using hiera key public_tls_unified_cert_vendor

https://gerrit.wikimedia.org/r/541220

ema moved this task from Triage to TLS on the Traffic board. Oct 14 2019, 5:36 PM

Notes from IRC, etc:

The current patch (merging shortly: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/541220/ ) gets us part of the way there, enough to deploy dual commercial certs again across ATS + nginx. What we need to do beyond that in the near to medium term is:

  • Reduce some of the duplication in the hieradata (the common SNI-check blocks, etc)
  • Cover deployment of the lets-encrypt / acme_chief case to all nodes based on it being in the defined set of cert options (currently it's only deployed in eqsin; turning it on elsewhere via the public_tls_unified_cert_vendor (ucv) hieradata wouldn't work)
  • Fix the issues around ATS not handling a switch of directory paths with a simple reload, since the commercial and LE cert options currently have distinct directory paths. This means we'll have to create some universal paths for ATS's access to the unified cert/key/ocsp, which are symlinks out to the commercial or legacy paths as appropriate.
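The symlink idea in the last point can be sketched as follows. This is only an illustration of repointing a stable path atomically, not what the Puppet change does; `point_symlink` and the temporary-name scheme are hypothetical, and the approach assumes a POSIX filesystem where renaming over an existing symlink is atomic.

```python
import os


def point_symlink(link_path: str, target: str) -> None:
    """Atomically make link_path a symlink to target."""
    # Create the new link under a temporary name next to the final one,
    # so the rename below stays on the same filesystem.
    tmp_path = f"{link_path}.tmp.{os.getpid()}"
    os.symlink(target, tmp_path)
    try:
        # rename(2) replaces an existing symlink in a single step, so a
        # reader sees either the old target or the new one, never neither.
        os.replace(tmp_path, link_path)
    except OSError:
        os.unlink(tmp_path)
        raise
```

The consumer keeps reading the same path throughout, and only the link target moves between the commercial and LE directories, so a plain reload should be enough to pick up the change.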

Mentioned in SAL (#wikimedia-operations) [2019-10-17T14:32:23Z] <bblack> disable puppet on cache fleet (cp*) ahead of cert deployment refactoring - T234803

Change 541220 merged by BBlack:
[operations/puppet@production] ATS: Pick the unified cert using hiera key public_tls_unified_cert_vendor

https://gerrit.wikimedia.org/r/541220

Change 544151 had a related patch set uploaded (by Vgutierrez; owner: Vgutierrez):
[operations/puppet@production] ATS: Use a common base path for /etc/ssl and /etc/acmecerts certs

https://gerrit.wikimedia.org/r/544151

Change 544151 merged by Vgutierrez:
[operations/puppet@production] ATS: Use a common base path for /etc/ssl and /etc/acmecerts certs

https://gerrit.wikimedia.org/r/544151

Change 545204 had a related patch set uploaded (by Vgutierrez; owner: Vgutierrez):
[operations/puppet@production] acme_chief: Grant access to all cp nodes to the unified cert

https://gerrit.wikimedia.org/r/545204

Change 545204 merged by Vgutierrez:
[operations/puppet@production] acme_chief: Grant access to all cp nodes to the unified cert

https://gerrit.wikimedia.org/r/545204

Change 545206 had a related patch set uploaded (by Vgutierrez; owner: Vgutierrez):
[operations/puppet@production] ATS: Reload TLS material on acme_chief::cert updates

https://gerrit.wikimedia.org/r/545206

Change 545208 had a related patch set uploaded (by Vgutierrez; owner: Vgutierrez):
[operations/puppet@production] ATS: Deploy acme-chief version of the unified certificate globally

https://gerrit.wikimedia.org/r/545208

Change 545206 merged by Vgutierrez:
[operations/puppet@production] ATS,tlsproxy: Reload TLS material on acme_chief::cert updates

https://gerrit.wikimedia.org/r/545206

Change 545208 merged by Vgutierrez:
[operations/puppet@production] ATS: Deploy acme-chief version of the unified certificate globally

https://gerrit.wikimedia.org/r/545208

After merging https://gerrit.wikimedia.org/r/545206 and https://gerrit.wikimedia.org/r/545208, the following things have changed since @BBlack's comment:

  • acme-chief version of the unified certificate is now deployed everywhere
  • setting the ucv to lets-encrypt would work for the upload cluster.

Change 545693 had a related patch set uploaded (by Vgutierrez; owner: Vgutierrez):
[operations/puppet@production] acme_chief: Grant new esams cp hosts access to the unified certificate

https://gerrit.wikimedia.org/r/545693

Change 545693 merged by Vgutierrez:
[operations/puppet@production] acme_chief: Grant new esams cp hosts access to the unified certificate

https://gerrit.wikimedia.org/r/545693

Vgutierrez changed the task status from Open to Stalled. Mon, Nov 4, 3:56 AM

I'm marking this task as stalled; it will be resolved as soon as T231627 is completed.