Phase out cergen for ServiceOps services
Open, Needs Triage (Public)

Description

cergen is our legacy tooling to manage/generate TLS certificates (https://wikitech.wikimedia.org/wiki/Cergen). It has been replaced by an installation of cfssl (https://wikitech.wikimedia.org/wiki/PKI), which the majority of services already use.

Our cergen installation is co-hosted on one of the Puppet 5 master frontends (puppetmaster1001), which runs Buster. cergen is based on legacy libraries: it uses networkx v1, which is incompatible with current networkx releases (networkx 2 was released in 2017). Even when the puppetmasters were moved to Buster, this needed a hack to build a co-installable legacy package in a component (T235405).

Instead of forward-porting it yet again to the new installation, we'll use the Puppet 5 -> Puppet 7 migration to also phase out cergen and only use cfssl.

Most of those certs are used by Envoy, and our Puppet integration makes the switch relatively straightforward: set the profile::tlsproxy::envoy::ssl_provider Hiera flag to "cfssl" and specify the SNI names via profile::tlsproxy::envoy::cfssl_options/hosts.

Some examples of this can be found at:
https://github.com/wikimedia/operations-puppet/commit/66fbddeac3a4b2dfa1d8e19a49cc649dcb745f18
https://github.com/wikimedia/operations-puppet/commit/a00d0441b4509e736d8abd6ff63f25224e306239
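
As a rough illustration of the Hiera change described above, a role's Hiera data might end up looking like the fragment below. This is only a sketch: the two key names come from this task description, but the host names are hypothetical placeholders, and the linked commits remain the authoritative examples.

```yaml
# Sketch only: switch the Envoy TLS proxy from cergen to cfssl.
# Key names are taken from the task description; the host names
# below are hypothetical placeholders, not real service names.
profile::tlsproxy::envoy::ssl_provider: cfssl
profile::tlsproxy::envoy::cfssl_options:
  hosts:
    # SNI names the issued certificate should cover (placeholders)
    - example-service.discovery.wmnet
    - example-service.svc.eqiad.wmnet
```

Once a change like this is merged and Puppet has run on the affected hosts, the legacy cergen cert material for the service should no longer be referenced and can be removed in a follow-up patch.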

For use cases outside of Envoy, the profile::pki::get_cert define provides a convenient method to request certificates. An example of how the gradual migration was implemented for the Ganeti RAPI endpoint can be found at https://github.com/wikimedia/operations-puppet/commit/98350d2dff51bb9bf57263fe50f409374892ae1d

Some existing certs will be obsoleted when the migration of MediaWiki to k8s is completed, and a pre-existing task (T352245) already covers some of the certs.

In addition, there are 4 certificate YAML specs defined in /srv/private/modules/secret/secrets/certificates/certificate.manifests.d which need to be moved to PKI/cfssl. Some services have likely been ported already, with only the YAML spec file and the legacy certs left behind; in those cases the fix might be as simple as removing the legacy cert material.

  • chartmuseum.certs.yaml
  • docker_registry.certs.yaml
  • _etcd-server-ssl._tcp.v3.certs.yaml T352245
  • etcd-v3.certs.yaml T352245
  • etcd-v3-eqiad.certs.yaml T352245
  • mediawiki.certs.yaml (will be obsoleted when all legacy deployments are moved to wikikube)
  • mwmaint.certs.yaml (used by noc.w.o which is already on wikikube, should be just a cleanup)
  • parsoid.certs.yaml (will be obsoleted when all legacy deployments are moved to wikikube)
  • restbase.certs.yaml
  • testreduce.certs.yaml
  • maps/kartotherian T360778

Event Timeline

I was planning to migrate etcd to PKI as part of T350565, but can explore this earlier if needed.

Change #1018199 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Switch testreduce to cfssl

https://gerrit.wikimedia.org/r/1018199

I'll finish parsoid and testreduce in T359387

If I'm not mistaken testreduce is still unrelated, it's for the round trip tests that have been split off to a separate Ganeti VM some time ago (and was moved to Bookworm due to nodejs requirements last year)?

If so, https://gerrit.wikimedia.org/r/c/operations/puppet/+/1018199 should fix it

>> I'll finish parsoid and testreduce in T359387

> If I'm not mistaken testreduce is still unrelated, it's for the round trip tests that have been split off to a separate Ganeti VM some time ago (and was moved to Bookworm due to nodejs requirements last year)?

AFAIK yes.

> If so, https://gerrit.wikimedia.org/r/c/operations/puppet/+/1018199 should fix it

Oh, I had missed that, thanks!

Change #1018228 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] chartmuseum: Migrate to cfssl

https://gerrit.wikimedia.org/r/1018228

Change #1018228 merged by Clément Goubert:

[operations/puppet@production] chartmuseum: Migrate to cfssl

https://gerrit.wikimedia.org/r/1018228

Change #1018237 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Remove now obsolete cert

https://gerrit.wikimedia.org/r/1018237

Change #1018238 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[labs/private@master] Remove obsolete dummy cert

https://gerrit.wikimedia.org/r/1018238

Change #1018238 merged by Muehlenhoff:

[labs/private@master] Remove obsolete dummy cert

https://gerrit.wikimedia.org/r/1018238

Change #1018237 merged by Muehlenhoff:

[operations/puppet@production] Remove now obsolete cert

https://gerrit.wikimedia.org/r/1018237

Change #1018251 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] docker_registry_ha: Migrate to cfssl

https://gerrit.wikimedia.org/r/1018251

Mentioned in SAL (#wikimedia-operations) [2024-04-10T10:16:11Z] <claime> Disabling puppet on O:docker_registry_ha::registry - T360636

Change #1018251 merged by Clément Goubert:

[operations/puppet@production] docker_registry_ha: Migrate to cfssl

https://gerrit.wikimedia.org/r/1018251

Mentioned in SAL (#wikimedia-operations) [2024-04-10T10:18:40Z] <claime> Enabling and running puppet on registry1003.eqiad.wmnet - T360636

Mentioned in SAL (#wikimedia-operations) [2024-04-10T10:21:12Z] <claime> Enabling and running puppet on O:docker_registry_ha::registry - T360636

Clement_Goubert subscribed.

chartmuseum and docker-registry done

Change #1018199 merged by Muehlenhoff:

[operations/puppet@production] Switch testreduce to cfssl

https://gerrit.wikimedia.org/r/1018199

Change #1018671 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Remove certs for docker-registry and testreduce

https://gerrit.wikimedia.org/r/1018671

Change #1018671 merged by Muehlenhoff:

[operations/puppet@production] Remove certs for docker-registry and testreduce

https://gerrit.wikimedia.org/r/1018671

Change #1018678 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[labs/private@master] Remove obsolete dummy certs for docker-registry and testreduce

https://gerrit.wikimedia.org/r/1018678

Change #1018678 merged by Muehlenhoff:

[labs/private@master] Remove obsolete dummy certs for docker-registry and testreduce

https://gerrit.wikimedia.org/r/1018678

Change #1019290 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] restbase: migrate to using cfssl

https://gerrit.wikimedia.org/r/1019290

Change #1019290 merged by Hnowlan:

[operations/puppet@production] restbase: migrate to using cfssl

https://gerrit.wikimedia.org/r/1019290

Change #1020258 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Remove obsolete restbase discovery cert

https://gerrit.wikimedia.org/r/1020258

Change #1020258 merged by Muehlenhoff:

[operations/puppet@production] Remove obsolete restbase discovery cert

https://gerrit.wikimedia.org/r/1020258

Change #1020624 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[labs/private@master] Remove obsolete stub cert

https://gerrit.wikimedia.org/r/1020624

Change #1020624 merged by Muehlenhoff:

[labs/private@master] Remove obsolete stub cert

https://gerrit.wikimedia.org/r/1020624