
Phase out cergen for Data Platform services
Closed, Resolved (Public)

Description

cergen is our legacy tooling to manage/generate TLS certificates (https://wikitech.wikimedia.org/wiki/Cergen). It has been replaced by an installation of cfssl (https://wikitech.wikimedia.org/wiki/PKI), which the majority of services now use.

Our cergen installation is co-hosted on one of the Puppet master (Puppet 5) frontends (puppetmaster1001), which runs Buster. cergen is based on legacy libraries: it uses networkx v1, which is incompatible with current networkx releases (networkx 2 was released in 2017). Even when the puppetmasters were moved to Buster, this needed a hack to build a co-installable legacy package in a component (T235405).

Instead of forward-porting it yet again to the new installation, we'll use the Puppet 5 -> Puppet 7 migration to also phase out cergen and only use cfssl.

Most of those certs are used by Envoy, and our Puppet integration makes the switch relatively straightforward: set the profile::tlsproxy::envoy::ssl_provider Hiera flag to "cfssl" and specify the SNI names via profile::tlsproxy::envoy::cfssl_options/hosts.

Some examples of this can be found at:
https://github.com/wikimedia/operations-puppet/commit/66fbddeac3a4b2dfa1d8e19a49cc649dcb745f18
https://github.com/wikimedia/operations-puppet/commit/a00d0441b4509e736d8abd6ff63f25224e306239
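As a rough illustration of the Hiera switch described above, a minimal sketch might look like the following. The two key names come from this task description; the nesting of hosts under cfssl_options is inferred from the "cfssl_options/hosts" wording, and the hostname is only a placeholder, so treat the linked commits as the authoritative form.

```yaml
# Sketch only: switch the Envoy TLS proxy from cergen to cfssl (PKI).
profile::tlsproxy::envoy::ssl_provider: cfssl

# Assumed structure: SNI names listed under a "hosts" key.
# The hostname below is a placeholder taken from the SAN list in this task.
profile::tlsproxy::envoy::cfssl_options:
  hosts:
    - hue.wikimedia.org
```

Such a fragment would normally live in the Hiera data for the role or host that runs the Envoy proxy; which file that is depends on the service and is not specified here.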

For use cases outside of Envoy, the profile::pki::get_cert define provides a convenient method to request certificates. An example of how the gradual migration was implemented for the Ganeti RAPI endpoint can be found at https://github.com/wikimedia/operations-puppet/commit/98350d2dff51bb9bf57263fe50f409374892ae1d

There are currently three cert groups defined in /srv/private/modules/secret/secrets/certificates/certificate.manifests.d which need to be moved to PKI/cfssl. Some services have likely been ported already, with only the YAML spec file and the legacy certs left behind; in those cases the fix might be as simple as removing the legacy cert material.

  • analytics_http_ui.certs.yaml
  • kafka_test.certs.yaml (this one is likely obsolete, all Kafka hosts should use PKI by now)
  • schema.certs.yaml

The certificate for yarn.wikimedia.org has also been used with a number of SANs to support:

  • hue.wikimedia.org
  • piwik.wikimedia.org
  • turnilo.wikimedia.org
  • stats.wikimedia.org
  • analytics.wikimedia.org

These will need to be checked individually, in order to ensure a smooth migration.

Event Timeline

Gehel triaged this task as Medium priority. Mar 20 2024, 9:01 AM
Gehel moved this task from Incoming to Toil / Automation on the Data-Platform-SRE board.

Change #1013955 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Update the ssl_provider for the YARN ui to cfssl

https://gerrit.wikimedia.org/r/1013955

Change #1013956 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Update the ssl_provider for the analytics webserver to cfssl

https://gerrit.wikimedia.org/r/1013956

Change #1013957 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Update the ssl_provider for matomo to cfssl

https://gerrit.wikimedia.org/r/1013957

Change #1013958 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Update the ssl_provider for turnilo to cfssl

https://gerrit.wikimedia.org/r/1013958

Change #1013959 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Update the ssl_provider for hue to cfssl

https://gerrit.wikimedia.org/r/1013959

Change #1013977 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Update the ssl_provider for the eventschema service to cfssl

https://gerrit.wikimedia.org/r/1013977

I have created separate CRs for each of the services that used these cergen certificates and they're all ready for review.
I will start by deploying the least critical, which is likely hue.

Once this has been shown to work, then we should be OK to proceed to turnilo, yarn, stats, matomo, then finally schema.

Change #1013959 merged by Btullis:

[operations/puppet@production] Update the ssl_provider for hue to cfssl

https://gerrit.wikimedia.org/r/1013959

Change #1013956 merged by Btullis:

[operations/puppet@production] Update the ssl_provider for the analytics webserver to cfssl

https://gerrit.wikimedia.org/r/1013956

Change #1013958 merged by Btullis:

[operations/puppet@production] Update the ssl_provider for turnilo to cfssl

https://gerrit.wikimedia.org/r/1013958

Change #1013955 merged by Btullis:

[operations/puppet@production] Update the ssl_provider for the YARN ui to cfssl

https://gerrit.wikimedia.org/r/1013955

Change #1013957 merged by Btullis:

[operations/puppet@production] Update the ssl_provider for matomo to cfssl

https://gerrit.wikimedia.org/r/1013957

Change #1013977 merged by Btullis:

[operations/puppet@production] Update the ssl_provider for the eventschema service to cfssl

https://gerrit.wikimedia.org/r/1013977

Mentioned in SAL (#wikimedia-analytics) [2024-03-25T15:02:56Z] <btullis> updating the ssl_provider for eventstreams schema servers to cfssl for T360412

I have now removed the obsolete cergen material for all of these services, since they are all running on discovery certificates maintained by the PKI system.

Change #1016312 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[labs/private@master] Remove obsolete stub secret

https://gerrit.wikimedia.org/r/1016312

Change #1016313 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Remove now obsolete certificate

https://gerrit.wikimedia.org/r/1016313

Change #1016315 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] schema: Remove obsolete certificate

https://gerrit.wikimedia.org/r/1016315

Change #1016316 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[labs/private@master] schema: Remove dummy cert

https://gerrit.wikimedia.org/r/1016316

Change #1016316 merged by Muehlenhoff:

[labs/private@master] schema: Remove dummy cert

https://gerrit.wikimedia.org/r/1016316

Change #1016312 merged by Muehlenhoff:

[labs/private@master] Remove obsolete stub secret

https://gerrit.wikimedia.org/r/1016312

Change #1016315 merged by Muehlenhoff:

[operations/puppet@production] schema: Remove obsolete certificate

https://gerrit.wikimedia.org/r/1016315

Change #1016313 merged by Muehlenhoff:

[operations/puppet@production] Remove now obsolete certificate

https://gerrit.wikimedia.org/r/1016313