Hi folks!
In the parent task we worked on a strategy to move Kafka brokers from Puppet-based TLS certs to PKI-based certs (a new intermediate Kafka CA was created for the use case). In T300130 Kafka logging was moved to the new CA successfully, and it would be great to do the same for Kafka main as well.
Here is the rollout plan that I used for Kafka logging:
Find Kafka clients and update their trusted CA settings
The first step is to find Kafka clients and update their settings to trust both the Puppet CA and the Root PKI CA certificates. Thanks to John, a bundle is available in various formats on all hosts, created by the wmf-certificates package and puppet:
elukey@kafka-logging1001:~$ file /etc/ssl/certs/wmf-ca-certificates.crt
/etc/ssl/certs/wmf-ca-certificates.crt: PEM certificate

# This one needs a hiera setting:
# profile::base::certificates::include_bundle_jks: true
elukey@kafka-logging1001:~$ file /etc/ssl/localcerts/wmf-java-cacerts
/etc/ssl/localcerts/wmf-java-cacerts: Java KeyStore
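To double check that the PEM bundle really carries both CAs before pointing clients at it, something like the following Python sketch could be used. It only relies on the standard `ssl` module; the helper names are made up for illustration, and the CN values it prints depend on what the bundle actually contains.

```python
import ssl

# Path taken from the `file` output above.
BUNDLE_PATH = "/etc/ssl/certs/wmf-ca-certificates.crt"


def common_names(ca_certs):
    """Return the subject commonName of each certificate dict.

    Expects the list-of-dicts format returned by SSLContext.get_ca_certs().
    """
    names = []
    for cert in ca_certs:
        # 'subject' is a tuple of RDNs, each a tuple of (key, value) pairs.
        for rdn in cert.get("subject", ()):
            for key, value in rdn:
                if key == "commonName":
                    names.append(value)
    return names


def bundle_common_names(path=BUNDLE_PATH):
    """Load a PEM bundle and list the CNs of the CA certificates in it."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.load_verify_locations(cafile=path)
    return common_names(ctx.get_ca_certs())
```

Calling `bundle_common_names()` on a host should list at least the Puppet CA and the Root PKI CA if the bundle is complete.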
Update Kafka settings on all brokers to allow both PKI and Puppet TLS certs at the same time
The only thing needed is the following:
profile::kafka::broker::use_pki_migration_settings: true
profile::base::certificates::include_bundle_jks: true
These settings update Kafka's super.users setting (basically the TLS CNs that brokers trust among themselves), and puppet deploys /etc/ssl/localcerts/wmf-java-cacerts (a JKS truststore with the PKI and Puppet CA certificates).
A roll restart of all brokers is needed.
Update TLS settings one broker at a time
A hiera host-specific setting should suffice:
profile::kafka::broker::ssl_generate_certificates: true
The above will request a new PKI TLS certificate, deploy it to the node via puppet and update the Kafka settings.
A restart of the affected Kafka broker is needed.
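After the restart, it may be worth confirming that the broker now presents a certificate issued by the new Kafka CA. A minimal Python sketch, assuming the broker's TLS listener is reachable from where the script runs; the function names are hypothetical, and the host and port have to be supplied by the caller since they are not stated here.

```python
import socket
import ssl

# Bundle that trusts both the Puppet CA and the PKI CA (see the first step).
CA_BUNDLE = "/etc/ssl/certs/wmf-ca-certificates.crt"


def issuer_common_name(peer_cert):
    """Extract the issuer commonName from a getpeercert() dict, or None."""
    for rdn in peer_cert.get("issuer", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


def broker_issuer(host, port, cafile=CA_BUNDLE, timeout=5.0):
    """Connect to a broker's TLS listener and return its cert issuer CN."""
    # Passing cafile means only the bundle is trusted, not the system store.
    ctx = ssl.create_default_context(cafile=cafile)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return issuer_common_name(tls.getpeercert())
```

A broker still on its Puppet cert should report the Puppet CA as issuer, while a migrated one should report the new intermediate CA.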
Clean up
Remove the hiera setting used to allow both PKI and Puppet CA certs:
- profile::kafka::broker::use_pki_migration_settings: true
A roll restart of all brokers is needed.
Then finally clean up old TLS certificates from puppet private (revoking them too).
What is going to change after the move to PKI?
The only annoyance is that every 6 months we'll need to run the kafka roll restart cookbook to pick up the new TLS certificates, since the PKI certificates currently last 6 months. This is because the current version of Kafka doesn't allow hot reload of keystores: https://wikitech.wikimedia.org/wiki/Kafka/Administration#Renew_TLS_certificate
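To keep track of when the next roll restart is due, a small check on the certificate expiry date could help. The Python sketch below is only an illustration: the function names and the 30-day threshold are assumptions, and the `not_after` string is expected in the format that Python's `ssl` module reports as `notAfter` for a certificate.

```python
import ssl
import time

SECONDS_PER_DAY = 86400.0


def days_until_expiry(not_after, now=None):
    """Days left before a certificate expires.

    not_after: a string such as "Jan  5 09:34:43 2018 GMT".
    now: seconds since the epoch; defaults to the current time.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / SECONDS_PER_DAY


def restart_due(not_after, threshold_days=30, now=None):
    """True if the certificate expires within threshold_days (assumed value)."""
    return days_until_expiry(not_after, now) <= threshold_days
```

Since brokers only load the keystore at startup, the date that matters is the one on the certificate the running broker is serving, not the one on the renewed file on disk.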