
Allow kafka brokers to reload the TLS keystore
Closed, Resolved (Public)

Description

TLS certificates issued by the Kafka intermediate CA expire after 4 weeks by default, and this figure may change in the future. Manual renewal actions will be painful for SREs, and Kafka seems to have a way to reload its keystores:

elukey@kafka-test1006:~$ kafka-configs --bootstrap-server $(hostname -f):9092 --entity-name 1006 --entity-type brokers --add-config listener.name.SSL.ssl.keystore.location=/etc/kafka/ssl/kafka_test-eqiad_broker.keystore.p12 --alter
Completed updating config for broker: 1006.

(Please note: I am using port 9092 since port 9093 needs extra SSL properties; without them the CLI throws an out-of-memory exception.)

More info from a similar environment: https://forge.softwareheritage.org/D5864

The idea is to either:

  1. Create an exec in puppet, triggered when the new keystore is created by puppet (by the code that interfaces with the PKI intermediate)
  2. Add a Reload override to the Kafka systemd unit in puppet to execute the command or a script like the above
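
Either option needs a small command or script to run. A minimal sketch of such a helper follows, reusing only the kafka-configs flags from the command above. The function name `reload_keystore` and the `KAFKA_CONFIGS` override are hypothetical, added here so the call can be dry-run without touching a broker:

```shell
#!/bin/bash
# Sketch only: ask a broker to re-read its keystore through a dynamic
# config update, as in the manual kafka-configs command above.
#
# KAFKA_CONFIGS is a hypothetical override (not a Kafka setting): set it
# to "echo" to print the arguments instead of calling the real CLI.
reload_keystore() {
    local broker_id="$1"
    local keystore="$2"
    # Plaintext port 9092, as in the description; 9093 would need extra
    # SSL client properties.
    local bootstrap="${3:-$(hostname -f):9092}"

    "${KAFKA_CONFIGS:-kafka-configs}" \
        --bootstrap-server "$bootstrap" \
        --entity-type brokers \
        --entity-name "$broker_id" \
        --alter \
        --add-config "listener.name.SSL.ssl.keystore.location=${keystore}"
}
```

For example, `reload_keystore 1006 /etc/kafka/ssl/kafka_test-eqiad_broker.keystore.p12` would issue the same update as the manual command. A puppet exec (option 1) or an `ExecReload=` line in a systemd override (option 2) could then call a script wrapping this function.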

Event Timeline

Tried to reload the keystore on a couple of test brokers, since the first warnings for TLS cert expiry came up in icinga, but it doesn't seem to work. In server.log I see stuff like:

[2022-01-23 11:34:24,569] INFO Processing override for entityPath: brokers/1008 with config: Map(listener.name.SSL.ssl.keystore.type -> PKCS12, listener.name.SSL.ssl.keystore.location -> /etc/kafka/ssl/kafka_test-eqiad_broker.keystore.p12) (kafka.server.DynamicConfigManager)
[..list-of-kafka-settings..]

But when I check with openssl, the TLS certificate offered by the broker on port 9093 is not updated. If I restart the broker then it gets updated, so something is not working as expected.
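
The openssl check can be scripted roughly as below. This is a sketch under assumptions: the function names are made up for illustration, the keystore is assumed to be PKCS12 (as in the log line above), and its password is passed as an argument only for brevity (this exposes it in the process list, so a real script would likely use `-passin file:...` or `-passin env:...`):

```shell
#!/bin/bash
# Fingerprint of the certificate the broker currently serves on its TLS port.
served_fingerprint() {
    local host="$1" port="${2:-9093}"
    openssl s_client -connect "${host}:${port}" </dev/null 2>/dev/null \
        | openssl x509 -noout -fingerprint -sha256
}

# Fingerprint of the leaf certificate stored in the PKCS12 keystore on disk.
keystore_fingerprint() {
    local keystore="$1" password="$2"
    openssl pkcs12 -in "$keystore" -clcerts -nokeys \
        -passin "pass:${password}" 2>/dev/null \
        | openssl x509 -noout -fingerprint -sha256
}

# Strip the "<algo> Fingerprint=" prefix, whose capitalisation differs
# between openssl versions, leaving only the hex digest.
fingerprint_value() {
    printf '%s\n' "${1#*=}"
}

# Succeeds only when the broker serves the same certificate that is on disk.
# Arguments: host, port, keystore path, keystore password.
broker_serves_keystore_cert() {
    local served ondisk
    served="$(fingerprint_value "$(served_fingerprint "$1" "$2")")"
    ondisk="$(fingerprint_value "$(keystore_fingerprint "$3" "$4")")"
    [ -n "$served" ] && [ "$served" = "$ondisk" ]
}
```

Running `broker_serves_keystore_cert "$(hostname -f)" 9093 <keystore> <password>` after a reload attempt would return non-zero in the situation described here, where the broker keeps serving the old certificate until it is restarted.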

I found https://issues.apache.org/jira/browse/KAFKA-7429, which seems to indicate that either the keystore password or the path needs to change to force a reload. I tried that as well, but nothing really changed.

Change 756522 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Add a kafka_11 profile to the PKI Kafka Intermediate settings

https://gerrit.wikimedia.org/r/756522

Change 756522 merged by Elukey:

[operations/puppet@production] Add a kafka_11 profile to the PKI Kafka Intermediate settings

https://gerrit.wikimedia.org/r/756522

Change 756548 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] nagios: update settings for ssl_kafka

https://gerrit.wikimedia.org/r/756548

Change 756548 merged by Elukey:

[operations/puppet@production] nagios: update settings for ssl_kafka

https://gerrit.wikimedia.org/r/756548

elukey claimed this task.

It seems that our Kafka version, 1.1, doesn't support this use case well. The Kafka intermediate PKI CA now issues certs with 1 year of validity, to reduce the burden on SREs (broker roll restarts, etc.). We'll revisit this use case in the future if/after we migrate to Kafka 2.1+.