Test and upgrade Kafka clusters to Openjdk 17
Closed, Declined (Public)

Description

In order to make the Kafka upgrade easier, it would be really convenient if all clusters ran openjdk-17. We currently run Kafka 1.1 on openjdk-8, and we have never really tested whether it works fine on the newer JDK.

We could do the following:

  • Move kafka test to JDK 17 and watch for clear issues between brokers and with simple clients. Any severe error would mark this option as a no-go.
  • Upgrade a canary host on each production cluster, and watch for regressions or issues reported by clients.
  • Complete the upgrade if nothing comes up.

Running all Kafka clusters on JDK 17 would also allow us to upgrade to Bookworm without many issues and without having to upgrade Kafka beforehand (Bullseye deprecation happens in August).

Event Timeline

On kafka-test1006 I manually installed openjdk/jre 17 and modified JAVA_HOME in /etc/default/kafka. I found the following issues before being able to start the broker:

/usr/bin/kafka-run-class:

#JAVA_MAJOR_VERSION=$($JAVA -version 2>&1 | sed -E -n 's/.* version "([^.-]*).*"/\1/p')
#if [[ "$JAVA_MAJOR_VERSION" -ge "9" ]] ; then
KAFKA_GC_LOG_OPTS="-Xlog:gc*:file=$LOG_DIR/$GC_LOG_FILE_NAME:time,tags:filecount=10,filesize=102400"
#else
#  KAFKA_GC_LOG_OPTS="-Xloggc:$LOG_DIR/$GC_LOG_FILE_NAME -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M"
#fi

By default it seems that some GC options are added, which we then override via KAFKA_OPTS in /etc/default/kafka. IIUC the JAVA_MAJOR_VERSION line is not correct: without the comments that I added above, the code ends up in the else branch for KAFKA_GC_LOG_OPTS, and the JVM refuses to start because of incompatible parameters.
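To illustrate why the version check may misfire, here is a minimal sketch that runs the sed expression from kafka-run-class against a sample version line. The sample string is an assumption about what `java -version` prints on JDK 17 (the real output may differ by build), and the second expression is only one possible way to anchor the match, not necessarily what the packaged script should use.

```shell
#!/usr/bin/env bash
# Assumed first line of `java -version` on a JDK 17 build (illustrative only).
sample='openjdk version "17.0.9" 2023-10-17'

# Expression copied from the kafka-run-class snippet above.
# It stops at the closing quote, so the text after it is left in the result.
old=$(printf '%s\n' "$sample" | sed -E -n 's/.* version "([^.-]*).*"/\1/p')

# Illustrative variant: match up to end of line so trailing text is discarded.
new=$(printf '%s\n' "$sample" | sed -E -n 's/.* version "([0-9]*).*$/\1/p')

echo "old: '$old'"
echo "new: '$new'"
```

With this sample, the first expression keeps the trailing date in the result, so the value is not a plain integer; the `[[ ... -ge "9" ]]` test then fails and the script takes the else branch.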

Then I had to fix /etc/default/kafka, replacing

KAFKA_OPTS="-XX:GCLogFileSize=50M -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -javaagent:/usr/share/java/prometheus/jmx_prometheus_javaagent.jar=10.64.16.146:7800:/etc/prometheus/kafka_broker_prometheus_jmx_exporter.yaml -Djdk.tls.namedGroups=secp256r1 -XX:+UseAES -XX:+UseAESIntrinsics"

with

KAFKA_OPTS="-XX:+UnlockDiagnosticVMOptions -Xlog:gc*:file=/var/log/kafka/gc.log:time,uptime,level:filecount=5,filesize=50M -javaagent:/usr/share/java/prometheus/jmx_prometheus_javaagent.jar=10.64.16.146:7800:/etc/prometheus/kafka_broker_prometheus_jmx_exporter.yaml -Djdk.tls.namedGroups=secp256r1 -XX:+UseAES -XX:+UseAESIntrinsics"

After that the broker started without any issue. So ideally we could prepare a puppet patch that overrides kafka-run-class (or we do it via Debian packaging, maybe more robust) and /etc/default/kafka settings, turning those options on when JRE/JDK 17 is used.
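As a rough sketch of what such a conditional could look like, the helper below picks the GC logging flags based on the Java major version. The function name `gc_log_opts` is hypothetical, the flags are the ones quoted in the two KAFKA_OPTS values above, and the threshold of 9 mirrors the check in kafka-run-class. This is not the content of the puppet patch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print GC logging flags for a given Java major version.
# Flags are taken from the two KAFKA_OPTS values quoted in this task.
gc_log_opts() {
  local major="$1"
  if [ "$major" -ge 9 ]; then
    # Unified JVM logging (JDK 9+): rotation is part of the -Xlog argument.
    printf '%s' '-Xlog:gc*:file=/var/log/kafka/gc.log:time,uptime,level:filecount=5,filesize=50M'
  else
    # Legacy JDK 8 rotation flags, rejected by newer JVMs.
    printf '%s' '-XX:GCLogFileSize=50M -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5'
  fi
}

echo "JDK 17: $(gc_log_opts 17)"
echo "JDK 8:  $(gc_log_opts 1)"
```

Java 8 is passed as `1` here because its version string starts with `1.8`, which is what a major-version extraction like the one in kafka-run-class would yield.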

elukey triaged this task as Medium priority.

Edited Feb 6 2026, 2:10 PM

We can just use https://github.com/apache/kafka/commit/c34f3d066ead40d8c0bca0cf92d4226d2d6416c6 :)

Edit: not really, we use confluent's packages without touching them, so we'll need to use puppet for this :(

Change #1237502 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] confluent::kafka::common: allow using Kafka 1.1 with Openjdk 17

https://gerrit.wikimedia.org/r/1237502

Change #1237507 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] profile::kafka::broker: allow to force openjdk-17

https://gerrit.wikimedia.org/r/1237507

Change #1237508 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] role::kafka::test: force JDK17 and apply missing inter_broker_protocol_version

https://gerrit.wikimedia.org/r/1237508

Change #1237858 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] confluent::kafka::broker: fix default options for jvm-17+

https://gerrit.wikimedia.org/r/1237858

I did some research on running Kafka 1.1 on JDK 17 and, as I expected, I didn't find anybody reporting that they have done it successfully. At a high level, the major problems that we can expect will be at run time:

  • Possible issues with Kafka's code using internal JVM APIs that are more strongly encapsulated and shielded on 17.
  • Possible variations in G1 / GC algorithm behavior at runtime, affecting performance.
  • Possible issues while using snappy or other encodings, due to the many changes that happened between 8 and 17.

Most of the issues appear either while starting Kafka or as exceptions at runtime, so I'd expect nothing horrible to happen if we run this successfully in deployment-prep for some days (grepping through the logs to check whether any weird exception is raised).
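A minimal sketch of that log check, assuming the broker logs are plain text files. The helper name `count_suspicious` and the pattern list are illustrative assumptions, not an exhaustive set of JDK 17 failure signatures.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count log lines that look like errors or Java exceptions.
# The pattern list is an assumption; extend it with whatever shows up in practice.
count_suspicious() {
  # grep -c prints the number of matching lines; it exits non-zero when the
  # count is 0, so "|| true" keeps the helper from failing in that case.
  grep -E -c 'ERROR|FATAL|Exception|IllegalAccessError' "$1" || true
}
```

It could be run as `count_suspicious /var/log/kafka/server.log`, where the file name is an assumption (only the /var/log/kafka directory appears in this task).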

On the other hand, IIUC Kafka 3.5 can run on JDK 8 (namely it is supported, not sure how well) and we have it backported on Bookworm. So running JDK 8 could be doable during the upgrade, and migrating it to 17 should be possible on bullseye and bookworm (for example, the latter case may be relevant if a team decides to move away from bullseye before the upgrade for some reason).

Change #1237858 merged by Elukey:

[operations/puppet@production] confluent::kafka::broker: fix default options for jvm-17+

https://gerrit.wikimedia.org/r/1237858

On the other hand, IIUC Kafka 3.5 can run on JDK 8 (namely it is supported, not sure how well) and we have it backported on Bookworm. So running JDK 8 could be doable during the upgrade, and migrating it to 17 should be possible on bullseye and bookworm (for example, the latter case may be relevant if a team decides to move away from bullseye before the upgrade for some reason).

Java 8 on Bookworm will be with us for a long time due to Hadoop, so my proposal would be to:

  1. move to Kafka 3.5 on Bullseye
  2. move from Kafka 3.5/Java 8 on Bullseye to Kafka 3.5/Java 17 on Bullseye
  3. reimage to Bookworm

elukey changed the task status from Open to Stalled. Feb 10 2026, 4:09 PM

Marking this as stalled since the SIG decided not to go down this road for the moment.

Change #1237507 abandoned by Elukey:

[operations/puppet@production] profile::kafka::broker: allow to force openjdk-17

https://gerrit.wikimedia.org/r/1237507

Change #1237502 abandoned by Elukey:

[operations/puppet@production] confluent::kafka::common: allow using Kafka 1.1 with Openjdk 17

https://gerrit.wikimedia.org/r/1237502

Change #1237508 abandoned by Elukey:

[operations/puppet@production] role::kafka::test: force JDK17 and apply missing inter_broker_protocol_version

https://gerrit.wikimedia.org/r/1237508