Upgrade Kafka from 1.x to a later version
Open, Medium, Public

Description

Hi folks,

Opening this task to discuss whether it is worth starting to test a migration path to a newer Kafka version. As Andrew has pointed out several times, the 2.x versions don't have many cool new features on the broker side, since most of the work has gone into things like Kafka Streams. There are some good things, though, that I have tracked over time:

According to the upgrade guide, a rolling upgrade seems doable, so maybe we could try it on the test cluster over the coming months to see how it goes.
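
For anyone trying this on the test cluster, the rolling procedure in the upgrade guide boils down to two passes over the brokers: first swap the binaries while keeping `inter.broker.protocol.version` pinned to the old version, then bump it in a second rolling restart. Below is a minimal Python sketch of that ordering. The function name, broker names and version strings are placeholders, and the guide may call for additional settings (such as the log message format version) that are not modelled here.

```python
from typing import List


def rolling_upgrade_plan(brokers: List[str], old_version: str, new_version: str) -> List[str]:
    """Return the ordered steps of a two-pass rolling upgrade (illustrative only)."""
    steps = []
    # Pass 1: keep the inter-broker protocol pinned to the old version and
    # upgrade the binaries one broker at a time.
    for broker in brokers:
        steps.append(
            f"{broker}: set inter.broker.protocol.version={old_version}, "
            f"install {new_version}, restart"
        )
    # Pass 2: once every broker runs the new code, bump the protocol version
    # with a second rolling restart.
    for broker in brokers:
        steps.append(
            f"{broker}: set inter.broker.protocol.version={new_version}, restart"
        )
    return steps
```

Calling `rolling_upgrade_plan(["broker-a", "broker-b"], "1.1", "3.4")` yields four steps: both brokers are upgraded first, and only then is the protocol version raised. Until the second pass starts, a rollback to the old binaries stays possible.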

Event Timeline

A better and more stable Kafka Mirror Maker (even though, after all the work Andrew did, we now have something very stable as well)

This really does look great, and has some nice features, like automatic DC topic prefixing! It may make it simpler for producers to produce to the correct topics without being DC aware. However, we'd have to change a bunch of existing stuff to get that to work, so that's probably a totally separate project.
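
For context on the prefixing: MirrorMaker 2's default replication policy names a mirrored topic by prepending the source cluster alias and a separator (a dot by default). Here is a tiny Python sketch of that naming rule. The function name, the alias `dc1` and the topic name are made up for illustration.

```python
def remote_topic_name(source_alias: str, topic: str, separator: str = ".") -> str:
    """Name a mirrored topic the way MirrorMaker 2's default policy does:
    <source cluster alias><separator><original topic>."""
    return f"{source_alias}{separator}{topic}"


# Hypothetical example: a topic "example.events" mirrored from cluster "dc1".
mirrored = remote_topic_name("dc1", "example.events")
```

Here `mirrored` is `"dc1.example.events"`. Producers would write to the unprefixed topic and the mirror would add the DC prefix on the remote side.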

Anyway, +1!

jbond triaged this task as Medium priority.Feb 16 2022, 4:56 PM

Since this task was opened, a new Kafka major version has been released :D

The latest one seems to be https://www.confluent.io/blog/apache-kafka-3-4-0-new-features-and-updates/, which provides initial support for migrating from ZooKeeper to KRaft in a rolling-restart fashion (although the workflow is not production-ready yet).

We wanted to try out the latest Kafka and KRaft with Kafka stretch (T307944), if we ever have time to get around to it. We have the hardware, and it would be a brand new cluster, so it would be easier to experiment with.

Ottomata renamed this task from Upgrade Kafka to 2.x to Upgrade Kafka to 2.x or 3.x.Apr 3 2023, 12:36 PM
Ottomata added a project: Event-Platform.
Ottomata added subscribers: gmodena, lbowmaker.

Change 935071 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Add an apt mirror for the confluent-kafka 7.4 release

https://gerrit.wikimedia.org/r/935071

Change 935071 merged by Btullis:

[operations/puppet@production] Add an apt mirror for the confluent-kafka 7.4 release

https://gerrit.wikimedia.org/r/935071

Ottomata updated the task description.

A few questions:

  • While we ought to consider an upgrade for all 4 clusters, from what I understand Jumbo can be upgraded independently. Are there any concerns with that approach?
  • What are the upgrade considerations for Kafka clients?
  • Specifically are there clients that publish to Kafka Jumbo directly or do all Kafka topics get mirrored from main (possibly logging?)?

While we ought to consider an upgrade for all 4 clusters, from what I understand Jumbo can be upgraded independently. Are there any concerns with that approach?

@odimitrijevic I'd probably start with kafka-test, but apart from that, I think kafka-jumbo can indeed be upgraded independently.

What are the upgrade considerations for Kafka clients?

When it comes to clients, my understanding is that protocol negotiation happens as part of the client connection to Kafka. This means that an older client can always talk to a newer version of Kafka; the two simply settle on the latest version of the protocol understood by the client. That being said, we should also consider upgrading the client libraries when possible, which would then let us rely on features such as https://cwiki.apache.org/confluence/display/KAFKA/KIP-302+-+Enable+Kafka+clients+to+use+all+DNS+resolved+IP+addresses
However, I believe these can be tackled independently of each other.
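
To make the negotiation idea concrete, here is a rough Python sketch, not the actual client code. For a given request type, each side supports a range of versions, and the client uses the highest version that falls in both ranges. The `VersionRange` type and `negotiate_version` function are names invented for this illustration.

```python
from typing import Optional, Tuple

# (min_version, max_version), both inclusive, for one request type.
VersionRange = Tuple[int, int]


def negotiate_version(client: VersionRange, broker: VersionRange) -> Optional[int]:
    """Return the highest version both sides support, or None if the ranges do not overlap."""
    lowest = max(client[0], broker[0])
    highest = min(client[1], broker[1])
    return highest if lowest <= highest else None
```

With an older client supporting versions 0 to 5 and a newer broker supporting 0 to 9, the result is 5, i.e. the newest version the client understands. The `None` branch is the case to watch for during testing: it only arises if a broker has dropped the oldest versions that a client still needs.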

Specifically are there clients that publish to Kafka Jumbo directly or do all Kafka topics get mirrored from main (possibly logging?)?

I'm not versed enough in our infrastructure to be able to answer this one.

Adding my 2c:

  • The most important client for Jumbo is Varnishkafka. I think we could upgrade Kafka test first and set up a single-node varnishkafka instance (with help from Traffic) to double check that nothing breaks with the current version of librdkafka etc., i.e. the Kafka client code deployed on the Caching/CDN hosts.
  • Jumbo mirrors topics from Kafka Main using something called "Mirror Maker", so we could use Kafka Test again to verify that topics can be copied from Jumbo.
  • We use Benthos on centrallog nodes to read Webrequest topics, sample only a fraction of their events and re-publish them to a new topic called "webrequest_sampled" on Jumbo. Druid Analytics then ingests the "webrequest_sampled" traffic to make the data available to Superset and Turnilo. The SRE team uses these dashboards to inspect traffic patterns during DDoS attacks etc., so it is a use case to keep in mind when upgrading. This pipeline should be able to use the new Kafka version, but double checking before upgrading is a good idea :)
  • Eventgate publishes directly to Jumbo for some use cases, but we have upgraded it recently so it should work out of the box.
  • The Network SREs use a tool called pmacct to get data from network equipment (netflow) and publish it to Jumbo. We should follow up with them to test it with recent versions of Kafka (again, the Kafka test cluster could be a valid target for a dummy instance).

Nothing big but the above requires some coordination with other teams for sure :)
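
If a dummy version of the sampling pipeline is needed against the test cluster, the sampling step itself is simple to imitate. The sketch below is plain Python, not the Benthos configuration. It keeps each event with a given probability; the function name, the event shape and the rate are assumptions for illustration, and the real sampling rate is not stated in this task.

```python
import random
from typing import Iterable, Iterator, Optional


def sample_events(
    events: Iterable[dict], rate: float, rng: Optional[random.Random] = None
) -> Iterator[dict]:
    """Yield each event with probability `rate` (between 0.0 and 1.0)."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    rng = rng or random.Random()
    for event in events:
        # random() returns a float in [0.0, 1.0), so rate=1.0 keeps everything
        # and rate=0.0 keeps nothing.
        if rng.random() < rate:
            yield event
```

Passing a seeded `random.Random` makes a test run repeatable, which helps when comparing what was consumed from the source topic with what ended up in the sampled one.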

Ottomata renamed this task from Upgrade Kafka to 2.x or 3.x to Upgrade Kafka to from 1.x to later version.Dec 7 2023, 8:15 PM
Ottomata updated the task description.