
Upgrade kafka-jumbo to kafka 3.7
Closed, Resolved, Public

Description

  • Pin the inter-broker protocol version on the brokers by setting hieradata/role/common/kafka/jumbo/broker.yaml:profile::kafka::broker::inter_broker_protocol_version: 1.1.0
  • Perform a rolling upgrade of the brokers, using host-by-host patches followed by a restart of the kafka broker service, so that each broker comes back up on the new kafka version with the pinned protocol version, e.g. https://gerrit.wikimedia.org/r/c/operations/puppet/+/1273863
    • kafka-jumbo1010
    • kafka-jumbo1011
    • kafka-jumbo1012
    • kafka-jumbo1013
    • kafka-jumbo1014
    • kafka-jumbo1015
    • kafka-jumbo1017
    • kafka-jumbo1018
    • kafka-jumbo1016 (controller)
  • Change the inter broker protocol version to match the new kafka version

    Set hieradata/role/common/kafka/logging.yaml:profile::kafka::broker::inter_broker_protocol_version: 3.7
  • Perform a final rolling restart of the brokers
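
The ordering above (controller last, protocol version bumped only once every broker runs the new binaries) can be sketched as a small plan generator. This is an illustration only: `rolling_upgrade_plan` and the wording of the steps are hypothetical and not part of any existing cookbook; the broker names and versions are the ones from this description.

```python
from typing import List


def rolling_upgrade_plan(brokers: List[str], controller: str,
                         old_ibp: str, new_ibp: str) -> List[str]:
    """Return the ordered steps of a two-phase rolling upgrade.

    Hypothetical helper: it only builds a human-readable plan and
    does not touch any host.
    """
    if controller not in brokers:
        raise ValueError(f"{controller} is not in the broker list")
    # The active controller goes last in each pass, a common way to
    # limit the number of controller failovers during the roll.
    order = [b for b in brokers if b != controller] + [controller]

    steps = [f"pin inter_broker_protocol_version to {old_ibp}"]
    steps += [f"upgrade and restart {b}" for b in order]
    steps.append(f"set inter_broker_protocol_version to {new_ibp}")
    steps += [f"restart {b}" for b in order]
    return steps


brokers = [f"kafka-jumbo10{n}" for n in range(10, 19)]
plan = rolling_upgrade_plan(brokers, "kafka-jumbo1016", "1.1.0", "3.7")
for step in plan:
    print(step)
```

The point of the two passes is that the brokers keep speaking the old protocol to each other while old and new binaries coexist; the version is only raised once no old broker is left.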

Related Objects

Status    Subtype  Assigned   Task
Open               None
Resolved           brouberol

Event Timeline

brouberol triaged this task as Medium priority.

Change #1277558 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] kafka-jumbo: update kafka-jumbo1010 confluent distro to 77

https://gerrit.wikimedia.org/r/1277558

Change #1277555 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] kafka-jumbo: update kafka-jumbo1010 confluent distro to 77

https://gerrit.wikimedia.org/r/1277555

Change #1277558 abandoned by Brouberol:

[operations/puppet@production] kafka-jumbo: update kafka-jumbo1010 confluent distro to 77

Reason:

Created by mistake

https://gerrit.wikimedia.org/r/1277558

Change #1277555 merged by Brouberol:

[operations/puppet@production] kafka-jumbo: update kafka-jumbo1010 confluent distro to 77

https://gerrit.wikimedia.org/r/1277555

kafka-jumbo1010 is now running kafka 3.7, without hurdles!

brouberol@kafka-jumbo1010:~$ dpkg -l | grep confluent-kafka | grep ii
ii  confluent-kafka                      7.7.7-1                            all          publish-subscribe messaging rethought as a distributed commit log
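
For the remaining brokers, the same check could be scripted instead of eyeballed. A minimal sketch, assuming the `dpkg -l` output has already been collected as text; `installed_version` is a hypothetical helper name, and the sample line is the one pasted above.

```python
from typing import Optional


def installed_version(dpkg_output: str, package: str) -> Optional[str]:
    """Return the version of `package` from `dpkg -l` output, if installed.

    Only rows in the 'ii' state (installed, no errors) are considered.
    """
    for line in dpkg_output.splitlines():
        fields = line.split()
        # dpkg -l columns: state, name, version, architecture, description...
        if len(fields) >= 3 and fields[0] == "ii" and fields[1] == package:
            return fields[2]
    return None


sample = (
    "ii  confluent-kafka                      7.7.7-1                            "
    "all          publish-subscribe messaging rethought as a distributed commit log"
)
version = installed_version(sample, "confluent-kafka")
print(version)
```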


Change #1278355 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] kafka-jumbo: deploy kafka 3.7 to all brokers

https://gerrit.wikimedia.org/r/1278355

Change #1278355 merged by Brouberol:

[operations/puppet@production] kafka-jumbo: deploy kafka 3.7 to all brokers

https://gerrit.wikimedia.org/r/1278355

brouberol updated the task description.

Leaving the cluster in a semi-upgraded state caused an incident: https://docs.google.com/document/d/17BuSP9-tlfHN0MRTifP5EFLk1_gS5NaHKi0eo4Jw7JM/edit?tab=t.0#heading=h.jev75r1ysugd

We had to speed up the upgrade process, so all brokers are running 3.7 now!

Change #1280078 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] kafka-jumbo: set inter.broker.protocol to 3.7.0

https://gerrit.wikimedia.org/r/1280078

Change #1280078 merged by Brouberol:

[operations/puppet@production] kafka-jumbo: set inter.broker.protocol to 3.7

https://gerrit.wikimedia.org/r/1280078

We had to resort to a manual rolling restart of the cluster, because a permission error prevented the cookbook from triggering a leader election (probably due to the jumbo ACLs).

----- OUTPUT for command #1: 'source /etc/prof...topic-partitions' -----
kafka-leader-election --bootstrap-server kafka-jumbo1010.eqiad.wmnet:9092,kafka-jumbo1011.eqiad.wmnet:9092,kafka-jumbo1012.eqiad.wmnet:9092,kafka-jumbo1013.eqiad.wmnet:9092,kafka-jumbo1014.eqiad.wmnet:9092,kafka-jumbo1015.eqiad.wmnet:9092,kafka-jumbo1016.eqiad.wmnet:9092,kafka-jumbo1017.eqiad.wmnet:9092,kafka-jumbo1018.eqiad.wmnet:9092 --election-type PREFERRED --all-topic-partitions
Not authorized to perform leader election
Not authorized to perform leader election
org.apache.kafka.server.common.AdminCommandFailedException: Not authorized to perform leader election
        at org.apache.kafka.tools.LeaderElectionCommand.electLeaders(LeaderElectionCommand.java:134)
        at org.apache.kafka.tools.LeaderElectionCommand.run(LeaderElectionCommand.java:117)
        at org.apache.kafka.tools.LeaderElectionCommand.mainNoExit(LeaderElectionCommand.java:71)
        at org.apache.kafka.tools.LeaderElectionCommand.main(LeaderElectionCommand.java:66)

That being said, inter.broker.protocol.version is now set to 3.7 on every broker in the cluster.
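
One way to double-check the "everywhere" part is to compare the rendered broker configs. A rough sketch, assuming each broker's server.properties body has been fetched as text; `read_ibp` and `brokers_not_on` are hypothetical helpers, the config bodies below are made up for illustration, and the parser only handles plain `key=value` lines and `#` comments.

```python
from typing import Dict, List, Optional


def read_ibp(properties_text: str) -> Optional[str]:
    """Return inter.broker.protocol.version from a server.properties body."""
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "inter.broker.protocol.version":
            return value.strip()
    return None


def brokers_not_on(expected: str, configs: Dict[str, str]) -> List[str]:
    """List the brokers whose configured version differs from `expected`."""
    return sorted(h for h, text in configs.items() if read_ibp(text) != expected)


# Made-up config bodies for two brokers, for illustration only.
configs = {
    "kafka-jumbo1010": "# example comment\ninter.broker.protocol.version=3.7\n",
    "kafka-jumbo1011": "inter.broker.protocol.version=1.1.0\n",
}
stragglers = brokers_not_on("3.7", configs)
print(stragglers)
```

Note that this only reflects what is on disk: a broker picks up the new value when it restarts, so the check is meaningful only after the final rolling restart.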

I'm calling this done!