
Investigate Kafka setting
Closed, ResolvedPublic


As indicated in the parent task, whenever ChangeProp is restarted, or some workers die and get respawned, a significant number of rebalances happen while the workers start. This can apparently mess up broker state and lead to a situation where no consumer within the consumer group gets an assigned partition.

To prevent that, a new property defaulting to 3 seconds was added to the Kafka configuration starting with version 0.11 (KIP).

I think that increasing this value to something like 10 seconds could help with initial rebalancing and save quite some load.

Unfortunately, the main Kafka cluster is still on 0.9, so this one is blocked until we upgrade it.
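
To illustrate why a longer initial delay might reduce the number of rebalances, here is a minimal toy model, not Kafka's actual coordinator logic. It assumes the property discussed here behaves like the broker setting `group.initial.rebalance.delay.ms`: when the first member joins an empty group, the coordinator waits for the delay before completing the first rebalance, and each member joining inside that window extends the wait. The function name `count_rebalances`, the join times, and the rule that every late joiner triggers its own rebalance are all hypothetical simplifications; the real broker also caps the total wait, which is ignored here.

```python
def count_rebalances(join_times, initial_delay):
    """Toy model: count rebalances for a consumer group that starts empty.

    join_times    -- seconds at which each worker joins (hypothetical data)
    initial_delay -- seconds the coordinator waits before the first rebalance
                     (assumed to mirror group.initial.rebalance.delay.ms)
    """
    times = sorted(join_times)
    if not times:
        return 0

    # The first joiner opens the initial wait window and causes one rebalance.
    deadline = times[0] + initial_delay
    rebalances = 1

    for t in times[1:]:
        if t <= deadline:
            # Joined while the coordinator was still waiting: absorbed into
            # the first rebalance, and the wait is extended (cap ignored).
            deadline = t + initial_delay
        else:
            # Simplification: every late joiner triggers its own rebalance.
            rebalances += 1
    return rebalances


# Hypothetical staggered worker start-up times, in seconds.
worker_joins = [0, 1, 4, 8, 9, 15]

for delay in (0, 3, 10):
    print(f"delay={delay}s -> {count_rebalances(worker_joins, delay)} rebalances")
```

With these made-up join times, the model goes from one rebalance per worker with no delay to a single rebalance with a 10 second delay. Whether real workers start close enough together for 10 seconds to cover them is something only production logs can confirm.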



Event Timeline

Pchelolo triaged this task as Medium priority.Mar 13 2018, 8:10 PM
Pchelolo created this task.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMar 13 2018, 8:10 PM
Ottomata moved this task from Incoming to Radar on the Analytics board.Mar 15 2018, 4:35 PM

@Ottomata @elukey now that we were successful in upgrading Kafka, I think we can try increasing this to 10 seconds. Do you think the number is reasonable?

Yeah, I think that sounds fine.



Change 432615 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[operations/puppet@production] Kafka: increase to 10s.

Pchelolo moved this task from blocked to doing on the Services board.May 11 2018, 7:22 PM
Pchelolo edited projects, added Services (doing); removed Services (blocked).

Change 432615 merged by Elukey:
[operations/puppet@production] Kafka: increase to 10s.

Pchelolo closed this task as Resolved.May 15 2018, 11:36 PM
Pchelolo edited projects, added Services (done); removed Services (doing), Patch-For-Review.

This was deployed to production, and the number of rebalance log messages during consumer startup declined, so I'm resolving the ticket.