As indicated in the parent task, whenever ChangeProp is restarted or some workers die and get respawned, a significant number of rebalances happen while the workers start. This can apparently mess up the broker state and lead to a situation where no consumer within the consumer group gets a partition assigned.
To prevent that, a new group.initial.rebalance.delay.ms property, defaulting to 3 seconds, was added to the Kafka broker configuration starting with version 0.11 (KIP).
I think that increasing this value to something like 10 seconds could help with the initial rebalancing and save quite some load.
Unfortunately, the main Kafka cluster is still on 0.9, so this one is blocked until we upgrade it.
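For reference, once the brokers are on 0.11 or later, the change should amount to a single broker-side setting. A rough sketch, assuming it goes into the brokers' server.properties; the 10000 value is only the 10 seconds proposed here and has not been tested:

```
# Broker setting, available from Kafka 0.11 onwards.
# How long the group coordinator waits for more consumers to join
# a new, empty group before performing the first rebalance.
# Default is 3000 (3 seconds); 10000 is the untested value proposed in this task.
group.initial.rebalance.delay.ms=10000
```

Since this is a broker setting rather than a consumer one, it would apply to every consumer group on the cluster and not only to ChangeProp, so a longer delay also means every new group waits longer before it starts consuming.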