
Make sure webrequest_text preferred partition leadership is balanced
Closed, Resolved · Public · 5 Estimated Story Points

Description

Currently, webrequest_text's preferred replica leaders are not evenly balanced across all 6 kafka-jumbo brokers.

kafka-jumbo1006 only leads 2 partitions, whereas kafka-jumbo1002 and kafka-jumbo1005 each have 5 (instead of 4). We have 6 brokers and 24 partitions, so each broker should be preferred leader for 4 partitions.

We should reassign preferred leadership to fix.
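
For context, a quick way to see how leadership is currently spread is to pull cluster metadata and count leaders per broker. A minimal sketch, assuming the confluent_kafka client and an illustrative bootstrap host (neither is part of the actual task):

```python
# Count, per broker, how many webrequest_text partitions it currently leads.
# The bootstrap hostname below is illustrative only.
from collections import Counter
from confluent_kafka.admin import AdminClient

admin = AdminClient({"bootstrap.servers": "kafka-jumbo1001.eqiad.wmnet:9092"})
metadata = admin.list_topics(topic="webrequest_text", timeout=10)

leader_counts = Counter()
for partition in metadata.topics["webrequest_text"].partitions.values():
    leader_counts[partition.leader] += 1

# With 24 partitions and 6 brokers, each broker id should appear 4 times.
for broker_id, count in sorted(leader_counts.items()):
    print(f"broker {broker_id}: leads {count} partitions")
```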

Event Timeline

Mentioned in SAL (#wikimedia-analytics) [2018-10-29T14:27:31Z] <ottomata> ran kafka-preferred-replica-election on kafka jumbo-eqiad cluster (this successfully rebalanced webrequest_text partition leadership) T207768

Interesting! Today Luca and I were about to move partition leadership using kafka reassign-partitions, but we noticed that the replica assignment actually looked correct; it already matched what we were going to change it to. Only the leadership was out of whack. So we ran a kafka preferred-replica-election to see if it would rebalance the leadership, and it did! The partition leadership now looks good.
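
For anyone following along: a preferred-replica election simply moves leadership back to the first replica listed in each partition's assignment, which is why fixing this didn't require a reassignment. A minimal sketch of how one could spot the partitions whose current leader differs from that preferred replica, again assuming confluent_kafka and an illustrative bootstrap host:

```python
# List webrequest_text partitions whose current leader is not the first
# (preferred) replica in the assignment -- i.e. the ones a preferred-replica
# election would move back. Hostname is illustrative only.
from confluent_kafka.admin import AdminClient

admin = AdminClient({"bootstrap.servers": "kafka-jumbo1001.eqiad.wmnet:9092"})
topic = admin.list_topics(topic="webrequest_text", timeout=10).topics["webrequest_text"]

for pid, p in sorted(topic.partitions.items()):
    preferred = p.replicas[0]
    if p.leader != preferred:
        print(f"partition {pid}: leader={p.leader}, preferred={preferred}, replicas={p.replicas}")
```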

We then wondered why it wasn't already balanced, since we have auto.leader.rebalance.enable=true. I found

leader.imbalance.per.broker.percentage
The ratio of leader imbalance allowed per broker. The controller would trigger a leader balance if it goes above this value per broker. The value is specified in percentage.
Default: 10

So, I suspect that for whatever reason, some partition leadership was unbalanced, but not by enough to exceed the 10% per-broker threshold that would trigger a leadership election. We may want to lower this value.
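
To illustrate how the threshold could be missed (the per-broker partition count below is made up; the ratio is my reading of how the controller computes per-broker imbalance):

```python
# Illustrative arithmetic only: if kafka-jumbo1006 is preferred leader for,
# say, 50 partitions across all topics and only the 2 webrequest_text
# partitions are led elsewhere, the per-broker imbalance is 4%, below the
# 10% default, so the auto rebalancer never triggers an election.
preferred_leader_count = 50   # hypothetical total across all topics
misplaced = 2                 # partitions currently led by a non-preferred broker
imbalance_pct = 100 * misplaced / preferred_leader_count
print(f"{imbalance_pct:.0f}% < 10% -> no automatic preferred-replica election")
```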

Luca and I decided to wait until the next time we have to reboot Kafka broker nodes and see what happens. If after a cluster reboot the leadership doesn't go back to being 100% balanced, we can examine the leadership spread and verify whether this is the case.

Ottomata set the point value for this task to 5.
Ottomata moved this task from Next Up to Paused on the Analytics-Kanban board.

Oo, here's a plausible explanation. kafka-jumbo1006 was the only broker that was missing some of its leaders. It is usually the last node to be rebooted during a full cluster reboot, and webrequest_text takes the longest to sync back up after a broker restart. I betcha that all but these two partitions had resynced into the ISR, and the auto leader rebalancer was triggered to run (after 300 seconds) and saw that the imbalance percentage was greater than 10%. It then triggered a leader election BEFORE these two webrequest_text replicas were back in the ISR. Most of the leaders would have then been elected appropriately, but not these partitions. Soon after, these replicas would have resynced, but at that point the partition imbalance was less than 10%, so any future auto rebalancer runs wouldn't trigger an election.

We should just add a step to the reboot procedure to manually run kafka preferred-replica-election after all replicas are back in the ISR.
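
A minimal sketch of what that "wait until the ISR is complete" check could look like; the polling loop and hostname are illustrative, not existing tooling:

```python
# Wait until every webrequest_text replica is back in the ISR, then remind
# the operator to run the preferred-replica election. Hostname and polling
# interval are illustrative only.
import time
from confluent_kafka.admin import AdminClient

admin = AdminClient({"bootstrap.servers": "kafka-jumbo1001.eqiad.wmnet:9092"})

def fully_replicated(topic_name):
    topic = admin.list_topics(topic=topic_name, timeout=10).topics[topic_name]
    return all(set(p.isrs) == set(p.replicas) for p in topic.partitions.values())

while not fully_replicated("webrequest_text"):
    time.sleep(30)

print("All replicas back in ISR; run kafka preferred-replica-election now.")
```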