In the parent task we discovered that the Kafka main eqiad brokers are currently handling traffic unevenly.
We should redo T288825 and rebalance partitions in both Kafka main eqiad and codfw to even out the load across brokers.
Goals:
- Idle percent time should be more evenly distributed among brokers. According to the upstream docs, the minimal safe value is 20%; anything below that likely indicates performance issues.
- Partition leaders should be more evenly distributed across brokers.
Nice-to-have goals:
- Overall Produce time reduced to under one second.
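
To make the first two goals checkable before and after the rebalance, something like the following could be used. This is only a rough sketch: it assumes the per-broker idle fractions and the partition-to-leader mapping have already been exported from monitoring and from the cluster metadata, and every function name, broker id and sample value here is hypothetical, not taken from the actual clusters.

```python
from collections import Counter
from statistics import mean

# Minimal safe idle fraction (20%), as quoted in the goals above.
IDLE_FLOOR = 0.20


def brokers_below_floor(idle_by_broker, floor=IDLE_FLOOR):
    """Return the sorted ids of brokers whose idle fraction is below the floor."""
    return sorted(b for b, idle in idle_by_broker.items() if idle < floor)


def idle_spread(idle_by_broker):
    """Gap between the most idle and the least idle broker (0 means perfectly even)."""
    values = list(idle_by_broker.values())
    return max(values) - min(values)


def leader_counts(partition_leaders, brokers):
    """Count led partitions per broker, including brokers that lead none."""
    counts = Counter(partition_leaders.values())
    return {b: counts.get(b, 0) for b in brokers}


def leader_imbalance(counts):
    """Largest deviation from the mean leader count, relative to that mean."""
    avg = mean(counts.values())
    if avg == 0:
        return 0.0
    return max(abs(c - avg) for c in counts.values()) / avg


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    idle = {1001: 0.15, 1002: 0.45, 1003: 0.60}
    leaders = {
        ("topicA", 0): 1001,
        ("topicA", 1): 1001,
        ("topicA", 2): 1002,
        ("topicB", 0): 1001,
    }
    counts = leader_counts(leaders, brokers=idle.keys())
    print("below floor:", brokers_below_floor(idle))
    print("idle spread:", idle_spread(idle))
    print("leader counts:", counts)
    print("leader imbalance:", leader_imbalance(counts))
```

Running the same check on snapshots taken before and after the reassignment would show whether the idle spread and the leader imbalance actually went down, and whether any broker is still below the 20% floor.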