In T341558 the topicmappr command was used to selectively move Kafka topic partitions from one broker to the others, on both main clusters. The results were positive, but topicmappr also offers a rebalance command (as opposed to the rebuild command that we used) that can:
- Use metrics from Prometheus about free storage and Kafka log partition sizes.
- Optimize leadership distribution while moving as few partitions as possible.
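To make the storage-balancing idea concrete, here is a minimal greedy sketch that uses per-broker free storage and per-partition sizes. It is not topicmappr's actual algorithm. The Partition type, the rebalance function, the single-replica model and the tolerance parameter are all hypothetical simplifications for illustration.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    # Hypothetical model: one replica per partition, living on a single broker.
    topic: str
    number: int
    size_gb: float
    broker: int


def rebalance(free_gb: dict[int, float], partitions: list[Partition],
              tolerance: float = 0.1) -> list[tuple[str, int, int, int]]:
    """Greedily move partitions from the fullest broker to the emptiest one.

    Hypothetical sketch, not topicmappr's implementation. Stops when every
    broker's free storage is within `tolerance` (a fraction) of the mean, or
    when no single move narrows the gap. Moved Partition objects get their
    `broker` updated in place; `free_gb` is not modified. Returns the moves as
    (topic, partition, source_broker, destination_broker).
    """
    free = dict(free_gb)
    mean = sum(free.values()) / len(free)
    moves = []
    while True:
        if all(abs(f - mean) <= tolerance * mean for f in free.values()):
            break
        src = min(free, key=free.get)  # broker with the least free storage
        dst = max(free, key=free.get)  # broker with the most free storage
        gap = free[dst] - free[src]
        # Moving size s leaves a gap of |gap - 2s|, which shrinks iff 0 < s < gap.
        candidates = [p for p in partitions
                      if p.broker == src and 0 < p.size_gb < gap]
        if not candidates:
            break
        # Pick the partition that brings the two brokers closest together.
        best = min(candidates, key=lambda p: abs(gap - 2 * p.size_gb))
        best.broker = dst
        free[src] += best.size_gb
        free[dst] -= best.size_gb
        moves.append((best.topic, best.number, src, dst))
    return moves


if __name__ == "__main__":
    parts = [Partition("a", 0, 150.0, 1), Partition("a", 1, 50.0, 1),
             Partition("b", 0, 30.0, 1), Partition("c", 0, 100.0, 3)]
    print(rebalance({1: 100.0, 2: 500.0, 3: 300.0}, parts))
```

Because every accepted move of size s satisfies 0 < s < gap, the sum of squared free-storage values strictly decreases with each move. The loop therefore cannot cycle and always terminates.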
Two projects are needed:
- https://github.com/DataDog/kafka-kit/wiki/Topicmappr-Usage-Guide
- https://github.com/tarvip/kafkakit-prometheus-metricsfetcher (needed because kafka-kit only supports Datadog's own metrics format).
I'd like to test rebalance on kafka-main codfw (and possibly eqiad afterwards) to see whether it is a useful tool for more targeted use cases.