
Review znodes on Zookeeper cluster to possibly remove not-used data
Closed, ResolvedPublic3 Estimated Story Points


The two Zookeeper clusters, main-eqiad (conf100[4-6]) and main-codfw (conf200[1-3]) are holding data for the following clusters:

  • Kafka main eqiad/codfw
  • Kafka Analytics eqiad
  • Kafka Jumbo eqiad
  • Kafka Logging
  • Kafka Burrow codfw/eqiad
  • Hadoop Test Analytics
  • Hadoop Analytics

The current znode allocation is the following:

  • main-eqiad -> ~50k
  • main-codfw -> ~3.7k

In order to preserve mental sanity when dealing with maintenance or (hopefully rare) critical events, I think that it would be great to:

  • check what data is surely garbage that can be trashed (reducing the amount of znodes)
  • check if Zookeeper is misused somehow, like an application storing bulk data rather than state. A huge number of znodes under the same parent is fine, for example, but I am wondering if we could hit some limits sooner or later (or maybe subtle failures).
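A quick way to cross-check the total znode counts above is ZooKeeper's mntr four-letter-word command, run against the client port from any host that can reach it (the fully-qualified hostname below is just an assumption for one of the conf100[4-6] nodes):

$ echo mntr | nc conf1004.eqiad.wmnet 2181 | grep zk_znode_count

mntr also reports data size and watcher counts, which would help spot the "application storing bulk data" misuse case.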

Event Timeline

elukey triaged this task as Medium priority.Feb 25 2019, 7:15 AM
elukey created this task.
Restricted Application added a subscriber: Aklapper. Feb 25 2019, 7:15 AM

List of parent znodes in main-eqiad:

[zk: localhost:2181(CONNECTED) 0] ls /
[registry, brokers, zookeeper, yarn-leader-election, hadoop-ha, rmstore-analytics-test-hadoop, services, druid, etc, hive_zookeeper_namespace, kafka, rmstore, burrow, consumers]
  • registry (Testing znodes for Apache Slider, can be trashed)
[zk: localhost:2181(CONNECTED) 4] ls /registry/users/joal/services
  • brokers (probably old Kafka Analytics stuff)
[zk: localhost:2181(CONNECTED) 9] ls /brokers/topics
  • zookeeper (seems internal usage for quotas)
[zk: localhost:2181(CONNECTED) 11] ls /zookeeper/quota
  • yarn-leader-election (HA data for Hadoop Yarn)
[zk: localhost:2181(CONNECTED) 12] ls /yarn-leader-election
[analytics-hadoop, analytics-test-hadoop]
  • hadoop-ha (HA znodes for Hadoop HDFS)
[zk: localhost:2181(CONNECTED) 13] ls /hadoop-ha
[analytics-hadoop, analytics-test-hadoop]
  • services (Seems to be Apache Slider testing data again)
[zk: localhost:2181(CONNECTED) 18] ls /services/slider/users/joal
  • druid (old Druid Zookeeper data; we currently have separate Zookeeper clusters for Druid, so it can surely be cleaned up)
  • etc - This is old Burrow data, probably my mistake when I set it up (see the burrow znode later on)
[zk: localhost:2181(CONNECTED) 19] ls /etc/burrow
[notifier, notifier-eqiad, notifier-main-eqiad, notifier-analytics, notifier-jumbo-eqiad]
  • rmstore-analytics-test-hadoop and rmstore, see T216952
  • hive_zookeeper_namespace - still not sure what this is; it seems like old data from a first look though.
[zk: localhost:2181(CONNECTED) 20] ls /hive_zookeeper_namespace
[wmf, qchris, ironholds, Wmf, otto, wmf_Raw, yurik, ellery, wmf_raw]
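To double-check that znodes like these are actually stale before deleting anything, zkCli's stat command can show when a znode was last modified (the mtime field in its output), for example:

[zk: localhost:2181(CONNECTED) 21] stat /hive_zookeeper_namespace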
  • kafka - Parent znode for all the Kafka clusters. It contains a lot of znodes but it seems all legit data
[zk: localhost:2181(CONNECTED) 38] ls /kafka
[main-codfw, jumbo-eqiad, logging-eqiad, eqiad, main-eqiad]
  • burrow - Parent znode for all the Burrow data.
[zk: localhost:2181(CONNECTED) 47] ls /burrow/notifier
[analytics, jumbo-eqiad, logging-eqiad, main-eqiad]
  • consumers - probably old Kafka znodes from when consumer offsets were handled via Zookeeper
[zk: localhost:2181(CONNECTED) 48] ls /consumers
[ebernhardson_test1, otto0, otto1, test_joal_flink, KafkaWordCount-otto-0, eventlogging-8c5a95e0-a8ef-11e5-b1da-782bcb0a0efc, eventlogging-group]
  • main-codfw is less crowded and probably doesn't need a cleanup:
[zk: localhost:2181(CONNECTED) 0] ls /
[burrow, kafka, zookeeper]

[zk: localhost:2181(CONNECTED) 1] ls /burrow
[notifier-main-codfw, notifier]

[zk: localhost:2181(CONNECTED) 2] ls /kafka
[logging-codfw, main-codfw]

Proposal for removal:

registry brokers services etc consumers

@Ottomata what do you think?

elukey moved this task from Next Up to In Code Review on the Analytics-Kanban board.

Don't know about registry, services or etc, but /brokers and /consumers should be leftovers from when we might have had an un-namespaced kafka cluster in zookeeper, and should be safe to delete.

/etc is my fault from when I set up burrow the first time, and registry/services seem to be @joal's slider test (so safe to delete IIRC).

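For the actual removal, zkCli's recursive delete should do it (rmr on ZooKeeper 3.4, renamed deleteall in 3.5+); a sketch for the znodes proposed above, on main-eqiad:

[zk: localhost:2181(CONNECTED) 0] rmr /registry
[zk: localhost:2181(CONNECTED) 1] rmr /brokers
[zk: localhost:2181(CONNECTED) 2] rmr /services
[zk: localhost:2181(CONNECTED) 3] rmr /etc
[zk: localhost:2181(CONNECTED) 4] rmr /consumers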
Got down to:

[zk: localhost:2181(CONNECTED) 37] ls /
[zookeeper, yarn-leader-election, hadoop-ha, hive_zookeeper_namespace, kafka, burrow]

That looks much nicer :)

elukey set the point value for this task to 3.Feb 28 2019, 4:39 PM
elukey moved this task from In Code Review to Done on the Analytics-Kanban board.

Mentioned in SAL (#wikimedia-operations) [2019-02-28T16:39:57Z] <elukey> clean up old/stale zookeeper znodes from conf100[4-6] - T216979

elukey moved this task from Backlog to Done on the User-Elukey board.Mar 4 2019, 9:34 AM
Nuria closed this task as Resolved.Mar 11 2019, 6:13 PM