As we're building the new JobQueue on top of Event-Platform, we have an opportunity to rethink how the queue is monitored.
By using ChangeProp, we get several useful things to monitor out of the box:
- Rate of posting to the queue by job type
- Rate of processing of each job type
- Delay between the time the root job was posted and the time the leaf job was executed
- Backlog in number of messages per job type
- Retry rates
- Deduplication rates
- ChangeProp memory usage stats
- Redis usage stats
- Kafka-ChangeProp RTT
- Detailed breakdown of the root jobs currently being processed: which templates were edited, when, etc.
I think we can do better and create more interesting metrics or scripts to gain even more insight into what's going on. I've created this brainstorming task to collect ideas about what people want to be able to see when monitoring and debugging the new queue. Please add your wildest ideas here, and then we can discuss whether each one is feasible and, if so, how to implement it.