We would like a single dashboard where we can view:
- [exists?] Total event traffic
- [exists?] Traffic to each EventGate instance
- [exists] Traffic to Kafka
- [exists] Top Kafka topics
- [exists] Bottom (or dead) Kafka topics (for pruning or retiring streams/instrumentation)
- [can't find] Total traffic to logstash
- [can compute?] Contribution from Event traffic to logstash, overlaid
- [probably exists] Traffic to statsD / Prometheus
- [can compute?] Contribution from Event traffic to statsD / Prometheus, overlaid
- [needs instrumentation] Timing
  - For an event to go from the client to EventGate
  - For an event to go from the client to Kafka
  - For an event to go from the client to HDFS
  - For an event to go from the client to logstash, etc.
- [exists?] Errors
  - [exists?] Rejection fraction (validation failure)
  - [exists?] EventGate timeouts or non-200 statuses
  - [exists?] Kafka errors
  - [exists?] Insertion errors (into e.g. HDFS, logstash, etc.)
Nice dimensions to have would be:
- [exists] Stream/Topic
- [needs instrumentation] Platform
  - Desktop Web
  - Mobile Web
  - iOS App
  - Android App
  - MediaWiki Server
  - Other (KaiOS, labs, etc.)
- [needs instrumentation] Project (e.g. which wiki)
Most of this is pulling together existing numbers, except for the timing data. Each event already carries a timestamp, so the timing should not be hard: all that is required is the difference between that timestamp and the time at the point of measurement (EventGate, Kafka, HDFS, logstash, etc.).
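As a rough illustration of the datetime difference, here is a minimal Python sketch. It assumes the event timestamp is an ISO 8601 string stored in a field called `dt`; that field name and the helper names are hypothetical and would need to be matched to the real event schema.

```python
from datetime import datetime, timezone


def parse_event_time(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, treating naive values as UTC (assumption)."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def latency_seconds(event: dict, measured_at: datetime, field: str = "dt") -> float:
    """Seconds between the event's own timestamp and the point of measurement.

    `field` is a hypothetical name for the timestamp field on the event.
    """
    return (measured_at - parse_event_time(event[field])).total_seconds()


# Example: an event stamped at midnight UTC, measured 2.5 seconds later.
example_event = {"dt": "2020-01-01T00:00:00Z"}
example_measured_at = datetime(2020, 1, 1, 0, 0, 2, 500000, tzinfo=timezone.utc)
example_latency = latency_seconds(example_event, example_measured_at)
```

One caveat: if the timestamp is set on the client, clock skew between the client and the measuring host can make individual differences inaccurate or even negative, so it may be safer to aggregate these values as percentiles rather than trust single readings.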
The project dimension could be computed from the hostname field or from the HTTP header. The platform dimension is harder, and we may not be able to do it.
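A minimal Python sketch of deriving the project dimension from a hostname follows. It assumes each event exposes the host as a string under a field named `hostname` (a hypothetical name), in either bare `host[:port]` form or as a full URL; the function names are illustrative only.

```python
from collections import Counter
from urllib.parse import urlsplit


def project_from_host(host: str) -> str:
    """Normalize a host string (bare host, host:port, or full URL) to a project key."""
    host = host.strip().lower()
    if "//" in host:
        # Full URL: let urlsplit extract the hostname part.
        host = urlsplit(host).hostname or ""
    else:
        # Bare Host-header style value: drop an optional ":port" suffix.
        host = host.split(":", 1)[0]
    return host.rstrip(".") or "unknown"


def count_by_project(events, field: str = "hostname") -> Counter:
    """Count events per project; `field` is a hypothetical event field name."""
    return Counter(project_from_host(e.get(field, "")) for e in events)


example_counts = count_by_project([
    {"hostname": "en.wikipedia.org"},
    {"hostname": "EN.Wikipedia.org:443"},
    {"hostname": "https://de.wikipedia.org/wiki/Foo"},
    {},
])
```

Events with a missing or empty host fall into an `unknown` bucket rather than being dropped, so the dashboard would still show how much traffic cannot be attributed to a project.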