These alarms would help with two use cases that are almost opposite:
- Topics that have a very low flow of events, maybe not even one an hour, which would trigger unnecessary alarms
- Topics that normally see a constant flow of events and then see none for a significant interval, indicating an outage
Kafka jumbo-eqiad itself has all the topics we'd want to ingest and monitor. We can implement a function that gets the list of all Kafka topics, maps them to stream names, and then queries each eventgate-wikimedia instance to check whether it is allowed to produce that stream. If it is, then consume the latest message from Kafka for that stream. If the message's timestamp is not too old (newer than 90 days), then that stream should be both ingested and monitored.
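
A rough sketch of that selection logic, to make the steps concrete. Everything here is illustrative: `topic_to_stream`, `streams_to_monitor`, and the two callables `is_allowed` and `latest_timestamp` are hypothetical names, and the datacenter-prefix mapping from topic to stream is an assumption. The Kafka and eventgate lookups are passed in as functions so the logic does not depend on any particular client library.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, List, Optional

# Threshold taken from the text above: the latest message must be newer than 90 days.
MAX_AGE = timedelta(days=90)


def topic_to_stream(topic: str, dc_prefixes=("eqiad.", "codfw.")) -> str:
    """Map a Kafka topic to a stream name.

    Assumption: topics are stream names with a datacenter prefix.
    """
    for prefix in dc_prefixes:
        if topic.startswith(prefix):
            return topic[len(prefix):]
    return topic


def streams_to_monitor(
    topics: Iterable[str],
    is_allowed: Callable[[str], bool],
    latest_timestamp: Callable[[str], Optional[datetime]],
    now: Optional[datetime] = None,
) -> List[str]:
    """Return the streams that should be both ingested and monitored.

    is_allowed(stream): hypothetical check against an eventgate instance.
    latest_timestamp(topic): hypothetical lookup of the latest message's
    timestamp in Kafka, or None if the topic is empty.
    """
    now = now or datetime.now(timezone.utc)
    selected = set()
    for topic in topics:
        stream = topic_to_stream(topic)
        if not is_allowed(stream):
            continue
        ts = latest_timestamp(topic)
        # Skip empty topics and topics whose latest message is too old.
        if ts is not None and now - ts < MAX_AGE:
            selected.add(stream)
    return sorted(selected)
```

In practice `is_allowed` would wrap a request to the eventgate instance's stream config and `latest_timestamp` would wrap a Kafka consumer seeking to the end of the topic; both are left out here on purpose.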
If the stream should be monitored, get the examples from the stream's event schema and POST them as canary events to that eventgate-wikimedia instance.
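
A minimal sketch of building the canary events and the POST request, under a few assumptions: the schema is a JSON Schema document with a top-level `examples` array, each event names its stream in `meta.stream`, and the instance accepts a JSON array of events. The function names and the URL in the usage note are hypothetical.

```python
import copy
import json
from urllib.request import Request


def canary_events(schema: dict, stream: str) -> list:
    """Turn the schema's examples into canary events for the given stream.

    Assumption: examples live under the schema's top-level "examples" key
    and the target stream is carried in meta.stream.
    """
    events = []
    for example in schema.get("examples", []):
        event = copy.deepcopy(example)  # do not mutate the schema itself
        event.setdefault("meta", {})["stream"] = stream
        events.append(event)
    return events


def canary_request(eventgate_url: str, events: list) -> Request:
    """Build (but do not send) a POST request carrying the events as JSON."""
    return Request(
        eventgate_url,
        data=json.dumps(events).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending is then a matter of `urllib.request.urlopen(canary_request(url, events))`, where `url` is whatever events endpoint the eventgate-wikimedia instance exposes. A stream whose schema has no examples yields an empty list, in which case nothing should be posted.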