
Automated event stream throughput alerting for important state change streams
Open, Needs Triage, Public

Description

In T329064, the mediawiki.page-undelete stream was empty for over 3 months. No one noticed.

We should have automated throughput monitoring for important streams, so that a stream that unexpectedly goes quiet triggers an alert.

Suggestion:

  • Add a field to stream config indicating the approximate expected throughput of a stream.
  • Add a stream config setting for enabling throughput alerting.
  • Write a script that can be executed by AlertManager or Icinga to check the throughput of all streams with throughput alerting enabled.
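
The check described in the suggestion could look roughly like the sketch below. It is only an illustration: the `StreamConfig` fields (`alerting_enabled`, `expected_events_per_hour`), the `min_fraction` threshold, and the `check_throughput` helper are hypothetical names, not existing stream config settings, and the observed rates are assumed to come from whatever metrics source the real script would query. The exit codes follow the usual Nagios/Icinga plugin convention (0 = OK, 2 = CRITICAL).

```python
from dataclasses import dataclass
from typing import Iterable, List, Mapping, Tuple

# Nagios/Icinga plugin exit codes.
OK, CRITICAL = 0, 2


@dataclass(frozen=True)
class StreamConfig:
    """Hypothetical per-stream settings; field names are assumptions."""
    name: str
    alerting_enabled: bool = False
    expected_events_per_hour: float = 0.0


def check_throughput(
    configs: Iterable[StreamConfig],
    observed_per_hour: Mapping[str, float],
    min_fraction: float = 0.1,
) -> Tuple[int, List[str]]:
    """Flag enabled streams whose observed rate is below a fraction of the expected rate.

    A stream missing from observed_per_hour is treated as having zero events.
    """
    problems = []
    for cfg in configs:
        if not cfg.alerting_enabled:
            continue  # only streams that opted in are checked
        observed = observed_per_hour.get(cfg.name, 0.0)
        floor = cfg.expected_events_per_hour * min_fraction
        if observed < floor:
            problems.append(
                f"{cfg.name}: {observed:g} events/hour, "
                f"expected about {cfg.expected_events_per_hour:g}"
            )
    return (CRITICAL if problems else OK), problems


if __name__ == "__main__":
    configs = [
        StreamConfig("mediawiki.page-undelete", True, 10.0),
        StreamConfig("some.other-stream", False, 100.0),
    ]
    # Made-up observation: no events seen for any stream in the last hour.
    code, messages = check_throughput(configs, {})
    for message in messages:
        print("CRITICAL -", message)
    raise SystemExit(code)
```

Using a fraction of the expected rate rather than an exact match keeps the check tolerant of normal variation while still catching a stream that has gone completely empty, as in T329064.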