This task needs to be broken down further, but for now here is a list of things to cover:
- Classical observability
- Operational metrics
- Grafana dashboard: decide which target(s) of evaluation make sense
- Instrument the types of notifications being spawned - maybe via Grafana, maybe via MEP, maybe both
- Enrollment, disenrollment, callbacks
Be aware of the potential need for sampling, as event throughput can be very high.
Consider user privacy when designing schema(s), and be aware of where events may be published.
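One possible approach to the sampling note above is deterministic, hash-based sampling, sketched below. This is only an illustration under assumptions: `should_sample` and the `event_id` parameter are hypothetical names, and the ID is assumed to be a random per-event identifier rather than anything tied to a user, in keeping with the privacy note.

```python
import hashlib


def should_sample(event_id: str, rate: float) -> bool:
    """Decide deterministically whether an event falls inside the sample.

    `event_id` is a hypothetical per-event identifier (assumed random and
    not derived from user data). `rate` is the fraction of events to keep.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    digest = hashlib.sha256(event_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Keep the event if its bucket falls below the scaled threshold.
    return bucket < int(rate * 2**64)
```

Because the decision depends only on the ID, every service that sees the same event makes the same keep-or-drop choice, which avoids partially sampled event chains.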
**Metrics section from the RFC:**
> We will track the [[ https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems/#xref_monitoring_golden-signals | Four Golden Signals ]]: latency, traffic, errors, and saturation.
> Additionally, we will track product-oriented metrics both overall and per-platform, including:
> - Subscription request rate (req/s)
> - Subscription deletion request rate (req/s)
> - Total subscription count
> Metrics must be compatible with [[ https://wikitech.wikimedia.org/wiki/Prometheus | Prometheus ]]. Alerts will be configured for request spikes or when error rates pass a reasonable threshold.
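As a rough illustration of how the product metrics in the RFC excerpt could map onto Prometheus metric types, here is a stdlib-only sketch that renders the Prometheus text exposition format. Everything named here is an assumption: the class `SubscriptionMetrics`, the metric names, and the `platform` and `action` labels are hypothetical, and a real service would most likely use a Prometheus client library instead of formatting lines by hand. Request rates (req/s) are not stored directly; they would be derived at query time by applying `rate()` to the counter.

```python
from collections import Counter


class SubscriptionMetrics:
    """Hypothetical in-memory metrics holder (illustration only)."""

    def __init__(self) -> None:
        # Monotonic request counts, keyed by (platform, action).
        self.requests: Counter = Counter()
        # Current subscription count per platform (can go up and down).
        self.active: Counter = Counter()

    def record_subscribe(self, platform: str) -> None:
        self.requests[(platform, "subscribe")] += 1
        self.active[platform] += 1

    def record_unsubscribe(self, platform: str) -> None:
        self.requests[(platform, "unsubscribe")] += 1
        self.active[platform] -= 1

    def render(self) -> str:
        """Render metrics in the Prometheus text exposition format.

        Label values are assumed to be simple strings needing no escaping.
        """
        lines = ["# TYPE notification_subscription_requests_total counter"]
        for (platform, action), count in sorted(self.requests.items()):
            lines.append(
                "notification_subscription_requests_total"
                f'{{platform="{platform}",action="{action}"}} {count}'
            )
        lines.append("# TYPE notification_subscriptions gauge")
        for platform, count in sorted(self.active.items()):
            lines.append(
                f'notification_subscriptions{{platform="{platform}"}} {count}'
            )
        return "\n".join(lines) + "\n"
```

Using a counter for requests and a gauge for the total keeps the per-platform breakdown in labels, so overall figures can be obtained by summing across the `platform` label.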