We've added (or are in the process of adding) Prometheus metrics and alerting rules for all of the services operated by the Performance Team.
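It'd be nice to have a single "service health" dashboard for the services we run, aggregating the most important of these metrics in one place. Each Icinga alert should link to the dashboard, so whoever responds to an alert can see the overall state of the service right away.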
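As a starting point for discussion, here is a rough sketch of how such a dashboard could be generated from a list of services. It assumes the dashboard would live in Grafana (the task doesn't say which tool), and the service names, the `build_dashboard` helper and the panel layout are all hypothetical. The only real metric used is Prometheus's built-in `up` series; the actual panels would use whatever metrics each service exports.

```python
import json

# Hypothetical services and PromQL expressions; none of these come from the task.
# `up` is the series Prometheus records for every scrape target (1 = reachable).
SERVICES = {
    "example-service-a": 'up{job="example-service-a"}',
    "example-service-b": 'up{job="example-service-b"}',
}


def build_dashboard(services, title="Service health"):
    """Return a Grafana-style dashboard dict with one panel per service."""
    panels = []
    for index, (name, expr) in enumerate(sorted(services.items())):
        panels.append({
            "id": index + 1,
            "title": name,
            "type": "stat",
            "targets": [{"expr": expr, "refId": "A"}],
            # Two panels per row on a 24-column grid.
            "gridPos": {
                "h": 6,
                "w": 12,
                "x": 12 * (index % 2),
                "y": 6 * (index // 2),
            },
        })
    return {"title": title, "panels": panels}


if __name__ == "__main__":
    print(json.dumps(build_dashboard(SERVICES), indent=2))
```

The resulting dashboard URL could then be attached to each alert, for example via the `notes_url` attribute if the checks are defined in Icinga 2 (also an assumption about our setup).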