
Setup mailing list for automated monitoring reports from Beta Cluster project
Closed, Resolved · Public · Feature

Description

As a co-maintainer of Beta Cluster
I want to get notifications of Puppet run problems and other monitored failures from the instances of the project
So I can make time to look into errors before someone finds me on IRC and asks for help with a failed service.

Sending Puppet failure emails to everyone in the project was disabled in 2021. It is understandable that, in this highly shared environment, not everyone with admin-level rights is able to actively help triage automated alerts. If we set up an opt-in mailing list at lists.wikimedia.org and direct Prometheus alert emails there, we can make it easier for those who are interested in helping to get involved more proactively, without sending a larger group messages that feel spammy to some.

Event Timeline

We actually already have https://lists.wikimedia.org/postorius/lists/betacluster-alerts.lists.wikimedia.org/ which was set up for T1125: Send beta cluster Jenkins alerts to betacluster-alert list. Adding Prometheus alerts to that existing list may be a better idea than starting a new list.

I think we may be rediscovering the work from T789: Send beta cluster icinga alerts to a list from first principles.

@greg made me an admin of betacluster-alerts@lists.wikimedia.org