Added a notification group for the Scoring Platform team: https://grafana-admin.wikimedia.org/alerting/notification/5/edit
Then added an alert that fires when the five-minute average is above 10%: https://grafana.wikimedia.org/dashboard/db/ores-extension?orgId=1
It should be enough for now, but I'm not sure whether the Grafana alert system as a whole works. @Krinkle, does it work? The "Send Test" button gives me the error "SMTP not set".
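For anyone reading along, the alert condition described above (five-minute average above 10%) can be sketched in plain Python. This is only an illustration of the threshold logic, not how Grafana evaluates it internally; the function name `should_alert`, the `(timestamp, value)` sample format, and the choice not to alert when there is no data are all assumptions made for the example.

```python
from statistics import mean

THRESHOLD_PERCENT = 10.0   # "above 10%", taken from the comment above
WINDOW_SECONDS = 5 * 60    # "average of five minutes"


def should_alert(samples, now, window=WINDOW_SECONDS, threshold=THRESHOLD_PERCENT):
    """Return True when the mean of the samples in the last `window` seconds exceeds `threshold`.

    `samples` is an iterable of (timestamp_seconds, value_percent) pairs.
    This sample format is hypothetical and chosen only for the sketch.
    """
    # Keep only the values whose timestamp falls inside the window ending at `now`.
    recent = [value for ts, value in samples if now - window <= ts <= now]
    if not recent:
        # Assumption: no data means no alert. A real setup may treat this differently.
        return False
    # Strictly "above" the threshold, so exactly 10% does not fire.
    return mean(recent) > threshold
```

Usage: `should_alert([(0, 5.0), (60, 12.0), (120, 14.0), (180, 11.0), (240, 13.0), (300, 15.0)], now=300)` averages all six samples and compares the result against the 10% threshold.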
@Ladsgroup Grafana does not have outgoing email configured. Instead of maintaining a separate list of contact groups and protocols for Grafana, it was decided to reuse the existing Icinga infrastructure for this.
See https://wikitech.wikimedia.org/wiki/Grafana.wikimedia.org#Alerts_(with_notifications_via_Icinga) for more information.
Short story: Alerts can be fully configured and maintained within Grafana. The only thing needed elsewhere is a one-line configuration change (in Puppet) to enable Icinga alerts for a particular dashboard. Only the dashboard name and the Icinga contact group name need to be specified. The rest remains dynamic and within Grafana only (including the individual alert names, their underlying queries, etc.).
Everything seems fine now. I wish we could build a similar screaming system for the beta cluster as well, but all metrics are dead there: https://grafana-labs.wikimedia.org/dashboard/db/ores-extension?orgId=1 https://grafana-labs.wikimedia.org/dashboard/db/ores-beta-cluster?orgId=1&from=now-7d&to=now
Will look into this later on.