
Convert wikidata-alerts grafana dashboard to AlertManager
Closed, Resolved · Public · 8 Estimated Story Points

Description

Prompted by T281359: Onboard teams with Grafana alerts to AlertManager

Dashboard: https://grafana.wikimedia.org/d/TUJ0V-0Zk/wikidata-alerts?orgId=1&refresh=5s
Docs: https://wikitech.wikimedia.org/wiki/Alertmanager

For anything not covered in the docs, we are more than happy to assist with the work.

Using Grafana for alerts is still supported, but alerts should be directed to AlertManager (AM). Alternatively, alerts can be written as Prometheus rules and deployed to alerts.git.

Acceptance criteria 🏕️🌟

  • Grafana directs alerts for the wikidata-alerts dashboard to AlertManager

Event Timeline

Addshore set the point value for this task to 8. (Aug 25 2021, 10:20 AM)

Change 715505 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] alertmanager: Add Wikidata team to alert manager

https://gerrit.wikimedia.org/r/715505

Change 715505 merged by Filippo Giunchedi:

[operations/puppet@production] alertmanager: Add Wikidata team to alert manager

https://gerrit.wikimedia.org/r/715505

Change 715543 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] icinga: Drop grafana alerts

https://gerrit.wikimedia.org/r/715543

Change 715543 merged by Filippo Giunchedi:

[operations/puppet@production] icinga: Drop grafana alerts

https://gerrit.wikimedia.org/r/715543

The Grafana part is done. There are some other Wikidata alerts that need migrating to AM, which I will do separately on the parent ticket.

Looks like the Grafana ones are all done.
@Ladsgroup also created T290080: Move wikidata lag checks off Icinga, as there are some puppet metrics that we could move over.
From the discussion on Mattermost, it seems that these checks only cover dispatch lag, which is already sent to Graphite.
So we can probably just ensure we have matching alerts for these and nuke the duplicated Puppet alerts?