
Port traffic/netops grafana alerts to AlertManager
Closed, Resolved (Public)

Description

From parent task:

monitor/traffic.pp:    monitoring::grafana_alert { 'varnish-http-requests':
monitor/traffic.pp:    monitoring::grafana_alert { 'ping-offload':
monitor/traffic.pp:    monitoring::grafana_alert { 'rpki':

See also: https://wikitech.wikimedia.org/wiki/Alertmanager

Event Timeline

moving to radar for tracking :-)

fgiunchedi moved this task from Radar to Doing on the User-fgiunchedi board.
ema triaged this task as Medium priority. May 26 2021, 2:42 PM

Change 695367 had a related patch set uploaded (by Ema; author: Ema):

[operations/puppet@production] alertmanager: route Traffic team alerts

https://gerrit.wikimedia.org/r/695367

Change 695367 merged by Ema:

[operations/puppet@production] alertmanager: route Traffic team alerts

https://gerrit.wikimedia.org/r/695367

I've added an always-firing test alert on Grafana with the following tags: team: traffic, severity: critical. Shortly after I did so, we received an alert via both email and IRC, confirming that alertmanager routing works as expected.

09:34 -!- jinxer-wm [~jinxer-wm@user/jinxer-wm] has joined #wikimedia-traffic
09:34 < jinxer-wm> (EmaTestingAlertManager) firing: EmaTestingAlertManager - https://alerts.wikimedia.org
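
For readers unfamiliar with label-based routing, here is a minimal sketch of the idea behind the test above: an alert carries labels, and the first route whose matchers all agree with those labels decides which receivers are notified. The routing table, the receiver names (`traffic-email`, `traffic-irc`, `default`) and the `route_alert` helper are hypothetical and are not taken from the actual puppet configuration. Real Alertmanager routing trees also support nesting and a `continue` option, which this simplified first-match model leaves out.

```python
# Hypothetical routing table: route contents and receiver names are
# illustrative assumptions, not the real Alertmanager configuration.
ROUTES = [
    {
        "match": {"team": "traffic", "severity": "critical"},
        "receivers": ["traffic-email", "traffic-irc"],
    },
    {
        "match": {"team": "traffic"},
        "receivers": ["traffic-irc"],
    },
]
DEFAULT_RECEIVERS = ["default"]


def route_alert(labels, routes=ROUTES, default=DEFAULT_RECEIVERS):
    """Return the receivers of the first route whose matchers all equal the alert's labels."""
    for route in routes:
        if all(labels.get(key) == value for key, value in route["match"].items()):
            return list(route["receivers"])
    # No route matched: fall back to the default receivers.
    return list(default)


# Labels modelled on the test alert described above.
test_alert = {
    "alertname": "EmaTestingAlertManager",
    "team": "traffic",
    "severity": "critical",
}
print(route_alert(test_alert))  # ['traffic-email', 'traffic-irc']
```

In this sketch an alert labelled only `team: traffic` would reach the IRC receiver alone, and an alert with no matching labels would fall through to the default receiver.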

Change 696384 had a related patch set uploaded (by Ema; author: Ema):

[operations/puppet@production] icinga: remove Grafana alerts for Traffic/Netops

https://gerrit.wikimedia.org/r/696384

ema renamed this task from "Port traffic/netops grafana alerts to AM" to "Port traffic/netops grafana alerts to AlertManager". May 27 2021, 12:37 PM

OK, so it turns out that defining the alerts in Grafana is possible but not recommended, and the right thing to do is to add them to the operations/alerts repo instead. My bad!

Change 696468 had a related patch set uploaded (by Ema; author: Ema):

[operations/alerts@master] Traffic team alerts

https://gerrit.wikimedia.org/r/696468

Change 696468 merged by Ema:

[operations/alerts@master] Traffic team alerts

https://gerrit.wikimedia.org/r/696468

Change 697710 had a related patch set uploaded (by Ema; author: Ema):

[operations/alerts@master] Netops team alert: ping offload

https://gerrit.wikimedia.org/r/697710

Change 697721 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] alertmanager: attach runbook/dashboard URLs to IRC messages

https://gerrit.wikimedia.org/r/697721

Change 697722 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] alertmanager: add a sample JSON alert and instruction on how to test IRC format changes

https://gerrit.wikimedia.org/r/697722

Change 697737 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] alerts: reload prometheus instances after deploy

https://gerrit.wikimedia.org/r/697737

Change 697721 merged by Filippo Giunchedi:

[operations/puppet@production] alertmanager: attach runbook/dashboard URLs to IRC messages

https://gerrit.wikimedia.org/r/697721

Change 697722 merged by Filippo Giunchedi:

[operations/puppet@production] alertmanager: add a sample alert and test instructions

https://gerrit.wikimedia.org/r/697722

Change 697737 merged by Filippo Giunchedi:

[operations/puppet@production] alerts: reload prometheus instances after deploy

https://gerrit.wikimedia.org/r/697737

Change 697924 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] alertmanager: highlight 'instance' label in alerts dashboard

https://gerrit.wikimedia.org/r/697924

Change 697924 merged by Filippo Giunchedi:

[operations/puppet@production] alertmanager: highlight 'instance' label in alerts dashboard

https://gerrit.wikimedia.org/r/697924

Change 698459 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] alertmanager: print link separators on IRC when needed

https://gerrit.wikimedia.org/r/698459

Change 698491 had a related patch set uploaded (by Ema; author: Ema):

[operations/puppet@production] alertmanager: define IRC and page routes for sre team

https://gerrit.wikimedia.org/r/698491

Change 698491 merged by Ema:

[operations/puppet@production] alertmanager: define IRC and page routes for sre team

https://gerrit.wikimedia.org/r/698491

Change 697710 merged by Ema:

[operations/alerts@master] Netops team alert: ping offload

https://gerrit.wikimedia.org/r/697710

Change 698548 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/alerts@master] pipeline: use bullseye to get newer prometheus

https://gerrit.wikimedia.org/r/698548

Change 698548 merged by Filippo Giunchedi:

[operations/alerts@master] pipeline: use bullseye to get newer prometheus

https://gerrit.wikimedia.org/r/698548

Change 698459 merged by Filippo Giunchedi:

[operations/puppet@production] alertmanager: print link separators on IRC when needed

https://gerrit.wikimedia.org/r/698459

Change 700649 had a related patch set uploaded (by Ayounsi; author: XioNoX):

[operations/alerts@master] Move RPKI alerts to Prometheus/AM

https://gerrit.wikimedia.org/r/700649

Change 700649 merged by Ayounsi:

[operations/alerts@master] Move RPKI alerts to Prometheus/AM

https://gerrit.wikimedia.org/r/700649

Change 702688 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/puppet@production] Remove old RPKI Grafana alerts

https://gerrit.wikimedia.org/r/702688

Change 702688 merged by Ayounsi:

[operations/puppet@production] Remove old RPKI Grafana alerts

https://gerrit.wikimedia.org/r/702688

Change 708081 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] icinga: remove grafana alerts for Traffic, moved to alertmanager

https://gerrit.wikimedia.org/r/708081

Change 708081 merged by Filippo Giunchedi:

[operations/puppet@production] icinga: remove grafana alerts for Traffic, moved to alertmanager

https://gerrit.wikimedia.org/r/708081

fgiunchedi claimed this task.

This is complete! Thanks all