Onboard teams with Grafana alerts to AlertManager
Closed, Resolved, Public

Description

A few teams currently have Grafana alerts in Icinga. We should onboard these teams to AlertManager and move their Grafana alerts off Icinga.

The following checks currently use Grafana:

monitor/ores.pp:    monitoring::grafana_alert { 'ores':
monitor/ores.pp:    monitoring::grafana_alert { 'ores-extension':
monitor/reading_web.pp:    monitoring::grafana_alert { 'reading-web-page-previews':
monitor/reading_web.pp:    monitoring::grafana_alert { 'wikimedia-client-errors-alerts':
monitor/services.pp:    monitoring::grafana_alert { 'restbase-legacy':
monitor/services.pp:    monitoring::grafana_alert { 'restbase':
monitor/services.pp:    monitoring::grafana_alert { 'change-propagation':
monitor/services.pp:    monitoring::grafana_alert { 'jobqueue-eventbus':
monitor/traffic.pp:    monitoring::grafana_alert { 'varnish-http-requests':
monitor/traffic.pp:    monitoring::grafana_alert { 'ping-offload':
monitor/traffic.pp:    monitoring::grafana_alert { 'rpki':
monitor/wikidata.pp:    monitoring::grafana_alert { 'wikidata-alerts':
ORES
Reading Web
  • reading-web-page-previews: dashboard. Median TTP alert
  • wikimedia-client-errors-alerts: dashboard. Client errors alert
Services
  • restbase-legacy: dashboard. No alerts AFAICS
  • restbase: dashboard. No alerts AFAICS
  • change-propagation: dashboard. No alerts AFAICS
  • jobqueue-eventbus: dashboard not found
Traffic
  • varnish-http-requests: dashboard. Traffic drop alert
  • ping-offload: dashboard. Input IP error rates
  • rpki: dashboard. Misc RPKI / routinator alerts.
Wikidata
  • wikidata-alerts: dashboard. Misc wikidata alerts
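
For anyone re-running this audit, the grep-style listing in the description can be grouped by manifest file with a few lines of Python. This is only a sketch: the sample input is copied from three lines of the listing, and the function name `group_alerts_by_manifest` and the regular expression are illustrative assumptions, not part of operations/puppet.

```python
import re
from collections import defaultdict

# Sample lines copied from the listing in the task description.
GREP_OUTPUT = """\
monitor/ores.pp:    monitoring::grafana_alert { 'ores':
monitor/reading_web.pp:    monitoring::grafana_alert { 'reading-web-page-previews':
monitor/traffic.pp:    monitoring::grafana_alert { 'rpki':
"""

# Matches "<path>:<spaces>monitoring::grafana_alert { '<title>':"
# (hypothetical pattern, written for the line shape shown above).
LINE_RE = re.compile(
    r"^(?P<path>[^:]+):\s*monitoring::grafana_alert\s*\{\s*'(?P<title>[^']+)':"
)


def group_alerts_by_manifest(text):
    """Return a dict mapping each manifest path to its grafana_alert titles."""
    groups = defaultdict(list)
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            groups[match.group("path")].append(match.group("title"))
    return dict(groups)


if __name__ == "__main__":
    for path, titles in group_alerts_by_manifest(GREP_OUTPUT).items():
        print(path, titles)
```

Lines that do not match the pattern are skipped silently, so the full grep output can be pasted in unchanged.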

Event Timeline

MPhamWMF triaged this task as High priority. (Jun 7 2021, 3:43 PM)
MPhamWMF moved this task from Incoming to Operations/SRE on the Wikidata-Query-Service board.
TJones renamed this task from "Onboard teams with Grafana alerts to AM" to "Onboard teams with Grafana alerts to AlertManager". (Jun 7 2021, 3:43 PM)
TJones updated the task description.

Change 698735 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] icinga: update reading web Grafana alerts

https://gerrit.wikimedia.org/r/698735

Change 698735 merged by Filippo Giunchedi:

[operations/puppet@production] icinga: update reading web Grafana alerts

https://gerrit.wikimedia.org/r/698735

Change 708369 had a related patch set uploaded (by Jdlrobson; author: Jdlrobson):

[operations/puppet@production] alertmanager: route readers web team alerts

https://gerrit.wikimedia.org/r/708369

Change 708369 merged by Filippo Giunchedi:

[operations/puppet@production] alertmanager: route readers web team alerts

https://gerrit.wikimedia.org/r/708369

Change 708476 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] icinga: remove services Grafana alerts

https://gerrit.wikimedia.org/r/708476

Change 708476 merged by Filippo Giunchedi:

[operations/puppet@production] icinga: remove services Grafana alerts

https://gerrit.wikimedia.org/r/708476

Change 708719 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Remove Grafana alerts for ORES

https://gerrit.wikimedia.org/r/708719

Change 708719 merged by Elukey:

[operations/puppet@production] Remove Grafana alerts for ORES

https://gerrit.wikimedia.org/r/708719

Change 714067 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] icinga: remove reading-web Grafana checks

https://gerrit.wikimedia.org/r/714067

Change 714067 merged by Filippo Giunchedi:

[operations/puppet@production] icinga: remove reading-web Grafana checks

https://gerrit.wikimedia.org/r/714067

Change 719107 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] icinga: remove check_grafana_alert

https://gerrit.wikimedia.org/r/719107

Change 719107 merged by Filippo Giunchedi:

[operations/puppet@production] icinga: remove check_grafana_alert

https://gerrit.wikimedia.org/r/719107