
Move (or delete?) trafficserver restart count alert from icinga to alerts.git
Open, Needs Triage, Public

Description

The following Icinga alert can be ported to alerts.git, or deleted if it is no longer relevant:

+    monitoring::check_prometheus { "trafficserver_${instance_name}_restart_count":
+        description     => "traffic_server ${instance_name} process restarted",
+        dashboard_links => ["https://grafana.wikimedia.org/d/000000610/ats-instance-drilldown?orgId=1&var-site=${::site} prometheus/ops&var-instance=${::hostname}&var-layer=${instance_name}"],
+        query           => "scalar(trafficserver_restart_count{${prometheus_labels}})",
+        method          => 'ge',
+        warning         => 2,
+        critical        => 2,
+        prometheus_url  => "http://prometheus.svc.${::site}.wmnet/ops",
+        notes_link      => 'https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server',
+    }
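
If the alert is kept, the check above could translate into a Prometheus alerting rule along these lines. This is only a rough sketch: the group name, alert name, `for` duration, `severity` value, and the annotation keys are assumptions rather than confirmed alerts.git conventions, and the label matchers that `${prometheus_labels}` expands to are left out because they are not shown in the Puppet snippet.

```yaml
# Hypothetical port of the Icinga check; names and label values are assumptions.
groups:
  - name: trafficserver_restarts        # assumed group name
    rules:
      - alert: TrafficserverRestarted   # assumed alert name
        # Same threshold as the Icinga check (method 'ge', warning and critical both 2).
        # Add the matchers from ${prometheus_labels} here if they are still needed.
        expr: trafficserver_restart_count >= 2
        for: 5m                         # assumed; the Icinga check has no equivalent setting
        labels:
          severity: critical            # assumed; the original fires warning and critical at the same value
        annotations:
          summary: "traffic_server process restarted on {{ $labels.instance }}"
          dashboard: "https://grafana.wikimedia.org/d/000000610/ats-instance-drilldown"
          runbook: "https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server"
```

Unlike the Icinga version, which is instantiated once per host and instance and wraps the query in `scalar()`, a single rule like this would evaluate every matching series and fire one alert per series.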