Define and deploy Icinga checks for ATS backends
Closed, Resolved · Public

Description

We need to define and deploy a set of Icinga checks for Apache Traffic Server (ATS). The checks should verify that:

  • all required processes are running
  • traffic_manager is binding on the right port and serving HTTP requests
  • configuration is valid
  • the daemon does not need to be reloaded/restarted because of unapplied configuration changes
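
As an illustration of the "serving HTTP requests" item above, here is a minimal sketch of a Nagios/Icinga-style probe. It is not the check deployed by the patches on this task; the function name `check_http`, its parameters, and the choice to treat any status below 500 as healthy are assumptions made for the example. The only convention it relies on is the standard plugin exit-code scheme (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import http.client

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_http(host, port, path="/", timeout=2.0):
    """Probe host:port over HTTP and return (exit_code, message).

    Hypothetical helper: the name, arguments and thresholds are
    illustrative, not taken from the puppet changes on this task.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException) as exc:
        # Nothing listening, connection reset, timeout, or a non-HTTP reply.
        return CRITICAL, f"CRITICAL: no HTTP response from {host}:{port} ({exc})"
    finally:
        conn.close()

    if status >= 500:
        return CRITICAL, f"CRITICAL: {host}:{port} returned HTTP {status}"
    return OK, f"OK: {host}:{port} returned HTTP {status}"
```

A plugin wrapper would print the message and exit with the returned code, for example `code, msg = check_http("localhost", 3128); print(msg); sys.exit(code)`, where the port number is a placeholder.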
ema created this task. Sep 13 2018, 11:33 AM
Restricted Application added a project: Operations. Sep 13 2018, 11:33 AM
Restricted Application added a subscriber: Aklapper.
ema triaged this task as Normal priority. Sep 13 2018, 11:33 AM
ema moved this task from Triage to Caching on the Traffic board.

Change 466483 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: define Icinga checks

https://gerrit.wikimedia.org/r/466483

Change 466483 merged by Ema:
[operations/puppet@production] ATS: define Icinga checks

https://gerrit.wikimedia.org/r/466483

ema updated the task description. Oct 22 2018, 11:18 AM

Change 468971 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: define icinga check for HTTP responses

https://gerrit.wikimedia.org/r/468971

Change 468971 merged by Ema:
[operations/puppet@production] ATS: define icinga check for HTTP responses

https://gerrit.wikimedia.org/r/468971

ema updated the task description. Oct 22 2018, 2:08 PM

Change 469875 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: check HTTP responses from prometheus exporter

https://gerrit.wikimedia.org/r/469875

Change 470573 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] prometheus-trafficserver-exporter: executable renamed

https://gerrit.wikimedia.org/r/470573

Change 470573 merged by Ema:
[operations/puppet@production] prometheus-trafficserver-exporter: executable renamed

https://gerrit.wikimedia.org/r/470573

Change 469875 merged by Ema:
[operations/puppet@production] ATS: check HTTP responses from prometheus exporter

https://gerrit.wikimedia.org/r/469875

Change 471940 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: add check_trafficserver_config_status

https://gerrit.wikimedia.org/r/471940

Change 471940 merged by Ema:
[operations/puppet@production] ATS: add check_trafficserver_config_status

https://gerrit.wikimedia.org/r/471940

Change 471964 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: run check_trafficserver_config_status as trafficserver user

https://gerrit.wikimedia.org/r/471964

Change 471964 merged by Ema:
[operations/puppet@production] ATS: run check_trafficserver_config_status as trafficserver user

https://gerrit.wikimedia.org/r/471964

Change 471983 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: allow nagios user to run check_trafficserver_config_status

https://gerrit.wikimedia.org/r/471983

Change 471983 merged by Ema:
[operations/puppet@production] ATS: allow nagios user to run check_trafficserver_config_status

https://gerrit.wikimedia.org/r/471983

ema updated the task description. Nov 6 2018, 4:49 PM
ema changed the task status from Open to Stalled.

Checking whether the configuration is valid is blocked on https://github.com/apache/trafficserver/issues/4466 being fixed upstream.

Change 474151 had a related patch set uploaded (by Ema; owner: Ema):
[operations/debs/trafficserver@master] Add 0009-verify-config-segfault.patch

https://gerrit.wikimedia.org/r/474151

Change 474156 had a related patch set uploaded (by Ema; owner: Ema):
[operations/debs/trafficserver@master] trafficserver (8.0.0-1wm2) stretch-wikimedia; urgency=medium

https://gerrit.wikimedia.org/r/474156

Change 474151 merged by Ema:
[operations/debs/trafficserver@master] Add 0009-verify-config-segfault.patch

https://gerrit.wikimedia.org/r/474151

Change 474156 merged by Ema:
[operations/debs/trafficserver@master] trafficserver (8.0.0-1wm2) stretch-wikimedia; urgency=medium

https://gerrit.wikimedia.org/r/474156

Stashbot added a subscriber: Stashbot.

Mentioned in SAL (#wikimedia-operations) [2018-11-16T14:39:16Z] <ema> trafficserver 8.0.0-1wm2 uploaded to stretch-wikimedia T204225 T204209

Mentioned in SAL (#wikimedia-operations) [2018-11-16T14:56:20Z] <ema> upgrade cp-ats to 8.0.0-1wm2 T204225 T204209

Change 474288 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: add check_trafficserver_verify_config

https://gerrit.wikimedia.org/r/474288

ema changed the task status from Stalled to Open. Nov 16 2018, 4:32 PM

We fixed the verify_config issue in ATS 8.0.0-1wm2, so this task is no longer stalled.
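
For context, a configuration-validity check of this kind can be sketched as a thin wrapper around the verification command. This is an assumption-laden illustration, not the content of check_trafficserver_verify_config: the default command line `traffic_server -C verify_config`, the function name, and the rule that any non-zero exit is CRITICAL are all assumptions. A crash of the verifier (such as the segfault addressed by the patch above) surfaces as a non-zero return code and is reported as CRITICAL as well.

```python
import subprocess

# Standard Nagios/Icinga plugin exit codes used here.
OK, CRITICAL, UNKNOWN = 0, 2, 3

# Assumption: the command that verifies the ATS configuration.
DEFAULT_CMD = ("traffic_server", "-C", "verify_config")


def check_verify_config(cmd=DEFAULT_CMD, timeout=30):
    """Run the verification command and return (exit_code, message).

    Hypothetical helper for illustration only.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError:
        return UNKNOWN, f"UNKNOWN: {cmd[0]} not found"
    except subprocess.TimeoutExpired:
        return UNKNOWN, f"UNKNOWN: {cmd[0]} did not finish within {timeout}s"

    if proc.returncode == 0:
        return OK, "OK: configuration verified"

    # Non-zero covers both a reported failure and death by signal
    # (negative return code on POSIX, e.g. a segfault).
    lines = (proc.stderr or proc.stdout).strip().splitlines()
    detail = lines[-1] if lines else "no output"
    return CRITICAL, f"CRITICAL: {cmd[0]} exited with {proc.returncode}: {detail}"
```

Passing the command as a parameter keeps the sketch testable on a host without ATS installed, and makes it easy to prefix the command with `sudo -u trafficserver` if the check has to run as a different user.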

Change 474288 merged by Ema:
[operations/puppet@production] ATS: add check_trafficserver_verify_config

https://gerrit.wikimedia.org/r/474288

Change 474706 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: more specific traffic_server check definition

https://gerrit.wikimedia.org/r/474706

Change 474706 merged by Ema:
[operations/puppet@production] ATS: more specific traffic_server check definition

https://gerrit.wikimedia.org/r/474706

Change 474709 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: quote traffic_server check_procs arguments

https://gerrit.wikimedia.org/r/474709

Change 474709 merged by Ema:
[operations/puppet@production] ATS: quote traffic_server check_procs arguments

https://gerrit.wikimedia.org/r/474709

ema closed this task as Resolved. Mon, Nov 19, 4:47 PM
ema updated the task description.

Change 475287 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: add profile::trafficserver::nrpe_monitor_script

https://gerrit.wikimedia.org/r/475287

Change 475291 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: add check_trafficserver_log_fifo

https://gerrit.wikimedia.org/r/475291

Change 475287 merged by Ema:
[operations/puppet@production] ATS: add profile::trafficserver::nrpe_monitor_script

https://gerrit.wikimedia.org/r/475287

Change 475291 merged by Ema:
[operations/puppet@production] ATS: add check_trafficserver_log_fifo

https://gerrit.wikimedia.org/r/475291

Change 475294 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: pass fifo filename to check_trafficserver_log_fifo

https://gerrit.wikimedia.org/r/475294

Change 475294 merged by Ema:
[operations/puppet@production] ATS: pass fifo filename to check_trafficserver_log_fifo

https://gerrit.wikimedia.org/r/475294

Change 478211 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: check when a restart is required

https://gerrit.wikimedia.org/r/478211

Change 478211 merged by Ema:
[operations/puppet@production] ATS: check when a restart is required

https://gerrit.wikimedia.org/r/478211