
Per-backend ATS Prometheus metrics
Open, Normal, Public

Description

While investigating T184942 (Deprecate python varnish cachestats), we've run into the fact that maps runs on cache upload, whose ATS migration has been completed, and that we don't have per-backend metrics (latency, status code, etc.) from ATS. Ideally we would have at least the same metrics we currently collect from varnishlog + mtail available from ATS, although I don't know the specifics of what's possible.

In terms of dashboards, we're looking at replacing varnish_backend_requests and varnish_backend_timing in the dashboards listed below, and possibly in alerts. The latter is a subset of the former, so we should be able to rewrite varnish_backend_timing in terms of varnish_backend_requests.

varnish_backend_requests

Matched db/api-frontend-summary (API frontend summary)
Matched db/maps-performances-filippo-t184942 (Maps performances Filippo T184942)
Matched db/wikidata-query-service-frontend (Wikidata Query Service Frontend)

varnish_backend_timing

Matched db/apache-backend-timing (Apache Backend-Timing)
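If varnish_backend_timing really is a subset of varnish_backend_requests, rewriting one in terms of the other mostly means aggregating away the labels the narrower metric lacks, which PromQL expresses as `sum by (...)`. The Go sketch below shows that aggregation on in-memory samples; the label names (`backend`, `status`), the backend names and the values are hypothetical and not taken from the actual metrics.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Sample is one series value: a label set plus a value.
type Sample struct {
	Labels map[string]string
	Value  float64
}

// sumBy mimics PromQL's `sum by (keep...)`: it groups samples on the kept
// labels only and adds up their values, dropping every other label.
func sumBy(samples []Sample, keep ...string) map[string]float64 {
	out := make(map[string]float64)
	for _, s := range samples {
		parts := make([]string, 0, len(keep))
		for _, k := range keep {
			parts = append(parts, k+"="+s.Labels[k])
		}
		out[strings.Join(parts, ",")] += s.Value
	}
	return out
}

func main() {
	// Hypothetical per-backend request counters that also carry a "status" label.
	requests := []Sample{
		{map[string]string{"backend": "maps", "status": "200"}, 90},
		{map[string]string{"backend": "maps", "status": "404"}, 10},
		{map[string]string{"backend": "swift", "status": "200"}, 50},
	}

	agg := sumBy(requests, "backend")

	// Print in a stable order.
	keys := make([]string, 0, len(agg))
	for k := range agg {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s %g\n", k, agg[k]) // backend=maps 100, then backend=swift 50
	}
}
```

Grouping on `backend` alone collapses the per-status series into one value per backend, which is the shape a panel keyed only on backend would need.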

Event Timeline

Restricted Application added a project: Operations. Jul 10 2019, 1:56 PM
Restricted Application added a subscriber: Aklapper.
ema triaged this task as Normal priority. Jul 10 2019, 2:05 PM
ema moved this task from Triage to Caching on the Traffic board.
CDanis added a subscriber: CDanis. Jul 10 2019, 2:07 PM

Change 523130 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: log origin server hostname and Backend-Timing

https://gerrit.wikimedia.org/r/523130
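Turning the logged Backend-Timing header into a latency means parsing it. Assuming the header is the one Apache emits with mod_headers' `%D %t` (`D=` is the request duration in microseconds, `t=` the request start time in microseconds since the epoch), the latency in seconds comes from the `D=` field alone. `backendTimingSeconds` is a hypothetical helper for illustration, not part of the ATS logging configuration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// backendTimingSeconds extracts the D= field (microseconds) from a
// Backend-Timing header value and converts it to seconds. It reports
// false if the field is missing or not an integer.
func backendTimingSeconds(header string) (float64, bool) {
	for _, field := range strings.Fields(header) {
		if strings.HasPrefix(field, "D=") {
			us, err := strconv.ParseInt(strings.TrimPrefix(field, "D="), 10, 64)
			if err != nil {
				return 0, false
			}
			return float64(us) / 1e6, true
		}
	}
	return 0, false
}

func main() {
	secs, ok := backendTimingSeconds("D=250123 t=1563350000000000")
	fmt.Println(secs, ok) // prints 0.250123 true
}
```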

Change 523168 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: add atsbackend.mtail

https://gerrit.wikimedia.org/r/523168
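atsbackend.mtail turns those log lines into Prometheus metrics. The sketch below mimics, in plain Go rather than the mtail language, what a per-origin latency histogram involves: cumulative buckets, a count and a sum, printed in the Prometheus text exposition format. The bucket bounds, the `origin` label, the hostnames and the metric name (borrowed from the later `trafficserver_backend_requests_seconds_count` rule) are assumptions, not the contents of the actual mtail program.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// histogram is a Prometheus-style histogram: cumulative bucket counts
// plus the total number and sum of observations.
type histogram struct {
	bounds  []float64 // bucket upper bounds in seconds, ascending; +Inf is implicit
	buckets []uint64  // buckets[i] counts observations <= bounds[i]
	count   uint64    // equals the implicit +Inf bucket
	sum     float64
}

func newHistogram(bounds []float64) *histogram {
	return &histogram{bounds: bounds, buckets: make([]uint64, len(bounds))}
}

// observe records one request duration in seconds.
func (h *histogram) observe(secs float64) {
	for i, b := range h.bounds {
		if secs <= b {
			h.buckets[i]++
		}
	}
	h.count++
	h.sum += secs
}

// write prints the histogram in the Prometheus text exposition format.
func (h *histogram) write(name, origin string) {
	for i, b := range h.bounds {
		le := strconv.FormatFloat(b, 'g', -1, 64)
		fmt.Printf("%s_bucket{origin=%q,le=%q} %d\n", name, origin, le, h.buckets[i])
	}
	fmt.Printf("%s_bucket{origin=%q,le=\"+Inf\"} %d\n", name, origin, h.count)
	fmt.Printf("%s_sum{origin=%q} %g\n", name, origin, h.sum)
	fmt.Printf("%s_count{origin=%q} %d\n", name, origin, h.count)
}

func main() {
	// Hypothetical bucket bounds and (origin, seconds) pairs parsed from log lines.
	bounds := []float64{0.05, 0.1, 0.25, 0.5, 1, 2.5}
	observations := []struct {
		origin string
		secs   float64
	}{
		{"origin-a.example", 0.03},
		{"origin-a.example", 0.4},
		{"origin-b.example", 1.2},
	}

	perOrigin := map[string]*histogram{}
	for _, o := range observations {
		h, ok := perOrigin[o.origin]
		if !ok {
			h = newHistogram(bounds)
			perOrigin[o.origin] = h
		}
		h.observe(o.secs)
	}

	origins := make([]string, 0, len(perOrigin))
	for o := range perOrigin {
		origins = append(origins, o)
	}
	sort.Strings(origins) // stable output order
	for _, o := range origins {
		perOrigin[o].write("trafficserver_backend_requests_seconds", o)
	}
}
```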

Change 523130 merged by Ema:
[operations/puppet@production] ATS: log origin server hostname and Backend-Timing

https://gerrit.wikimedia.org/r/523130

fgiunchedi moved this task from Backlog to Doing on the User-fgiunchedi board. Jul 16 2019, 10:30 AM

Change 523705 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: add support for atsmtail systemd services

https://gerrit.wikimedia.org/r/523705

Change 523768 had a related patch set uploaded (by Ema; owner: Ema):
[operations/software/fifo-log-demux@master] 0.3: implement fifo-log-tailer in go

https://gerrit.wikimedia.org/r/523768

Change 523768 merged by Ema:
[operations/software/fifo-log-demux@master] 0.3: implement fifo-log-tailer in go

https://gerrit.wikimedia.org/r/523768

Mentioned in SAL (#wikimedia-operations) [2019-07-17T09:07:44Z] <ema> upload fifo-log-demux 0.3 to stretch-wikimedia T227668

Change 523881 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: pass -socket and -regexp to fifo-log-tailer

https://gerrit.wikimedia.org/r/523881

Change 523881 merged by Ema:
[operations/puppet@production] ATS: pass -socket and -regexp to fifo-log-tailer

https://gerrit.wikimedia.org/r/523881

Mentioned in SAL (#wikimedia-operations) [2019-07-17T09:21:43Z] <ema> cp-ats: upgrade fifo-log-demux to 0.3 T227668

Change 523705 merged by Ema:
[operations/puppet@production] ATS: add support for atsmtail systemd services

https://gerrit.wikimedia.org/r/523705

Change 523168 merged by Ema:
[operations/puppet@production] ATS: add atsbackend.mtail

https://gerrit.wikimedia.org/r/523168

Change 523898 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] prometheus: fetch ATS origin server metrics

https://gerrit.wikimedia.org/r/523898

fgiunchedi updated the task description. Jul 17 2019, 10:47 AM

Change 523898 merged by Ema:
[operations/puppet@production] prometheus: fetch ATS origin server metrics

https://gerrit.wikimedia.org/r/523898

Mentioned in SAL (#wikimedia-operations) [2019-07-17T13:06:47Z] <ema> prometheus servers: remove varnish-upload_$dc_backend.yaml, replaced by ATS equivalent T227668

Change 525081 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] prometheus: add ats_backend_requests_seconds_count rules

https://gerrit.wikimedia.org/r/525081

Change 525085 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] prometheus: rename trafficserver metrics

https://gerrit.wikimedia.org/r/525085

Change 525085 merged by Ema:
[operations/puppet@production] prometheus: rename trafficserver metrics

https://gerrit.wikimedia.org/r/525085

Change 525081 merged by Ema:
[operations/puppet@production] prometheus: add trafficserver_backend_requests_seconds_count rules

https://gerrit.wikimedia.org/r/525081
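The recording rules are built on `trafficserver_backend_requests_seconds_count`, a counter (it only goes up, except for resets when the exporting process restarts), so a request-rate rule on it would typically go through `rate()`. The sketch below computes the average per-second increase of such a counter from raw samples, treating any decrease as a reset. Unlike Prometheus' `rate()` it does not extrapolate to the edges of the window, and the sample values are hypothetical.

```go
package main

import "fmt"

// point is one scraped value of a monotonically increasing counter.
type point struct {
	t float64 // unix time, seconds
	v float64 // counter value
}

// perSecondRate returns the average per-second increase over pts.
// A decrease between consecutive samples is treated as a counter reset,
// so the increase across it is the new value (counted up from zero).
func perSecondRate(pts []point) (float64, bool) {
	if len(pts) < 2 {
		return 0, false
	}
	var increase float64
	for i := 1; i < len(pts); i++ {
		d := pts[i].v - pts[i-1].v
		if d < 0 { // reset: the counter restarted from zero
			d = pts[i].v
		}
		increase += d
	}
	elapsed := pts[len(pts)-1].t - pts[0].t
	if elapsed <= 0 {
		return 0, false
	}
	return increase / elapsed, true
}

func main() {
	// Hypothetical scrapes 15s apart, with a reset between the 2nd and 3rd.
	pts := []point{{0, 100}, {15, 160}, {30, 20}, {45, 80}}
	r, _ := perSecondRate(pts)
	fmt.Printf("%.2f req/s\n", r) // prints 3.11 req/s (140 requests over 45s)
}
```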