
Aggregated metrics for ats-tls <-> clients ttfb percentiles
Closed, Resolved, Public

Description

We have ats-tls TTFB (time to first byte) metrics for individual instances, but they are not yet aggregated per cluster/site. The end goal is to expose the aggregated metrics in "frontend traffic"-like dashboards, so that TTFB at the edge can be observed per cluster/site rather than per host.

Relevant metrics:

trafficserver_tls_client_ttfb_bucket
trafficserver_tls_client_ttfb_sum
trafficserver_tls_client_ttfb_count
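
For illustration only, here is a minimal Python sketch of what aggregating a histogram metric such as trafficserver_tls_client_ttfb_bucket per site involves: cumulative bucket counts are summed across instances, then a percentile is estimated by linear interpolation inside the bucket that contains the target rank, in the spirit of PromQL's histogram_quantile. This is not the rule set from the patches on this task. The site name, instance names, bucket bounds, and counts are all hypothetical, and raw counts are used where a real rule would normally aggregate rates.

```python
import math
from collections import defaultdict


def aggregate_buckets(samples):
    """Sum cumulative bucket counts across instances, grouped by site.

    samples: iterable of (site, instance, upper_bound, cumulative_count).
    Returns {site: [(upper_bound, cumulative_count), ...]} sorted by bound.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for site, _instance, upper_bound, count in samples:
        totals[site][upper_bound] += count
    return {site: sorted(buckets.items()) for site, buckets in totals.items()}


def estimate_quantile(q, buckets):
    """Estimate quantile q (0 < q <= 1) from sorted cumulative buckets.

    Interpolates linearly inside the bucket containing the target rank.
    If the rank falls in the +Inf bucket, the highest finite bound is returned.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical samples: two instances in one site, bounds in seconds.
SAMPLES = [
    ("site-a", "host1", 0.1, 50), ("site-a", "host1", 0.5, 80),
    ("site-a", "host1", 1.0, 95), ("site-a", "host1", math.inf, 100),
    ("site-a", "host2", 0.1, 30), ("site-a", "host2", 0.5, 70),
    ("site-a", "host2", 1.0, 90), ("site-a", "host2", math.inf, 100),
]

per_site = aggregate_buckets(SAMPLES)
p50 = estimate_quantile(0.50, per_site["site-a"])
p95 = estimate_quantile(0.95, per_site["site-a"])
```

Summing the buckets before estimating the percentile matters: averaging per-instance percentiles would not give a valid site-wide percentile, whereas summed bucket counts describe the combined distribution.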

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change 629430 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] prometheus: aggregation rules for ats-tls client TTFB

https://gerrit.wikimedia.org/r/629430

crusnov triaged this task as Medium priority.

Change 629430 merged by Filippo Giunchedi:
[operations/puppet@production] prometheus: aggregation rules for ats-tls client TTFB

https://gerrit.wikimedia.org/r/629430

Added a panel to https://grafana.wikimedia.org/d/000000479/frontend-traffic to showcase the top p95 offenders:

2020-10-01-130213_1203x320_scrot.png (320×1,203 px, 87 KB)

I'm sure there is optimization/tweaking left to do. Let me know what you think!

Change 632190 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] prometheus: add 50 percentile for ats-tls TTFB

https://gerrit.wikimedia.org/r/632190

Change 632190 merged by Filippo Giunchedi:
[operations/puppet@production] prometheus: add 50 percentile for ats-tls TTFB

https://gerrit.wikimedia.org/r/632190

With the 50th percentile added, I'm considering this closed!

As a demo/playground, I've started https://grafana.wikimedia.org/d/fD5qC_KGz/filippo-ats-latency for anyone interested.