Export useful metrics from haproxy logs for Thumbor
Closed, Resolved · Public

Description

The prometheus-haproxy-exporter only exports the average response time over the last 1024 successful connections, which is not very helpful. We need to export more meaningful metrics derived from haproxy's logs.
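
To illustrate what a log-derived metric could look like, here is a minimal Python sketch that pulls the total request time out of each log line and sorts it into cumulative buckets, in the style of a Prometheus histogram. It assumes haproxy's default HTTP log format, where the timers appear as TR/Tw/Tc/Tr/Ta in milliseconds; the log format actually used on the Thumbor hosts may differ. The bucket bounds, the function names and the sample line are hypothetical. The patches on this task use mtail for this job, so this is not that program.

```python
import re
from bisect import bisect_left
from itertools import accumulate

# Assumption: default HAProxy HTTP log format, timers TR/Tw/Tc/Tr/Ta followed
# by the status code. Ta may carry a "+" prefix when "option logasap" is set.
TIMERS = re.compile(r" (-?\d+)/(-?\d+)/(-?\d+)/(-?\d+)/(\+?-?\d+) (\d{3}|-1) ")

# Hypothetical bucket upper bounds, in milliseconds.
BUCKETS_MS = [100, 500, 1000, 5000, float("inf")]

# Illustrative line in the default format, not taken from a Thumbor host.
SAMPLE = (
    'haproxy[14389]: 10.0.1.2:33317 [06/Feb/2009:12:14:14.655] http-in '
    'static/srv1 10/0/30/69/109 200 2750 - - ---- 1/1/1/1/0 0/0 {} {} '
    '"GET /index.html HTTP/1.1"'
)


def total_time_ms(line):
    """Return Ta (total request time, ms) from a log line, or None."""
    match = TIMERS.search(line)
    if match is None:
        return None
    ta = int(match.group(5).lstrip("+"))
    # Negative timers mean the phase was never reached; skip those lines.
    return ta if ta >= 0 else None


def histogram(lines):
    """Map each bucket upper bound to the cumulative count of requests."""
    counts = [0] * len(BUCKETS_MS)
    for line in lines:
        ta = total_time_ms(line)
        if ta is not None:
            # First bucket whose upper bound is >= ta.
            counts[bisect_left(BUCKETS_MS, ta)] += 1
    return dict(zip(BUCKETS_MS, accumulate(counts)))


if __name__ == "__main__":
    print(histogram([SAMPLE]))
```

Unlike a rolling average, cumulative buckets let the dashboard derive percentiles and compare them over arbitrary time ranges.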

Event Timeline

jijiki created this task. Apr 9 2019, 12:22 PM
Restricted Application added a subscriber: Aklapper. Apr 9 2019, 12:22 PM
Gilles claimed this task. Apr 10 2019, 3:14 PM
Gilles triaged this task as Normal priority.

Change 502967 had a related patch set uploaded (by Gilles; owner: Gilles):
[mediawiki/vagrant@master] Expose haproxy total request time via mtail

https://gerrit.wikimedia.org/r/502967

Change 502967 merged by jenkins-bot:
[mediawiki/vagrant@master] Expose haproxy total request time via mtail

https://gerrit.wikimedia.org/r/502967

Change 502972 had a related patch set uploaded (by Gilles; owner: Gilles):
[operations/puppet@production] Expose haproxy total request time via mtail

https://gerrit.wikimedia.org/r/502972

Change 502972 merged by Effie Mouzeli:
[operations/puppet@production] haproxy: improve metrics (via mtail) and logging

https://gerrit.wikimedia.org/r/502972

Change 504284 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] thumbor: enable haproxy mtail metrics

https://gerrit.wikimedia.org/r/504284

Change 504323 had a related patch set uploaded (by Gilles; owner: Gilles):
[operations/puppet@production] Add tests for haproxy mtail program

https://gerrit.wikimedia.org/r/504323

Change 504284 merged by Effie Mouzeli:
[operations/puppet@production] thumbor: enable haproxy mtail metrics

https://gerrit.wikimedia.org/r/504284

Change 504323 merged by Effie Mouzeli:
[operations/puppet@production] Add tests for haproxy mtail program

https://gerrit.wikimedia.org/r/504323

Change 504335 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] thumbor: Fix mtail group and log path

https://gerrit.wikimedia.org/r/504335

Change 504335 merged by Effie Mouzeli:
[operations/puppet@production] thumbor: Fix mtail group and log path

https://gerrit.wikimedia.org/r/504335

Gilles reassigned this task from Gilles to jijiki. Apr 17 2019, 8:42 AM
Gilles removed a project: Patch-For-Review.

I'm not seeing the metrics show up in the "eqiad/prometheus" ops datasource in Grafana. I'm not sure how prometheus is supposed to be configured to collect the data from the thumbor hosts, though.

Change 504924 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] prometheus: Fix haproxy mtail stats for thumbor

https://gerrit.wikimedia.org/r/504924

Change 504924 merged by Effie Mouzeli:
[operations/puppet@production] prometheus: Fix haproxy mtail stats for thumbor

https://gerrit.wikimedia.org/r/504924

Change 504978 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] thumbor: Inlcude mtail in ferm configuration

https://gerrit.wikimedia.org/r/504978

Change 504978 merged by Effie Mouzeli:
[operations/puppet@production] thumbor: Inlcude mtail in ferm configuration

https://gerrit.wikimedia.org/r/504978

@Gilles This is fixed now, though I will revert to nginx for the weekend. We do have data from today that we can work with.

I've added the relevant panels to your dashboard, mirroring the data we were tracking for nginx.

@Gilles thank you! I added the relevant codfw ones.

Gilles closed this task as Resolved. Apr 22 2019, 5:21 PM