
Prometheus metrics for Kartotherian on k8s
Open, Needs Triage (Public)

Description

Overview

The Kartotherian helm chart on k8s now runs an extra container that "translates" statsd metrics into Prometheus ones: nodejs is configured to push metrics to localhost:9125, where the Prometheus statsd exporter collects them and exposes them as Prometheus metrics.

By default the statsd exporter doesn't do a great job of translating, because we end up with metric names like:

[.cut.]
kartotherian_req_osm_intl_6_png_bucket{le="+Inf"} 1055
kartotherian_req_osm_intl_6_png_sum 47.51299999999971
kartotherian_req_osm_intl_6_png_count 1055

The proposal is to add a config that allows the following use cases:

kartotherian.req.osm-intl.18.png
kartotherian.req.osm-intl.8.png.static.2
kartotherian.req.osm-intl.9.png.1-5:159

Translated as:

kartotherian_request_ms{kind="osm-intl", int="18", format="png"}
kartotherian_request_ms{kind="osm-intl", int="8", format="png", static="2"}
kartotherian_request_ms{kind="osm-intl", int="9", format="png", zoom="1"}
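
To make the intended translation concrete, here is a minimal Python sketch of the name-to-labels logic for the first two forms above. It only illustrates the idea and is not the statsd exporter config itself: the regex, the function names (translate, render) and the choice to copy each dotted component verbatim into a label are my assumptions. The third form (the trailing "1-5" component) is left out on purpose, since the task does not spell out how it becomes the zoom label.

```python
import re

# Hypothetical pattern (an assumption, not the real exporter mapping):
#   kartotherian.req.<kind>.<int>.<format>[.static.<n>]
REQ_PATTERN = re.compile(
    r"^kartotherian\.req\.(?P<kind>[^.]+)\.(?P<int>\d+)\.(?P<format>[^.]+)"
    r"(?:\.static\.(?P<static>\d+))?$"
)


def translate(statsd_name):
    """Return (metric_name, labels) for a matching name, or None otherwise."""
    match = REQ_PATTERN.match(statsd_name)
    if match is None:
        return None
    # Drop optional groups that did not take part in the match.
    labels = {k: v for k, v in match.groupdict().items() if v is not None}
    return "kartotherian_request_ms", labels


def render(name, labels):
    """Format a metric name and its labels in the style used in this task."""
    body = ", ".join(f'{key}="{value}"' for key, value in labels.items())
    return f"{name}{{{body}}}"


if __name__ == "__main__":
    for raw in (
        "kartotherian.req.osm-intl.18.png",
        "kartotherian.req.osm-intl.8.png.static.2",
    ):
        print(render(*translate(raw)))
```

Names that do not match the pattern (for example the kartotherian.err ones) return None here, so they would need their own handling.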

In graphite (left panel -> Metrics -> kartotherian -> ...) you can browse the metrics currently being collected to get a broader idea.

There are other metrics, like the kartotherian.err ones, but those should be easier to translate: they seem to have a flat structure rather than a nested/dynamic one like the kartotherian.req metrics.

More info in https://grafana-rw.wikimedia.org/d/000000030/service-kartotherian

Proposal

Writing the Prometheus statsd exporter config seems to be the easiest way to put Kartotherian on k8s, move traffic to it and start using it. As a follow-up we could also make service-runner publish Prometheus metrics itself, but that probably requires a bit of time and could be done later on.

Event Timeline

Change #1105296 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] charts: improve Kartotherian's statsd config (part 2)

https://gerrit.wikimedia.org/r/1105296

This comment was removed by elukey.

The proposed config is in the patch above. I applied it manually in staging and this is the result for the readiness probe:

elukey@kubestage1006:~$ sudo nsenter -t 2495625 -n curl -s localhost:9102/metrics | grep karto
# HELP kartotherian_heap_rss Metric autogenerated by statsd_exporter.
# TYPE kartotherian_heap_rss gauge
kartotherian_heap_rss 1.53858048e+08
# HELP kartotherian_heap_total Metric autogenerated by statsd_exporter.
# TYPE kartotherian_heap_total gauge
kartotherian_heap_total 5.2862976e+07
# HELP kartotherian_heap_used Metric autogenerated by statsd_exporter.
# TYPE kartotherian_heap_used gauge
kartotherian_heap_used 4.911472e+07
# HELP kartotherian_init Metric autogenerated by statsd_exporter.
# TYPE kartotherian_init counter
kartotherian_init 1
# HELP kartotherian_requests_ms Metric autogenerated by statsd_exporter.
# TYPE kartotherian_requests_ms histogram
kartotherian_requests_ms_bucket{format="png",int="6",kind="osm-intl",le="0.005"} 0
kartotherian_requests_ms_bucket{format="png",int="6",kind="osm-intl",le="0.01"} 0
kartotherian_requests_ms_bucket{format="png",int="6",kind="osm-intl",le="0.025"} 0
kartotherian_requests_ms_bucket{format="png",int="6",kind="osm-intl",le="0.05"} 5
kartotherian_requests_ms_bucket{format="png",int="6",kind="osm-intl",le="0.1"} 6
kartotherian_requests_ms_bucket{format="png",int="6",kind="osm-intl",le="0.25"} 6
kartotherian_requests_ms_bucket{format="png",int="6",kind="osm-intl",le="0.5"} 6
kartotherian_requests_ms_bucket{format="png",int="6",kind="osm-intl",le="1"} 6
kartotherian_requests_ms_bucket{format="png",int="6",kind="osm-intl",le="2.5"} 6
kartotherian_requests_ms_bucket{format="png",int="6",kind="osm-intl",le="5"} 6
kartotherian_requests_ms_bucket{format="png",int="6",kind="osm-intl",le="10"} 6
kartotherian_requests_ms_bucket{format="png",int="6",kind="osm-intl",le="30"} 6
kartotherian_requests_ms_bucket{format="png",int="6",kind="osm-intl",le="60"} 6
kartotherian_requests_ms_bucket{format="png",int="6",kind="osm-intl",le="+Inf"} 6
kartotherian_requests_ms_sum{format="png",int="6",kind="osm-intl"} 0.24700000000000003
kartotherian_requests_ms_count{format="png",int="6",kind="osm-intl"} 6
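
As a quick sanity check on the histogram above, the _sum and _count samples give the mean observation per request. The sketch below copies the two relevant lines from the output; the helper name read_samples is hypothetical. The bucket boundaries (0.005 up to 60) look like seconds rather than milliseconds, but that is my reading of the output, not something stated in the task.

```python
# Two sample lines copied verbatim from the staging output above.
SAMPLE = """\
kartotherian_requests_ms_sum{format="png",int="6",kind="osm-intl"} 0.24700000000000003
kartotherian_requests_ms_count{format="png",int="6",kind="osm-intl"} 6
"""


def read_samples(text):
    """Map each metric name (labels stripped) to its float value."""
    samples = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, value = line.rsplit(" ", 1)
        name = series.split("{", 1)[0]
        samples[name] = float(value)
    return samples


samples = read_samples(SAMPLE)
mean = (
    samples["kartotherian_requests_ms_sum"]
    / samples["kartotherian_requests_ms_count"]
)
print(f"mean per request: {mean:.4f}")  # prints "mean per request: 0.0412"
```

This is consistent with the buckets: five of the six observations fall at or below 0.05.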

There are probably other use cases, but from a quick test with statsd traffic generated by maps2005, the basics should be handled by the mapping config in the patch. I think we can go with this one and then review it after the first tests.

The label names can be changed to anything you prefer, just let me know :)

I also found this PR from Cole (in T205870) that may be used as an alternative to get native Prometheus metrics: https://gerrit.wikimedia.org/r/c/mediawiki/services/kartotherian/+/556250