
Test making thumbor statsd metrics available from Prometheus
Closed, Resolved, Public

Description

As outlined at https://wikitech.wikimedia.org/wiki/Prometheus#Statsd, one way to make statsd metrics available to Prometheus is to use https://github.com/prometheus/statsd_exporter together with a mapping file that translates dotted statsd metric names into Prometheus metric names with key=value labels.

To test this approach, I modified statsd_exporter to relay received UDP packets upstream (with something like P4064) and tested a mapping for thumbor on deployment-imagescaler01. statsd_exporter listens on udp/9125 and relays to statsite on udp/8125; thumbor has been changed to send statsd metrics to port 9125 instead of 8125.
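
The actual relay change lives in statsd_exporter (Go) and is in P4064; the following is only a standalone Python sketch of the idea, assuming plain UDP on both sides and the ports from this setup. relay_once and serve are hypothetical names: each datagram is forwarded byte-for-byte to statsite and then handed to the exporter's own processing, represented here by a callback.

```python
import socket

# Ports mirroring the setup described above (UDP assumed on both sides).
LISTEN_ADDR = ("127.0.0.1", 9125)    # where thumbor now sends statsd packets
UPSTREAM_ADDR = ("127.0.0.1", 8125)  # statsite, which keeps receiving everything


def relay_once(listen_sock, out_sock, upstream):
    """Receive one statsd datagram, forward it unchanged upstream, return it."""
    data, _sender = listen_sock.recvfrom(65535)
    out_sock.sendto(data, upstream)
    return data


def serve(handle):
    """Relay forever; `handle` stands in for the exporter's own mapping/metrics logic."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as listen_sock, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as out_sock:
        listen_sock.bind(LISTEN_ADDR)
        while True:
            handle(relay_once(listen_sock, out_sock, UPSTREAM_ADDR))
```

Forwarding the raw packet before any parsing keeps statsite's view of the metrics identical to what it received before the exporter was put in front of it.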

The mapping for thumbor looks like this:

thumbor.*.result_storage.incoming_time
name="thumbor_result_storage_receive_milliseconds"

thumbor.*.result_storage.outgoing_time
name="thumbor_result_storage_transmit_milliseconds"

thumbor.*.result_storage.bytes_read
name="thumbor_result_storage_receive_bytes"

thumbor.*.result_storage.bytes_written
name="thumbor_result_storage_transmit_bytes"

thumbor.*.result_storage.miss
name="thumbor_result_storage_requests_total"
outcome="miss"

thumbor.*.result_storage.hit
name="thumbor_result_storage_requests_total"
outcome="hit"


thumbor.*.storage.miss
name="thumbor_storage_miss_total"

thumbor.*.storage.hit
name="thumbor_storage_hit_total"


thumbor.*.original_image.status.*
name="thumbor_http_loader_requests_total"
status="$2"


thumbor.*.engine.processing_time.*.engine.*
name="thumbor_engine_elapsed_milliseconds"
engine="$3"

thumbor.*.engine.processing_utime.*.engine.*
name="thumbor_engine_cpu_milliseconds"
engine="$3"
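
To make the mapping semantics concrete, here is a rough Python sketch of how such a glob mapping can turn a statsd name into a Prometheus name plus labels. It assumes that `*` matches exactly one dot-separated component, that `$n` refers to the text matched by the n-th `*`, and that the first matching pattern wins. parse_mappings and translate are hypothetical helpers, not statsd_exporter's code, and the `seg1`/`seg2` components used below are placeholders.

```python
import re

# A subset of the thumbor mapping above.
MAPPING = '''
thumbor.*.engine.processing_time.*.engine.*
name="thumbor_engine_elapsed_milliseconds"
engine="$3"

thumbor.*.result_storage.hit
name="thumbor_result_storage_requests_total"
outcome="hit"
'''


def parse_mappings(text):
    """Parse blank-line-separated blocks: a glob line, then key="value" lines."""
    mappings = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        # Each "*" becomes a capture group that cannot span a dot.
        regex = re.compile(re.escape(lines[0]).replace(r"\*", "([^.]+)"))
        labels = {}
        for line in lines[1:]:
            key, _, value = line.partition("=")
            labels[key] = value.strip('"')
        mappings.append((regex, labels))
    return mappings


def translate(mappings, statsd_name):
    """Return (prometheus_name, labels) for the first matching glob, else None."""
    for regex, labels in mappings:
        m = regex.fullmatch(statsd_name)
        if m:
            # Replace "$n" with the text captured by the n-th "*".
            out = {k: re.sub(r"\$(\d+)", lambda g: m.group(int(g.group(1))), v)
                   for k, v in labels.items()}
            return out.pop("name"), out
    return None
```

For example, translate(parse_mappings(MAPPING), "thumbor.seg1.engine.processing_time.seg2.engine.imagemagick") returns ("thumbor_engine_elapsed_milliseconds", {"engine": "imagemagick"}), which matches the engine label seen in the exported metrics.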

With this mapping, statsd_exporter exposes Prometheus metrics such as:

$ curl localhost:9102/metrics -s | grep -v '^#' | grep thumbor
thumbor_engine_cpu_milliseconds{engine="imagemagick",quantile="0.5"} 28
thumbor_engine_cpu_milliseconds{engine="imagemagick",quantile="0.9"} 32
thumbor_engine_cpu_milliseconds{engine="imagemagick",quantile="0.99"} 32
thumbor_engine_cpu_milliseconds_sum{engine="imagemagick"} 100
thumbor_engine_cpu_milliseconds_count{engine="imagemagick"} 3
thumbor_engine_elapsed_milliseconds{engine="imagemagick",quantile="0.5"} 35
thumbor_engine_elapsed_milliseconds{engine="imagemagick",quantile="0.9"} 38
thumbor_engine_elapsed_milliseconds{engine="imagemagick",quantile="0.99"} 38
thumbor_engine_elapsed_milliseconds_sum{engine="imagemagick"} 114
thumbor_engine_elapsed_milliseconds_count{engine="imagemagick"} 3
thumbor_http_loader_requests_total{status="200"} 4
thumbor_result_storage_receive_milliseconds{quantile="0.5"} 10
thumbor_result_storage_receive_milliseconds{quantile="0.9"} 16
thumbor_result_storage_receive_milliseconds{quantile="0.99"} 16
thumbor_result_storage_receive_milliseconds_sum 53
thumbor_result_storage_receive_milliseconds_count 4
thumbor_result_storage_requests_total{outcome="miss"} 4
thumbor_storage_miss_total 4
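
Timers come out as summaries, so an average can be read off by dividing `_sum` by `_count`: for instance, engine CPU time above averages 100 / 3 ≈ 33.3 ms per timed operation. The sketch below checks this against a few of the lines above using a deliberately simplified, hypothetical parser (parse_samples and summary_mean are illustrative names; no timestamps, escaped quotes, or HELP/TYPE handling). For real use, the prometheus_client package provides prometheus_client.parser.text_string_to_metric_families.

```python
import re

# A few lines copied from the curl output above.
EXAMPLE = '''
thumbor_engine_cpu_milliseconds_sum{engine="imagemagick"} 100
thumbor_engine_cpu_milliseconds_count{engine="imagemagick"} 3
thumbor_engine_elapsed_milliseconds_sum{engine="imagemagick"} 114
thumbor_engine_elapsed_milliseconds_count{engine="imagemagick"} 3
thumbor_result_storage_receive_milliseconds_sum 53
thumbor_result_storage_receive_milliseconds_count 4
thumbor_result_storage_requests_total{outcome="miss"} 4
'''

SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Map (metric_name, sorted label pairs) -> float value; skips comments."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if m:
            name, raw_labels, value = m.groups()
            labels = tuple(sorted(LABEL_RE.findall(raw_labels or "")))
            samples[(name, labels)] = float(value)
    return samples


def summary_mean(samples, base, labels=()):
    """Average of a summary: <base>_sum divided by <base>_count."""
    key = tuple(sorted(labels))
    return samples[(base + "_sum", key)] / samples[(base + "_count", key)]
```

In Prometheus itself the same average over time would be computed with rate(thumbor_engine_cpu_milliseconds_sum[5m]) / rate(thumbor_engine_cpu_milliseconds_count[5m]), since the raw _sum and _count values are cumulative.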