
Port memcached statistics from ganglia to prometheus
Closed, Resolved · Public

Description

Tried memcached_exporter (https://github.com/prometheus/memcached_exporter) on deployment-memc04.eqiad.wmflabs:

filippo@deployment-memc05:~$ curl -s localhost:9150/metrics | grep ^memca
memcached_commands_total{command="cas",status="badval"} 2262
memcached_commands_total{command="cas",status="hit"} 1.936389e+06
memcached_commands_total{command="cas",status="miss"} 21
memcached_commands_total{command="decr",status="hit"} 0
memcached_commands_total{command="decr",status="miss"} 0
memcached_commands_total{command="delete",status="hit"} 464711
memcached_commands_total{command="delete",status="miss"} 3771
memcached_commands_total{command="flush",status="hit"} 0
memcached_commands_total{command="get",status="hit"} 1.86302483e+08
memcached_commands_total{command="get",status="miss"} 1.57966743e+08
memcached_commands_total{command="incr",status="hit"} 116514
memcached_commands_total{command="incr",status="miss"} 204
memcached_commands_total{command="set",status="hit"} 8.488741e+06
memcached_commands_total{command="touch",status="hit"} 180469
memcached_commands_total{command="touch",status="miss"} 0
memcached_connections_total 197148
memcached_current_bytes 8.48931078e+08
memcached_current_connections 16
memcached_current_items 1.395356e+06
memcached_items_evicted_total 0
memcached_items_reclaimed_total 46462
memcached_items_total 1.0243271e+07
memcached_limit_bytes 3.145728e+09
memcached_read_bytes_total 2.9120185857e+10
memcached_up 1
memcached_uptime_seconds 1.1571958e+07
memcached_version{version="1.4.21"} 1
memcached_written_bytes_total 9.104556614e+10
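
As a rough illustration of what these counters allow, the sketch below parses a few of the lines above and derives a lifetime "get" hit ratio and the memory utilization. The helper names (`parse_samples`, `get_hit_ratio`, `memory_utilization`) are hypothetical and not part of memcached_exporter, and the parser assumes sample lines carry no trailing timestamp. Since the values are counters accumulated since memcached started, a dashboard would more likely use Prometheus `rate()` over a time window than the lifetime ratio shown here.

```python
import math

# A few sample lines copied from the exporter output above.
SAMPLE = """\
memcached_commands_total{command="get",status="hit"} 1.86302483e+08
memcached_commands_total{command="get",status="miss"} 1.57966743e+08
memcached_current_bytes 8.48931078e+08
memcached_limit_bytes 3.145728e+09
"""


def parse_samples(text):
    """Map each 'name{labels}' series string to its float value.

    Assumes each sample line ends with the value (no timestamp).
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def get_hit_ratio(samples):
    """Lifetime hit ratio for the 'get' command (hits / (hits + misses))."""
    hits = samples['memcached_commands_total{command="get",status="hit"}']
    misses = samples['memcached_commands_total{command="get",status="miss"}']
    total = hits + misses
    return hits / total if total else math.nan


def memory_utilization(samples):
    """Fraction of the configured memory limit currently in use."""
    return samples["memcached_current_bytes"] / samples["memcached_limit_bytes"]


if __name__ == "__main__":
    s = parse_samples(SAMPLE)
    print(f"get hit ratio: {get_hit_ratio(s):.3f}")
    print(f"memory utilization: {memory_utilization(s):.3f}")
```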

Details

Related Gerrit Patches:
[operations/puppet@production] wmcs: add prometheus-memcached-exporter
[operations/puppet@production] wmcs: add prometheus-memcached-exporter
[operations/puppet@production] swift: add prometheus-memcached-exporter
[operations/puppet@production] thumbor: add prometheus-memcached-exporter
[operations/puppet@production] role: include memcached_exporter in role::memcached
[operations/puppet@production] role: add Prometheus job for memcached_exporter
[operations/puppet@production] role: account for labs in memcached_exporter
[operations/puppet@production] prometheus: add memcached_exporter

Event Timeline

Restricted Application added a subscriber: Aklapper. · Oct 4 2016, 4:05 PM

Mentioned in SAL (#wikimedia-operations) [2016-11-08T20:55:41Z] <godog> upload prometheus-memcached-exporter 0.3.0+ds1-1 to carbon - T147326

Change 320702 had a related patch set uploaded (by Filippo Giunchedi):
prometheus: add memcached_exporter

https://gerrit.wikimedia.org/r/320702

Change 321568 had a related patch set uploaded (by Filippo Giunchedi):
role: add Prometheus job for memcached_exporter

https://gerrit.wikimedia.org/r/321568

Change 320702 merged by Filippo Giunchedi:
prometheus: add memcached_exporter

https://gerrit.wikimedia.org/r/320702

Change 321717 had a related patch set uploaded (by Filippo Giunchedi):
role: account for labs in memcached_exporter

https://gerrit.wikimedia.org/r/321717

Change 321717 merged by Filippo Giunchedi:
role: account for labs in memcached_exporter

https://gerrit.wikimedia.org/r/321717

Change 321568 merged by Filippo Giunchedi:
role: add Prometheus job for memcached_exporter

https://gerrit.wikimedia.org/r/321568

Change 321725 had a related patch set uploaded (by Filippo Giunchedi):
role: include memcached_exporter in role::memcached

https://gerrit.wikimedia.org/r/321725

Change 321725 merged by Filippo Giunchedi:
role: include memcached_exporter in role::memcached

https://gerrit.wikimedia.org/r/321725

This is rolling out now. I noticed there is a large number of slab-related metrics (i.e. per-slab, and per-command/per-slab in the case of commands):

mc2001:~$ curl localhost:9150/metrics -s | grep '^memcached_slab' | cut -d{ -f1 | sort -u
memcached_slab_chunk_size_bytes
memcached_slab_chunks_free
memcached_slab_chunks_free_end
memcached_slab_chunks_per_page
memcached_slab_chunks_used
memcached_slab_commands_total
memcached_slab_current_chunks
memcached_slab_current_items
memcached_slab_current_pages
memcached_slab_items_age_seconds
memcached_slab_items_crawler_reclaimed_total
memcached_slab_items_evicted_nonzero_total
memcached_slab_items_evicted_time_seconds
memcached_slab_items_evicted_total
memcached_slab_items_evicted_unfetched_total
memcached_slab_items_expired_unfetched_total
memcached_slab_items_outofmemory_total
memcached_slab_items_reclaimed_total
memcached_slab_items_tailrepairs_total
memcached_slab_mem_requested_bytes

I don't think the number of metrics is a problem at the moment, but is there something we could discard here, @elukey @Joe?
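
To put a number on "how many", one option is to count time series per metric family rather than just listing the family names. The sketch below is a hypothetical helper (`series_per_family` is not part of any exporter tooling) that does this on exposition-format text, for example the output of the curl command above. The `EXAMPLE` input and its `slab` label are made up for illustration only.

```python
from collections import Counter


def series_per_family(text, prefix="memcached_slab"):
    """Count series (one per sample line) for each metric name starting with prefix."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        # The metric name ends at the first '{' (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name.startswith(prefix):
            counts[name] += 1
    return counts


# Made-up input for illustration; real input would come from /metrics.
EXAMPLE = """\
# TYPE memcached_slab_current_items gauge
memcached_slab_current_items{slab="1"} 10
memcached_slab_current_items{slab="2"} 20
memcached_slab_commands_total{command="get",slab="1",status="hit"} 5
memcached_up 1
"""

if __name__ == "__main__":
    # Print families with the most series first.
    for name, n in series_per_family(EXAMPLE).most_common():
        print(f"{n:6d} {name}")
```

Run against a real host, the families with the highest counts would be the first candidates to look at if the total ever became a concern.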

I tried to add some metrics in https://grafana.wikimedia.org/dashboard/db/prometheus-memcached-dc-stats (still a draft). The idea is to have aggregated metrics, and then a selector to dive into single-host data (for example, I might want to see the slab usage of mc1001). This is only a proposal of course, let's discuss what is best.

About the number of metrics: I absolutely love the fact that we have per-slab metrics, and I would keep everything for the moment until we decide what is not really useful. If there turn out to be too many metrics, though, we can come up with a blacklist :(

About the memcached versions: we are currently running two versions of memcached, 1.4.21 (Debian Jessie) and 1.4.25 (mc1009 and mc1010). I added a section on the dashboard for 1.4.25-specific metrics, but from a first review we might need to follow up with upstream to add more things to the exporter. Not really urgent but I wanted to bring it up :)

Thanks a lot for all the work Filippo!

elukey moved this task from Backlog to In Progress on the User-Elukey board. · Dec 23 2016, 3:12 PM
fgiunchedi closed this task as Resolved. · Jan 4 2017, 12:40 AM
fgiunchedi claimed this task.

@elukey ok! We can keep the metrics for now; if it turns out to be a problem we can blacklist those later.
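
If a blacklist is ever needed, a candidate pattern can be checked against real metric names before it goes anywhere near the Prometheus configuration. The sketch below is only an illustration: the `DENY` pattern is hypothetical (which families to drop was left undecided above), and `keep_series` and `filter_exposition` are made-up helper names. It uses `re.fullmatch` because Prometheus relabelling regexes are anchored at both ends, so a pattern has to match the whole metric name.

```python
import re

# Hypothetical deny-list, for illustration only; nothing in this task
# decided which metric families (if any) should actually be dropped.
DENY = re.compile(r"memcached_slab_(chunks_free_end|items_tailrepairs_total)")


def keep_series(line, deny=DENY):
    """Return True unless the line's metric name fully matches the deny pattern."""
    # The metric name ends at the first '{' (labels) or space (value).
    name = line.split("{", 1)[0].split(" ", 1)[0]
    return deny.fullmatch(name) is None


def filter_exposition(text, deny=DENY):
    """Drop denied sample lines from exposition-format text, keeping the rest in order."""
    return "\n".join(l for l in text.splitlines() if keep_series(l, deny))


if __name__ == "__main__":
    # Made-up sample lines; the 'slab' label is an assumption.
    sample = "\n".join([
        'memcached_slab_chunks_free{slab="1"} 3',
        'memcached_slab_chunks_free_end{slab="1"} 0',
        'memcached_slab_current_items{slab="1"} 10',
    ])
    print(filter_exposition(sample))
```

Because the match is anchored, `memcached_slab_chunks_free` survives even though it is a prefix of a denied name.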

Change 431593 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] swift: add prometheus-memcached-exporter

https://gerrit.wikimedia.org/r/431593

Change 431594 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] thumbor: add prometheus-memcached-exporter

https://gerrit.wikimedia.org/r/431594

Change 431595 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] striker: add prometheus-memcached-exporter

https://gerrit.wikimedia.org/r/431595

Change 431594 merged by Filippo Giunchedi:
[operations/puppet@production] thumbor: add prometheus-memcached-exporter

https://gerrit.wikimedia.org/r/431594

Change 431593 merged by Filippo Giunchedi:
[operations/puppet@production] swift: add prometheus-memcached-exporter

https://gerrit.wikimedia.org/r/431593

Change 431595 merged by Filippo Giunchedi:
[operations/puppet@production] wmcs: add prometheus-memcached-exporter

https://gerrit.wikimedia.org/r/431595

Change 477620 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] wmcs: add prometheus-memcached-exporter

https://gerrit.wikimedia.org/r/477620

Change 477620 merged by GTirloni:
[operations/puppet@production] wmcs: add prometheus-memcached-exporter

https://gerrit.wikimedia.org/r/477620