
Port nutcracker statistics to Prometheus
Closed, Resolved (Public)

Description

We'll need to replace our custom Diamond exporter for nutcracker with a Prometheus exporter.
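For context, a minimal sketch of what such an exporter has to do, not the code that was eventually packaged. It assumes nutcracker dumps a single JSON document on its stats port (localhost:22222, as used later in this task) and then closes the connection, and that the JSON nests per-pool objects containing per-server objects. The function names and the `nutcracker_` metric prefix are made up for illustration.

```python
import json
import socket


def fetch_stats(host="localhost", port=22222, timeout=5.0):
    """Read the JSON document nutcracker writes on its stats port."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:  # assumed: the peer closes after one full dump
                break
            chunks.append(data)
    return json.loads(b"".join(chunks).decode("utf-8"))


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def to_prometheus(stats, prefix="nutcracker"):
    """Flatten the stats dict into Prometheus text exposition lines.

    Assumed layout: top-level scalars, pools as nested dicts, servers as
    dicts nested inside pools. Non-numeric values are skipped and label
    values are not escaped, to keep the sketch short.
    """
    lines = []
    for key, value in sorted(stats.items()):
        if isinstance(value, dict):  # a pool
            for pkey, pvalue in sorted(value.items()):
                if isinstance(pvalue, dict):  # a server inside the pool
                    for skey, svalue in sorted(pvalue.items()):
                        if _is_number(svalue):
                            lines.append(
                                '%s_server_%s{pool="%s",server="%s"} %s'
                                % (prefix, skey, key, pkey, svalue)
                            )
                elif _is_number(pvalue):
                    lines.append(
                        '%s_pool_%s{pool="%s"} %s' % (prefix, pkey, key, pvalue)
                    )
        elif _is_number(value):
            lines.append("%s_%s %s" % (prefix, key, value))
    return "\n".join(lines) + "\n"
```

Something like `print(to_prometheus(fetch_stats()))` on a host running nutcracker should then print one line per numeric stat. A real exporter would also serve this over HTTP and add metric type/help metadata.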

I took a look at the three I could find on GitHub:

Event Timeline

I tried to package https://github.com/xavierholt/twemproxy_exporter. It would work on buster/sid but not on stretch/jessie, because these packages are missing there:

  1. ruby-prometheus-client-mmap
  2. ruby-mmap2 (not yet clear how hard a backport to jessie would be, since jessie ships ruby 2.1 rather than the ruby 2.3 found in stretch and later)

I tried backporting ruby-mmap2 to jessie. Even after disabling the failing tests, I'm getting a segmentation fault around mmap values:

$ prometheus_multiproc_dir=/tmp twemproxy_exporter  localhost:22222
[2017-12-15 14:19:24] INFO  WEBrick 1.3.1
[2017-12-15 14:19:24] INFO  ruby 2.1.5 (2014-11-13) [x86_64-linux-gnu]
[2017-12-15 14:19:24] INFO  WEBrick::HTTPServer#start: pid=11580 port=9222
/usr/lib/ruby/vendor_ruby/prometheus/client/mmaped_value.rb:102: [BUG] Segmentation fault at 0x007f1c66e7aff8
ruby 2.1.5p273 (2014-11-13) [x86_64-linux-gnu]

-- Control frame information -----------------------------------------------
Segmentation fault

Another option might be to package the regular Prometheus Ruby client, as opposed to the mmap fork.

Change 398505 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/debs/prometheus-nutcracker-exporter@master] First version

https://gerrit.wikimedia.org/r/398505

Change 398505 merged by Filippo Giunchedi:
[operations/debs/prometheus-nutcracker-exporter@master] First version

https://gerrit.wikimedia.org/r/398505

Change 398839 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/debs/prometheus-nutcracker-exporter@master] Add debian/ and .gitreview

https://gerrit.wikimedia.org/r/398839

Change 398839 merged by Filippo Giunchedi:
[operations/debs/prometheus-nutcracker-exporter@master] Add debian/ and .gitreview

https://gerrit.wikimedia.org/r/398839

Change 398847 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] Add nutcracker_exporter profile

https://gerrit.wikimedia.org/r/398847

Change 398847 merged by Filippo Giunchedi:
[operations/puppet@production] Add nutcracker_exporter profile

https://gerrit.wikimedia.org/r/398847

Change 399154 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/debs/prometheus-nutcracker-exporter@master] Fix nutcracker metrics fetching

https://gerrit.wikimedia.org/r/399154

Change 399154 merged by Filippo Giunchedi:
[operations/debs/prometheus-nutcracker-exporter@master] Fix nutcracker metrics fetching

https://gerrit.wikimedia.org/r/399154

Change 399163 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] prometheus: add nutcracker job

https://gerrit.wikimedia.org/r/399163

Change 399163 merged by Filippo Giunchedi:
[operations/puppet@production] prometheus: add nutcracker job

https://gerrit.wikimedia.org/r/399163

I've updated the nutcracker dashboard at https://grafana.wikimedia.org/dashboard/db/nutcracker?orgId=1 (and moved the Graphite one to "nutcracker graphite"). cc @Joe @elukey for feedback.

All done! The dashboards will likely need some tuning, but the metrics are there.

chasemp added a subscriber: Andrew.

Tentatively reopening: I'm not sure what's up with this, but I want to keep this task's narrative together.

It seems the nutcracker collector on silver is crashing and being restarted on every Puppet run.

Notice: /Stage[main]/Profile::Prometheus::Nutcracker_exporter/Service[prometheus-nutcracker-exporter]/ensure: ensure changed 'stopped' to 'running'
Info: /Stage[main]/Profile::Prometheus::Nutcracker_exporter/Service[prometheus-nutcracker-exporter]: Unscheduling refresh on Service[prometheus-nutcracker-exporter]

silver:~# service prometheus-nutcracker-exporter status
prometheus-nutcracker-exporter stop/waiting

Not sure why. ping @Andrew

Indeed. The problem is that we were trying to source the /etc/default/prometheus-nutcracker-exporter file, which wasn't there. I touched the file and will fix the package to ship an empty/sample file instead.
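As a complementary safeguard (not the fix applied here, which was to ship the file in the package), whatever reads the defaults file can treat a missing file as "no overrides" instead of failing. A rough sketch of that idea follows, assuming the file holds simple shell-style KEY=value lines; `load_defaults` is a hypothetical helper, not part of the exporter.

```python
import shlex


def load_defaults(path):
    """Return KEY=value pairs from a shell-style defaults file.

    A missing file yields an empty dict instead of an error, so the
    service can still start with its built-in settings.
    """
    env = {}
    try:
        with open(path) as fh:
            for raw in fh:
                line = raw.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                # shlex strips shell quoting and trailing "# comments"
                parts = shlex.split(value, comments=True)
                env[key.strip()] = parts[0] if parts else ""
    except FileNotFoundError:
        pass  # no defaults file: fall back to built-in settings
    return env
```

Calling `load_defaults("/etc/default/prometheus-nutcracker-exporter")` would then return `{}` on a host where the file was never installed. It only handles plain assignments, not arbitrary shell.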

prometh+  4076  0.1  0.0 126948 11204 ?        Ssl  09:53   0:00 /usr/bin/python /usr/bin/prometheus-nutcracker-exporter

Tentatively resolving