Send logstash service metrics to prometheus
Closed, Resolved (Public)

Description

To better monitor the logstash service itself, let's export logstash metrics to Prometheus.

Event Timeline

herron triaged this task as Medium priority. Jul 25 2018, 6:51 PM
herron created this task.

Spent some time setting up and experimenting with logstash_exporter (https://github.com/BonnierNews/logstash_exporter)

It's now up and running for testing on host logging-jessie01.logging.eqiad.wmflabs. Here are the metrics it provides: P7387

FWIW, there is also a test Prometheus instance on host keith-prometh.logging.eqiad.wmflabs which is scraping it as well.

Also set up prometheus-logstash-exporter (https://gitlab.com/alxrem/prometheus-logstash-exporter). Example metrics: P7394

At a quick glance prometheus-logstash-exporter appears to produce fewer metrics, but afaict metrics are added on the fly according to activity. For instance, 116 metrics are available when logstash is first started; after generating log input activity, that increases to 149. The difference is counters such as logstash_pipeline_plugins_filters_events_duration_in_millis, logstash_pipeline_plugins_filters_events_out and logstash_pipeline_plugins_filters_events_in, which are empty before any log input occurs.

Both projects had their last commit about 6 months ago. prometheus-logstash-exporter has the benefit of an existing Stretch package.
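
A rough way to reproduce this kind of before/after comparison is to diff the metric names in two scrapes. The sketch below is only illustrative: the `BEFORE` and `AFTER` snapshots and the `plugin` label are made up, and it counts distinct metric names, which may not be how the 116 and 149 figures above were counted.

```python
def metric_names(exposition_text):
    """Return the set of distinct metric names in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or whitespace (value).
        names.add(line.split("{", 1)[0].split()[0])
    return names


# Hypothetical scrape taken right after logstash starts (no events yet).
BEFORE = """\
# HELP logstash_up hypothetical help text
logstash_up 1
logstash_reloads_successes 0
"""

# Hypothetical scrape after some log input; label names are assumptions.
AFTER = BEFORE + """\
logstash_pipeline_plugins_filters_events_in{plugin="grok"} 12
logstash_pipeline_plugins_filters_events_out{plugin="grok"} 12
logstash_pipeline_plugins_filters_events_duration_in_millis{plugin="grok"} 3
"""

# Names that only show up once events have flowed through the pipeline.
new_names = sorted(metric_names(AFTER) - metric_names(BEFORE))
print(len(metric_names(BEFORE)), len(metric_names(AFTER)))
print(new_names)
```

In practice the two snapshots would come from fetching the exporter's metrics endpoint before and after generating log input.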

Change 449283 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] WIP: prometheus: add logstash exporter and gather logstash metrics

https://gerrit.wikimedia.org/r/449283

I took a look at the metrics from both exporters, and the https://github.com/BonnierNews/logstash_exporter metrics seem more Prometheus-idiomatic (e.g. metric naming, usage of labels), so I think we should go for that.

Some metrics are missing, though, that I think would be nice (but not a blocker) to have:

logstash_pipeline_reloads_failures 0
logstash_pipeline_reloads_successes 0
logstash_reloads_failures 0
logstash_reloads_successes 0
logstash_up 1

Change 449189 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] logstash: add 'id' to logstash::input

https://gerrit.wikimedia.org/r/449189

We'll need to add jmx_exporter to Logstash too, to get JVM stats like most other JVMs on the fleet.

Upstream issues for the missing metrics listed above:

up metric: https://github.com/BonnierNews/logstash_exporter/issues/13
reloads metrics: https://github.com/BonnierNews/logstash_exporter/issues/26

Change 450555 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] logstash: enable GC logs

https://gerrit.wikimedia.org/r/450555

Change 450555 merged by Gehel:
[operations/puppet@production] logstash: enable GC logs

https://gerrit.wikimedia.org/r/450555

Change 450637 had a related patch set uploaded (by Herron; owner: Herron):
[operations/debs/prometheus-logstash-exporter@master] initial import of prometheus-logstash-exporter-0.1.2

https://gerrit.wikimedia.org/r/450637

Change 450637 merged by Herron:
[operations/debs/prometheus-logstash-exporter@master] initial import of prometheus-logstash-exporter-0.1.2

https://gerrit.wikimedia.org/r/450637

Change 451018 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] logstash: add jmx_exporter

https://gerrit.wikimedia.org/r/451018

prometheus-logstash-exporter_0.1.2-1 has been uploaded to apt.wikimedia.org/jessie-wikimedia/main and apt.wikimedia.org/stretch-wikimedia/main

Change 451018 merged by Filippo Giunchedi:
[operations/puppet@production] logstash: add jmx_exporter

https://gerrit.wikimedia.org/r/451018

Change 451238 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] prometheus: add logstash jmx_exporter job

https://gerrit.wikimedia.org/r/451238

Change 451238 merged by Filippo Giunchedi:
[operations/puppet@production] prometheus: add logstash jmx_exporter job

https://gerrit.wikimedia.org/r/451238

Change 449283 merged by Herron:
[operations/puppet@production] prometheus: add logstash exporter and gather logstash metrics

https://gerrit.wikimedia.org/r/449283

Change 449189 merged by Filippo Giunchedi:
[operations/puppet@production] logstash: add 'id' to inputs configuration

https://gerrit.wikimedia.org/r/449189

Sadly, the jmx_exporter change (https://gerrit.wikimedia.org/r/451018) broke invoking logstash and logstash-plugin from the command line, because the jmx_exporter Java agent is loaded for those invocations too and tries to bind to its port again:

root@logstash1008:~# /usr/share/logstash/bin/logstash-plugin 
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:386)
	at sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:401)
Caused by: java.net.BindException: Address already in use
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:433)
	at sun.nio.ch.Net.bind(Net.java:425)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
	at sun.net.httpserver.ServerImpl.bind(ServerImpl.java:133)
	at sun.net.httpserver.HttpServerImpl.bind(HttpServerImpl.java:54)
	at io.prometheus.jmx.shaded.io.prometheus.client.exporter.HTTPServer.<init>(HTTPServer.java:145)
	at io.prometheus.jmx.shaded.io.prometheus.jmx.JavaAgent.premain(JavaAgent.java:49)
	... 6 more
FATAL ERROR in native method: processing of -javaagent failed
Aborted
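
The failure mode in the trace above (a second process trying to listen on a port that is already taken) can be illustrated outside the JVM. This is a minimal sketch using plain sockets, not the jmx_exporter code itself; `try_bind` is a hypothetical helper, and the port is picked by the OS rather than being the one used in production.

```python
import errno
import socket


def try_bind(port, host="127.0.0.1"):
    """Bind and listen on host:port; return the socket, or None if the port is taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return sock
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            return None
        raise


# Stand-in for the running logstash service, whose agent already holds the port.
# Port 0 asks the OS for any free port.
service = try_bind(0)
port = service.getsockname()[1]

# Stand-in for a CLI invocation that loads the same agent with the same port.
cli = try_bind(port)
print("second bind succeeded" if cli else "second bind failed: address already in use")

service.close()
```

This is why the agent option needs to apply only to the long-running service and not to every JVM started from the logstash scripts.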

Change 452747 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] logstash: use /etc/default/logstash to add jmx_exporter

https://gerrit.wikimedia.org/r/452747

Change 452747 merged by Filippo Giunchedi:
[operations/puppet@production] logstash: use /etc/default/logstash to add jmx_exporter

https://gerrit.wikimedia.org/r/452747

Change 452845 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] logstash: fix /etc/default/logstash

https://gerrit.wikimedia.org/r/452845

Change 452845 merged by Filippo Giunchedi:
[operations/puppet@production] logstash: fix /etc/default/logstash

https://gerrit.wikimedia.org/r/452845

Change 455520 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] logstash: add plugin_id to outputs

https://gerrit.wikimedia.org/r/455520

Change 455520 merged by Filippo Giunchedi:
[operations/puppet@production] logstash: add plugin_id to outputs

https://gerrit.wikimedia.org/r/455520

herron claimed this task.