Our elasticsearch clusters expose some metrics specific to us, including per-node latency percentiles. Those are not collected by the standard elasticsearch_exporter. We want to create a new custom exporter for those metrics. The prometheus-blazegraph-exporter can be used as an example / starting point. This exporter has no reason to be reused outside of our deployment, so deploying it directly with puppet is probably fine.
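As a rough illustration of what such an exporter would need to emit, here is a minimal sketch that renders per-node percentiles in the Prometheus text exposition format. Everything specific in it is an assumption made for the example: the input shape (node name mapped to percentile mapped to latency in milliseconds), the metric name `elasticsearch_per_node_latency_ms`, the label names, and the node name `elastic1001`. It uses only the standard library and does not show how the stats are fetched from elasticsearch or served over HTTP.

```python
# Hypothetical input shape: {node_name: {percentile: latency_ms}}.
# Metric and label names are illustrative, not taken from the real cluster.


def render_percentiles(stats, metric="elasticsearch_per_node_latency_ms"):
    """Render per-node latency percentiles as Prometheus text exposition."""
    lines = [
        f"# HELP {metric} Per-node latency percentile in milliseconds.",
        f"# TYPE {metric} gauge",
    ]
    # Sort nodes and percentiles so the output is stable between scrapes.
    for node in sorted(stats):
        for pct in sorted(stats[node]):
            value = float(stats[node][pct])
            # Assumes node names contain no quotes, backslashes or newlines,
            # which would need escaping in a label value.
            lines.append(f'{metric}{{node="{node}",percentile="{pct}"}} {value}')
    return "\n".join(lines) + "\n"


# Made-up sample data for one node.
sample = {"elastic1001": {50: 12.5, 99: 180.0}}
exposition = render_percentiles(sample)
print(exposition)
```

Exposing each percentile as a gauge with a `percentile` label keeps the dashboard queries simple, but that is only one possible layout.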
I put together a very basic attempt at a first dashboard: https://grafana.wikimedia.org/dashboard/db/elasticsearch-per-node-percentiles?orgId=1
The overall numbers look sane and roughly what is expected.
I suppose the :9109 in the instance names is a bit annoying, in that it makes the list of instances much longer. Really though, there are too many instances to list, and it needs to be further filtered (top-N?) anyway.
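For the dashboard itself this would most likely be done in the query layer (PromQL has `topk` and `label_replace` for this kind of thing), but the logic of the two ideas above, dropping the port suffix and keeping only the top N instances, can be sketched in a few lines. The instance names and values below are made up for the example.

```python
def top_n_instances(values, n=10, port_suffix=":9109"):
    """Return the n instances with the highest value, port suffix stripped."""
    # Rank instances by value, highest first, and keep the first n.
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)[:n]
    return [
        (name[: -len(port_suffix)] if name.endswith(port_suffix) else name, value)
        for name, value in ranked
    ]


# Hypothetical p99 latencies keyed by instance label.
sample = {
    "elastic1001:9109": 42.0,
    "elastic1002:9109": 180.0,
    "elastic1003:9109": 95.5,
}
top_two = top_n_instances(sample, n=2)
print(top_two)
```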