
Fix prometheus elasticsearch exporter to show all the metrics
Closed, Resolved, Public

Description

While working on T209812, I found that the required metrics are exposed by the Prometheus exporter, but they do not show up when I tunnel to the Prometheus port. Something is probably wrong somewhere.
Original ticket upstream: https://github.com/justwatchcom/elasticsearch_exporter/issues/115
New ticket: https://github.com/justwatchcom/elasticsearch_exporter/issues/199

Event Timeline

Restricted Application added a subscriber: Aklapper.

After investigating why some already-exposed metrics are not showing up, I found that the Prometheus exporter has to be run with the option -es.indices=true, which exposes the indices and, with them, the missing metrics. This would let us migrate properly from Graphite to Prometheus.
However, this option comes with overhead, because the exporter has to wait for the _stats API to complete on every request. The -es.timeout option can be used to limit that overhead.
I recommend trying these options on relforge and monitoring utilization across the nodes before going live.
Another concern is that this will generate quite a lot of data, which might cause other issues.
Any suggestions for moving this forward are welcome.
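
To put a number on the overhead and data volume mentioned above, one rough approach is to time a single scrape of the exporter's metrics endpoint and record the response size, once without and once with -es.indices=true. The following is only a sketch: the function name time_scrape, the injectable fetch parameter, and the exporter URL in the usage note are assumptions for illustration, not part of the exporter or of our setup.

```python
import time
from urllib.request import urlopen


def time_scrape(url, timeout=10.0, fetch=None):
    """Return (seconds_elapsed, bytes_received) for one scrape of `url`.

    `fetch` is a hypothetical hook so the function can be exercised
    without a live exporter; by default it performs a plain HTTP GET.
    """
    if fetch is None:
        def fetch(target):
            with urlopen(target, timeout=timeout) as resp:
                return resp.read()

    start = time.perf_counter()
    body = fetch(url)
    elapsed = time.perf_counter() - start
    return elapsed, len(body)


if __name__ == "__main__":
    # Assumed URL: adjust host and port to wherever the exporter listens.
    seconds, size = time_scrape("http://localhost:9108/metrics")
    print(f"scrape took {seconds:.3f}s and returned {size} bytes")
```

Running this on a relforge node before and after enabling the option would give a first comparison of scrape latency and payload size; it does not replace monitoring utilization on the Elasticsearch nodes themselves.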

There should be no need to fetch indices stats for this. I think there was a misunderstanding in https://github.com/justwatchcom/elasticsearch_exporter/issues/115: the maintainer thought we wanted fine-grained index statistics, but we want node stats.
The /_nodes/_local/stats API endpoint should have everything we need.
Group stats are available when calling /_nodes/_local/stats?groups=_all.
After a quick look, it does not seem that the exporter is doing what we need.
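
For reference, here is a minimal sketch of reading the group stats straight from the node stats endpoint mentioned above. The base URL, the function names, and the assumed response layout (nodes, then node id, then indices.search.groups) are illustrative assumptions and should be checked against the response from the Elasticsearch version we run.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def node_stats_url(base_url="http://localhost:9200", groups="_all"):
    """Build the local node stats URL with the `groups` query parameter."""
    query = urlencode({"groups": groups})
    return f"{base_url.rstrip('/')}/_nodes/_local/stats?{query}"


def fetch_node_stats(base_url="http://localhost:9200", groups="_all", timeout=5.0):
    """Fetch and decode the node stats JSON (requires a reachable node)."""
    with urlopen(node_stats_url(base_url, groups), timeout=timeout) as resp:
        return json.load(resp)


def search_group_stats(stats):
    """Yield (node_id, group_name, counters) from a node stats response.

    Assumes per-group search stats live under
    nodes.<id>.indices.search.groups; missing keys are skipped.
    """
    for node_id, node in stats.get("nodes", {}).items():
        groups = node.get("indices", {}).get("search", {}).get("groups", {})
        for group_name, counters in groups.items():
            yield node_id, group_name, counters
```

Calling search_group_stats(fetch_node_stats()) on a node would then list the per-group search counters that the exporter would need to expose.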

Change 483143 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[operations/debs/prometheus-elasticsearch-exporter@master] New upstream version

https://gerrit.wikimedia.org/r/483143

Change 483143 abandoned by Mathew.onipe:
New upstream version

Reason:
not needed

https://gerrit.wikimedia.org/r/483143

Change 483492 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[operations/debs/prometheus-elasticsearch-exporter@master] Updated changelog

https://gerrit.wikimedia.org/r/483492

Change 483492 merged by Mathew.onipe:
[operations/debs/prometheus-elasticsearch-exporter@master] Updated changelog

https://gerrit.wikimedia.org/r/483492

New .deb is available on https://people.wikimedia.org/~gehel/prometheus-elasticsearch-exporter/

@Mathew.onipe : I'll let you validate it before uploading to our apt repo

Validation was done by @Mathew.onipe.

.deb is now uploaded to our apt repo

Change 484243 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[operations/puppet@production] elasticsearch: mask default exporter service

https://gerrit.wikimedia.org/r/484243

Change 484243 merged by Gehel:
[operations/puppet@production] elasticsearch: mask default exporter service

https://gerrit.wikimedia.org/r/484243

Change 484389 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[operations/puppet@production] elasticsearch: mask default exporter

https://gerrit.wikimedia.org/r/484389

Change 484389 merged by Gehel:
[operations/puppet@production] elasticsearch: mask default exporter

https://gerrit.wikimedia.org/r/484389

Mentioned in SAL (#wikimedia-operations) [2019-01-15T12:03:12Z] <onimisionipe> starting upgrading of prometheus-elasticsearch-exporter for codfw T210592

Mentioned in SAL (#wikimedia-operations) [2019-01-15T12:15:14Z] <onimisionipe> starting upgrading of prometheus-elasticsearch-exporter for eqiad T210592