
Broken elasticsearch-prometheus-exporter service on logstash nodes after reboot
Closed, Resolved, Public

Description

When I rebooted logstash1007-1009 for security updates, each host came back up with prometheus-elasticsearch-exporter.service in a failed state:

sudo systemctl list-units | grep failed
• prometheus-elasticsearch-exporter.service loaded failed failed    Prometheus exporter for Elasticsearch

The exporter instance we actually use (the -9200 instance) seems to be running fine.

The journal shows that the failure is caused by a missing environment file:

Nov 28 10:49:52 logstash1009 systemd[1]: Starting Prometheus exporter for Elasticsearch...
Nov 28 10:49:52 logstash1009 systemd[1]: Failed to load environment files: No such file or directory
Nov 28 10:49:52 logstash1009 systemd[1]: prometheus-elasticsearch-exporter.service failed to run 'start' task: No such file or directory

This environment file is explicitly removed in Puppet:
https://github.com/wikimedia/puppet/blob/production/modules/prometheus/manifests/elasticsearch_exporter/common.pp#L8
The same Puppet class also ensures that the service is stopped, but by the time Puppet runs the unit has already failed, because systemd tries to start it at boot.

The prometheus-elasticsearch-exporter package ships a systemd unit that we don't use, and that can't even start because Puppet removes its environment file. It would be best to mask it in Puppet (the equivalent of running systemctl mask prometheus-elasticsearch-exporter.service).
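For reference, a minimal sketch of what the mask looks like on a host, assuming a standard systemd layout. `systemctl mask` links the unit name under /etc/systemd/system/ to /dev/null, so systemd can no longer start the packaged unit at boot or on request. The helper `unit_is_masked` and its optional root-directory argument are hypothetical, for illustration only. On a live host, `systemctl is-enabled <unit>` reports `masked` directly.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): succeed if UNIT is masked under
# ROOT/etc/systemd/system, i.e. the unit file there is a symlink to /dev/null,
# which is what `systemctl mask` creates. Runtime masks under /run are not checked.
unit_is_masked() {
    unit="$1"
    root="${2:-}"   # optional alternate root, e.g. a test directory
    path="$root/etc/systemd/system/$unit"
    [ -L "$path" ] && [ "$(readlink "$path")" = "/dev/null" ]
}

# On a real host the mask itself would be applied with:
#   sudo systemctl mask prometheus-elasticsearch-exporter.service
if unit_is_masked prometheus-elasticsearch-exporter.service; then
    echo "masked"
else
    echo "not masked"
fi
```

Checking the symlink rather than calling systemctl keeps the check usable against an image or chroot, at the cost of ignoring runtime masks.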

Event Timeline

Restricted Application added a project: Discovery-Search. Nov 28 2018, 11:52 AM
Restricted Application added a subscriber: Aklapper.
MoritzMuehlenhoff triaged this task as Normal priority. Nov 28 2018, 11:52 AM

@MoritzMuehlenhoff can you verify that prometheus-elasticsearch-exporter.service no longer fails and is masked?
Thanks!

I can confirm that prometheus-elasticsearch-exporter.service is masked on the logstash nodes. I have not rebooted any of them yet, but I think this is good enough to close this task.

If there are any problems after the next reboot, we can always reopen it.

debt closed this task as Resolved. Jan 18 2019, 7:10 PM
debt claimed this task.