
Toolforge: prometheus puppet agent stat not working
Closed, Resolved (Public)

Description

Prometheus puppet agent stats are apparently not working on a number of Toolforge servers.

The source file generated by the script is stale:

root@tools-worker-1020:~# ls -l /var/lib/prometheus/node.d/puppet_agent.prom
-rw-r--r-- 1 prometheus prometheus 784 Jun  5  2018 /var/lib/prometheus/node.d/puppet_agent.prom

On a working server:

aborrero@tools-sgebastion-07:~$ sudo ls -l /var/lib/prometheus/node.d/puppet_agent.prom
-rw-r--r-- 1 prometheus prometheus 784 Apr 24 16:33 /var/lib/prometheus/node.d/puppet_agent.prom
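
The comparison above can be scripted as a rough staleness check. This is only a sketch: the is_stale helper and the 300-second threshold are illustrative assumptions (the cron job runs every minute, so a few minutes of slack seemed reasonable), and it relies on GNU stat.

```shell
# Return 0 if FILE is missing or older than MAX_AGE seconds, 1 otherwise.
# Usage: is_stale FILE MAX_AGE
# is_stale is a hypothetical helper, not part of any existing tooling.
is_stale() {
    stale_file=$1
    stale_max_age=$2
    # A missing file counts as stale: the cron job never wrote it.
    [ -e "$stale_file" ] || return 0
    stale_now=$(date +%s)
    # GNU stat: %Y prints the modification time as seconds since the epoch.
    stale_mtime=$(stat -c %Y "$stale_file") || return 0
    [ $((stale_now - stale_mtime)) -gt "$stale_max_age" ]
}

# Example call (300 is an assumed threshold, not a value from the task):
#   is_stale /var/lib/prometheus/node.d/puppet_agent.prom 300 && echo "stale"
```

Run as a user that can read /var/lib/prometheus/node.d, this should flag the file on tools-worker-1020 and not the one on tools-sgebastion-07.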

The crontab entry on a working server:

aborrero@tools-sgebastion-07:~$ sudo crontab -u prometheus -l
# HEADER: This file was autogenerated at 2019-02-08 01:08:05 +0000 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: prometheus_puppet_agent_stats
* * * * * /usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom

Compare with a non-working server (note that the date in the header comment looks stale):

root@tools-worker-1020:~# crontab -u prometheus -l
# HEADER: This file was autogenerated at 2017-07-17 01:40:56 +0000 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: prometheus_puppet_agent_stats
* * * * * /usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom

This may indicate that something is missing in the puppet manifests. I also suspect that all of the non-working servers are Jessie VMs.

Event Timeline

It seems to me the cron job is somehow not running.

aborrero triaged this task as Medium priority.
aborrero moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.

The issue seems to be the lack of a blank line at the end of the crontab file.

Apparently the blank line is not important. The crontab file just needs to be reloaded.

Mentioned in SAL (#wikimedia-cloud) [2019-04-25T11:43:22Z] <arturo> T221793 removing prometheus crontab and letting puppet agent re-create it again to resolve staleness
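
For reference, the fix described in the SAL entry could look roughly like the sketch below. The exact commands that were run are not recorded in this task, so treat these as assumptions: reset_prometheus_crontab and run are hypothetical helpers, and the plain "puppet agent --test" call may differ from whatever wrapper the hosts actually use. It defaults to a dry run that only prints the commands.

```shell
# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Remove the prometheus user's crontab and let puppet recreate it.
# Needs root when DRY_RUN=0.
reset_prometheus_crontab() {
    # crontab -r deletes the whole crontab of the user given with -u.
    run crontab -u prometheus -r
    # Assumption: a direct agent run. Note that --test enables detailed
    # exit codes, so exit status 2 means changes were applied, not failure.
    run puppet agent --test
}
```

Calling reset_prometheus_crontab as is only lists the two steps. Setting DRY_RUN=0 executes them, after which the header date in "crontab -u prometheus -l" should be current.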