Prometheus puppet agent stats are apparently not working on a number of Toolforge servers.
The source file generated by the script is stale:
root@tools-worker-1020:~# ls -l /var/lib/prometheus/node.d/puppet_agent.prom
-rw-r--r-- 1 prometheus prometheus 784 Jun 5 2018 /var/lib/prometheus/node.d/puppet_agent.prom
On a working server:
aborrero@tools-sgebastion-07:~$ sudo ls -l /var/lib/prometheus/node.d/puppet_agent.prom
-rw-r--r-- 1 prometheus prometheus 784 Apr 24 16:33 /var/lib/prometheus/node.d/puppet_agent.prom
The crontab entry on a healthy server:
aborrero@tools-sgebastion-07:~$ sudo crontab -u prometheus -l
# HEADER: This file was autogenerated at 2019-02-08 01:08:05 +0000 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: prometheus_puppet_agent_stats
* * * * * /usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom
Compare with a non-working server (note that the date in the header comment looks stale):
root@tools-worker-1020:~# crontab -u prometheus -l
# HEADER: This file was autogenerated at 2017-07-17 01:40:56 +0000 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: prometheus_puppet_agent_stats
* * * * * /usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom
This may indicate that something is missing in the puppet manifests. I also suspect that all of the non-working servers are Jessie VMs, but I have not confirmed this.
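
To get a list of affected hosts, something like the following could be run on each server. This is only a rough sketch: the `is_stale` helper and the 5 minute threshold are my own assumptions (the cron job runs every minute, so a few minutes of slack seemed reasonable), and it relies on `find -mmin`, which is available in GNU find but is not POSIX. It probably needs to run as root to read the directory.

```shell
#!/bin/sh
# Hypothetical helper, not part of any existing tooling.
# is_stale FILE MAX_AGE_MINUTES
# Returns 0 (true) if FILE is missing or was last modified more than
# roughly MAX_AGE_MINUTES minutes ago, and non-zero otherwise.
is_stale() {
    file=$1
    max_age_min=$2
    # A missing (or unreadable) file is treated as stale.
    [ -e "$file" ] || return 0
    # find prints the path only when the mtime is older than the threshold.
    [ -n "$(find "$file" -mmin +"$max_age_min" -print)" ]
}

# Path taken from the cron entry above.
prom_file=/var/lib/prometheus/node.d/puppet_agent.prom

if is_stale "$prom_file" 5; then
    echo "STALE: $prom_file"
else
    echo "OK: $prom_file"
fi
```

Running this across the fleet and comparing the STALE hosts against their OS release should confirm or rule out the Jessie theory.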