
Toolforge: prometheus puppet agent stat not working
Closed, Resolved, Public

Description

Prometheus puppet agent stats are apparently not working on a number of Toolforge servers.

The source file generated by the script is stale:

root@tools-worker-1020:~# ls -l /var/lib/prometheus/node.d/puppet_agent.prom
-rw-r--r-- 1 prometheus prometheus 784 Jun  5  2018 /var/lib/prometheus/node.d/puppet_agent.prom

By contrast, on a working server:

aborrero@tools-sgebastion-07:~$ sudo ls -l /var/lib/prometheus/node.d/puppet_agent.prom
-rw-r--r-- 1 prometheus prometheus 784 Apr 24 16:33 /var/lib/prometheus/node.d/puppet_agent.prom
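Since the cron job is meant to rewrite this file every minute, staleness can be checked from its modification time. The helper below is a minimal sketch, assuming GNU coreutils (`stat -c %Y` prints the mtime in epoch seconds) and a hypothetical 10-minute threshold; the `is_stale` name is illustrative and not part of the Prometheus or Puppet tooling.

```shell
# is_stale FILE [MAX_AGE_SECONDS]
# Hypothetical helper: succeeds (exit 0) if FILE is missing or its
# modification time is older than MAX_AGE_SECONDS (default: 600 s).
# Assumes GNU coreutils, where `stat -c %Y` prints the mtime as epoch seconds.
is_stale() {
    local file="$1"
    local max_age="${2:-600}"
    local mtime now
    # Treat a missing file as stale.
    [ -e "$file" ] || return 0
    mtime=$(stat -c %Y "$file")
    now=$(date +%s)
    # Exit status of this test is the function's result.
    [ $(( now - mtime )) -gt "$max_age" ]
}
```

For example, `is_stale /var/lib/prometheus/node.d/puppet_agent.prom && echo stale` would print `stale` on tools-worker-1020 but not on tools-sgebastion-07.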

The crontab entry on a healthy server:

aborrero@tools-sgebastion-07:~$ sudo crontab -u prometheus -l
# HEADER: This file was autogenerated at 2019-02-08 01:08:05 +0000 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: prometheus_puppet_agent_stats
* * * * * /usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom

versus on a non-working server (note that the date in the header comment looks stale):

root@tools-worker-1020:~# crontab -u prometheus -l
# HEADER: This file was autogenerated at 2017-07-17 01:40:56 +0000 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: prometheus_puppet_agent_stats
* * * * * /usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom
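The two listings differ only in the header date, which presumably records when Puppet last rewrote the file. To compare that date across many hosts, it can be pulled out of the `crontab -l` output; the sketch below assumes the header wording shown above, and `crontab_generated_at` is a hypothetical helper name.

```shell
# crontab_generated_at: read `crontab -l` output on stdin and print the
# timestamp from Puppet's "# HEADER: This file was autogenerated at ... by puppet."
# line. Hypothetical helper; the header format is taken from the listings above.
crontab_generated_at() {
    sed -n 's/^# HEADER: This file was autogenerated at \(.*\) by puppet\.$/\1/p'
}
```

Run as `crontab -u prometheus -l | crontab_generated_at`, this should print `2017-07-17 01:40:56 +0000` on tools-worker-1020 and `2019-02-08 01:08:05 +0000` on tools-sgebastion-07.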

Could this indicate something missing in the Puppet manifests? I also suspect that all the non-working servers are Jessie VMs.

Event Timeline

aborrero created this task. Apr 24 2019, 4:38 PM
Restricted Application added a subscriber: Aklapper. Apr 24 2019, 4:38 PM

It seems to me the cron job is somehow not running.
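One way to confirm that cron is not starting the job is to count its executions in the system log. This sketch assumes Debian's default setup, where cron logs lines such as `CRON[1234]: (prometheus) CMD (...)` and rsyslog routes them to `/var/log/syslog`; `count_cron_runs` is a hypothetical helper.

```shell
# count_cron_runs USER PATTERN: read syslog lines on stdin and print how many
# record cron starting a command for USER whose text contains PATTERN.
# Assumes the Debian cron log format:
#   Apr 24 16:33:01 host CRON[1234]: (prometheus) CMD (/usr/local/bin/...)
count_cron_runs() {
    local user="$1" pattern="$2"
    grep -F "CRON[" | grep -F "($user) CMD (" | grep -cF "$pattern"
}
```

On a healthy host, `count_cron_runs prometheus prometheus-puppet-agent-stats < /var/log/syslog` should grow by roughly one per minute; a count of 0 over a log spanning several minutes would support the theory that the job never runs.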

aborrero triaged this task as Normal priority.
aborrero moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.

The issue seems to be the lack of a trailing blank line at the end of the crontab file.
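The trailing-newline hypothesis can be checked on a given host by inspecting the last byte of the crontab file. This is a minimal sketch; `ends_with_newline` is a hypothetical helper, and the spool path in the usage note assumes Debian's layout.

```shell
# ends_with_newline FILE: succeed if FILE is non-empty and its last byte is "\n".
# Command substitution strips trailing newlines, so $(tail -c 1 FILE) is
# empty exactly when the final byte is a newline.
ends_with_newline() {
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}
```

As root, `ends_with_newline /var/spool/cron/crontabs/prometheus || echo "missing trailing newline"` reports whether the installed crontab lacks a final newline.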

Apparently the blank line is not important; we just need cron to reload the crontab file.

Mentioned in SAL (#wikimedia-cloud) [2019-04-25T11:43:22Z] <arturo> T221793 removing prometheus crontab and letting puppet agent re-create it again to resolve staleness
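The fix recorded in the SAL entry can be sketched per host as below. The `maybe_run` and `refresh_prometheus_crontab` names and the `RUN=1` switch are hypothetical; by default the function only prints the commands. `puppet agent --test` forces an immediate agent run instead of waiting for the next scheduled one, and the freshly written crontab is presumably what prompts cron to reload it.

```shell
# maybe_run CMD [ARGS...]: print the command; execute it only when RUN=1.
maybe_run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# refresh_prometheus_crontab: hypothetical sketch of the SAL fix.
# Removes the prometheus user's crontab, then runs the Puppet agent so it
# re-creates the entry. Must be run as root when RUN=1.
refresh_prometheus_crontab() {
    maybe_run crontab -u prometheus -r
    maybe_run puppet agent --test
}
```

Running `refresh_prometheus_crontab` first as a dry run and then as `RUN=1 refresh_prometheus_crontab` lets the commands be reviewed before they touch the host.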

aborrero closed this task as Resolved. Apr 25 2019, 12:40 PM