
cronspam from smart-data-dump due to facter bug
Closed, Resolved, Public


We are getting a lot of cron spam from the "smart-data-dump" cron job.

The script is ./modules/smart/files/smart-data-dump and was added in T86552.

Initially, only cp* hosts were affected.

The root cause is a bug in facter, which the script calls here:

raw_output = subprocess.check_output(['/usr/bin/facter', '--puppet', '--json', fact_name])
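facter --json prints a JSON object keyed by fact name, so the script presumably decodes the output along the lines below. This is a minimal sketch; the helper names get_fact and parse_fact are hypothetical, not the script's actual code. It also shows why the warnings reach cron, assuming facter writes them to stderr (as the later suggestion to ignore stderr implies): check_output captures only stdout, and the child's stderr is inherited from the cron job.

```python
import json
import subprocess


def parse_fact(raw_output, fact_name):
    # facter --json prints an object keyed by fact name, e.g. {"raid": [...]}.
    data = json.loads(raw_output)
    return data.get(fact_name)


def get_fact(fact_name):
    """Hypothetical helper: query one fact and decode it.

    check_output captures stdout only; anything facter writes to stderr
    goes straight to the cron job's stderr, and cron mails it out.
    """
    raw_output = subprocess.check_output(
        ['/usr/bin/facter', '--puppet', '--json', fact_name])
    return parse_fact(raw_output, fact_name)
```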

Running facter with -d to get debug output shows that it runs /sbin/ip route show and tries to parse the output:

DEBUG leatherman.execution:93 - executing command: /sbin/ip route show

Parsing that output then throws a lot of warnings like:

[cp3041:~] $ sudo facter --puppet --json raid | grep WARN

WARN  puppetlabs.facter - Could not process routing table entry:
Expected a destination followed by key/value pairs,
got '2620:0:861:107:10:64:48:101 via fe80::1 dev eno1 metric 1024  mtu lock 1450 pref medium'

This is very similar, but not identical, to the upstream bug

In that bug, parsing fails when the ip route output contains "linkdown", but that isn't the case for us.
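The warning text says the parser expects "a destination followed by key/value pairs". One plausible reading, not verified against facter's source, is that the lock modifier in "mtu lock 1450" leaves an unpaired token, just as a bare "linkdown" flag does in the upstream bug. The toy parser below encodes only that rule to illustrate the effect; parse_route_line is a hypothetical name and this is not facter's implementation.

```python
def parse_route_line(line):
    """Toy parser for one 'ip route show' line: a destination followed by
    key/value pairs. Simplified illustration, not facter's actual code."""
    parts = line.split()
    # One destination plus N key/value pairs means an odd number of tokens.
    if len(parts) % 2 != 1:
        raise ValueError(
            "Expected a destination followed by key/value pairs, got %r" % line)
    destination, rest = parts[0], parts[1:]
    # Pair up tokens: via -> fe80::1, dev -> eno1, metric -> 1024, ...
    return destination, dict(zip(rest[0::2], rest[1::2]))
```

Under this rule, the cp3041 route splits into 12 tokens because of "lock" and is rejected, while the same route with plain "mtu 1450" splits into 11 and parses.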

We tried adding "-l error" to the facter command, setting the log level to error so that the warnings, and with them the cron spam, would be suppressed.

While this worked fine on the cp* servers showing the issue, after merging it caused even more, and new, cron spam on non-cp* hosts.

The reason was differing facter versions: for some reason, cp* hosts appear to have facter 3.x while most other hosts have facter 2.x, even when both run stretch.

In facter 2.x the "-l error" option does not exist, which led to the new spam from these hosts.
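One way to avoid breaking the facter 2.x hosts would be to pass "-l error" only when the installed facter is 3.x. The sketch below assumes that the output of facter --version begins with the dotted version number (e.g. "3.11.0" or "2.4.6"); the helper names are hypothetical and this is not the change that was actually merged.

```python
import subprocess


def facter_major_version(version_output):
    # Take the leading major number from e.g. "3.11.0" or "2.4.6".
    return int(version_output.strip().split('.')[0])


def build_facter_cmd(fact_name, version_output):
    """Build the facter command, adding '-l error' only on facter 3.x+,
    since facter 2.x does not accept that option."""
    cmd = ['/usr/bin/facter', '--puppet', '--json']
    if facter_major_version(version_output) >= 3:
        cmd += ['-l', 'error']
    cmd.append(fact_name)
    return cmd


def installed_facter_version():
    # Hypothetical helper: ask the local facter for its version string.
    return subprocess.check_output(['/usr/bin/facter', '--version']).decode()
```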

So that change was reverted and we are back to the original state: cp* hosts are affected, but others are not.

The cp* hosts also show the same warnings on the console during each puppet run.

Event Timeline

Dzahn updated the task description.

Indeed, the facter upgrade task is T219803: upgrade facter and puppet across the fleet. cc @jbond FYI.

I'm not sure what the right answer is; probably ignore stderr from facter until the facter upgrade is complete, and then add -l error.
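Ignoring stderr can be done directly in the subprocess call. A minimal sketch, assuming Python 3.3+ (for subprocess.DEVNULL) and that facter's warnings go to stderr; run_quietly and get_fact_quietly are hypothetical helpers, not the stopgap that was actually added to smart-data-dump.

```python
import json
import subprocess


def run_quietly(cmd):
    # Capture stdout and discard stderr, so cron has nothing to mail.
    # A non-zero exit status still raises CalledProcessError.
    return subprocess.check_output(cmd, stderr=subprocess.DEVNULL)


def get_fact_quietly(fact_name, facter='/usr/bin/facter'):
    """Query one fact with facter's stderr discarded and decode the JSON."""
    raw_output = run_quietly([facter, '--puppet', '--json', fact_name])
    return json.loads(raw_output).get(fact_name)
```

Discarding stderr also hides genuine facter errors, which is why "-l error" is the better long-term fix once all hosts run facter 3.x.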

I have added a stopgap to smart-data-dump to stop the spam and will investigate the underlying issues further in T222326.

Dzahn triaged this task as Medium priority. May 3 2019, 12:44 AM

Thanks! That stopped the spam, so priority isn't that high now.

jbond claimed this task.

Resolving this and will track the root problem in