
cronspam from smart-data-dump due to facter bug
Closed, Resolved, Public

Description

We are getting a lot of cron spam from the "smart-data-dump" cron job.

The script is ./modules/smart/files/smart-data-dump and it was added in T86552.

Initially, and again currently, only cp* hosts are affected.

The root cause is a bug in facter, which the script calls as follows:

raw_output = subprocess.check_output(['/usr/bin/facter', '--puppet', '--json', fact_name])

When running facter with -d to get debug output, it can be seen that it runs ip route show and tries to parse the output:

DEBUG leatherman.execution:93 - executing command: /sbin/ip route show

This then throws a lot of warnings like:

[cp3041:~] $ sudo facter --puppet --json raid | grep WARN

WARN  puppetlabs.facter - Could not process routing table entry:
Expected a destination followed by key/value pairs,
got '2620:0:861:107:10:64:48:101 via fe80::1 dev eno1 metric 1024  mtu lock 1450 pref medium'

This is very similar, but not identical, to the upstream bug https://tickets.puppetlabs.com/browse/FACT-1394

In that bug the parsing fails when there is "linkdown" in the ip route output, but that isn't the case for us.
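
One plausible reading of the warning text, sketched here in Python and not taken from facter's source: if the parser expects a destination followed by key/value pairs, then the three-token attribute "mtu lock 1450" leaves an odd number of tokens, much as a lone "linkdown" flag would. The function name parse_route_entry and the ok_line example are hypothetical; bad_line is the entry quoted in the warning above.

```python
def parse_route_entry(line):
    """Split an `ip route show` line into (destination, {key: value}).

    Simplified illustration only; this is NOT facter's actual parser.
    Raises ValueError when the tokens after the destination do not pair up.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("empty routing table entry")
    destination, rest = tokens[0], tokens[1:]
    if len(rest) % 2 != 0:
        raise ValueError(
            "expected a destination followed by key/value pairs, got %r" % line
        )
    return destination, dict(zip(rest[0::2], rest[1::2]))


# Hypothetical well-formed entry: every attribute is a key/value pair.
ok_line = "default via fe80::1 dev eno1 metric 1024 pref medium"

# Entry from the warning: "mtu lock 1450" is three tokens, so pairing breaks.
bad_line = (
    "2620:0:861:107:10:64:48:101 via fe80::1 dev eno1 "
    "metric 1024  mtu lock 1450 pref medium"
)

if __name__ == "__main__":
    print(parse_route_entry(ok_line))
    try:
        parse_route_entry(bad_line)
    except ValueError as exc:
        print("parse failed:", exc)
```

If this reading is right, any route carrying a locked MTU would trigger the warning, which might explain why only some hosts are affected.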

An attempt was made to add -l error to the facter command in order to set the loglevel to error and suppress the warnings to stop the cron spam.

https://gerrit.wikimedia.org/r/c/operations/puppet/+/507634

While this worked fine on the cp* servers that showed the issue, after merging it caused even more, and new, cron spam on non-cp* hosts.

The reason was differing facter versions. For some reason cp* hosts appear to have facter 3.x while most other hosts have facter 2.x, even when both are on stretch.

In facter 2.x the "-l error" option does not exist, which led to the new spam from these hosts.

So that change was reverted and now we are back to the original state: cp* hosts are affected but others are not.
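
A minimal sketch of one way to cope with mixed versions, assuming that `facter --version` prints a dotted version string first on both 2.x and 3.x: pass "-l error" only when the major version is at least 3. The helper names facter_major_version and build_facter_command are hypothetical, and this is not the change that was merged or reverted.

```python
import subprocess

FACTER = "/usr/bin/facter"


def facter_major_version(facter=FACTER):
    """Return facter's major version as an int (assumes output like '3.11.0')."""
    out = subprocess.check_output([facter, "--version"])
    return int(out.decode("utf-8").strip().split(".")[0])


def build_facter_command(fact_name, major_version, facter=FACTER):
    """Build the facter argv, adding '-l error' only where it is supported."""
    cmd = [facter, "--puppet", "--json"]
    if major_version >= 3:
        # Per the description above, facter 2.x does not know this option.
        cmd += ["-l", "error"]
    cmd.append(fact_name)
    return cmd
```

For example, build_facter_command('raid', facter_major_version()) would yield the original command on 2.x hosts and the quieter one on 3.x hosts.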

Also the cp* hosts show the same warnings on each puppet run on the console.

Event Timeline

Dzahn updated the task description.

Indeed, the facter upgrade task is T219803 (upgrade facter and puppet across the fleet); cc @jbond FYI.

I'm not sure what the right answer is; probably ignore stderr from facter until the facter upgrade is complete, and then add -l error.

I have added a plaster to smart-data-dump to stop the spam and will investigate the underlying issues further via T222326.
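
Ignoring stderr could look roughly like the sketch below. It assumes the script only needs facter's JSON on stdout; the helper names run_json_command and get_fact are hypothetical, and this is not necessarily how smart-data-dump was actually patched.

```python
import json
import os
import subprocess


def run_json_command(cmd):
    """Run cmd, discard its stderr, and parse its stdout as JSON."""
    with open(os.devnull, "w") as devnull:
        # Without stderr=..., warnings reach the parent's stderr and cron mails them.
        raw_output = subprocess.check_output(cmd, stderr=devnull)
    return json.loads(raw_output.decode("utf-8"))


def get_fact(fact_name):
    """Fetch one fact via facter, using the same arguments as the script."""
    return run_json_command(["/usr/bin/facter", "--puppet", "--json", fact_name])
```

The trade-off is that genuine facter errors written to stderr are hidden too, although a non-zero exit status still raises subprocess.CalledProcessError.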

Dzahn triaged this task as Medium priority. May 3 2019, 12:44 AM

Thanks! That stopped the spam, so priority isn't that high now.

jbond claimed this task.

Resolving this; the root problem will be tracked in https://phabricator.wikimedia.org/T222356