Page MenuHomePhabricator

Toolforge crontab command needs to validate output
Closed, ResolvedPublic

Description

Remote crontabs for Tools that are invalid cause issues for the entire host

https://phabricator.wikimedia.org/source/toollabs/browse/master/misctools/oge-crontab

root@tools-bastion-03:~# become zhuyifei1999-test
tools.zhuyifei1999-test@tools-bastion-03:~$ crontab -e
Traceback (most recent call last):
  File "/usr/local/bin/crontab", line 337, in <module>
    main()
  File "/usr/local/bin/crontab", line 280, in main
    crontab_text = crontab.load()
  File "/usr/local/bin/crontab", line 120, in load
    return self._remote('-l')
  File "/usr/local/bin/crontab", line 94, in _remote
    stdoutdata, stderrdata = ssh.communicate(input=stdin)
  File "/usr/lib/python3.4/subprocess.py", line 960, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib/python3.4/subprocess.py", line 1659, in _communicate
    self.stdout.encoding)
  File "/usr/lib/python3.4/subprocess.py", line 888, in _translate_newlines
    data = data.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 10: invalid continuation byte

It seems there is some kind of issue with this instance, and the puppet catalog can't be applied here:

tools-cron-01.tools.eqiad.wmflabs: Error: Failed to apply catalog: Could not parse line "PATH=/usr/local/bin:/usr/bin:/bin" at tools.ato:2
clush: tools-cron-01.tools.eqiad.wmflabs: exited with exit code 1

Event Timeline

Mentioned in SAL (#wikimedia-cloud) [2018-01-19T12:56:55Z] <arturo> the puppet status across the fleet seems good, only minor things like T185314 , T179388 and T179386

The file seems to be /var/spool/cron/crontabs/tools.ato in tools-cron-01.tools.eqiad.wmflabs

and this is the content right now:

# DO NOT EDIT THIS FILE - edit the master and reinstall.
# (- installed on Mon Dec  4 17:29:30 2017)
# (Cron version -- $Id: crontab.c,v 2.13 1994/01/17 03:20:37 vixie Exp $)
PATH=/usr/local/bin:/usr/bin:/bin
0 * * * * jsub -l release=trusty -once -N szubcsonk -mem 512m -o /data/project/ato/public_html/log/szubcsonk.txt -j y /data/project/ato/scripts/szubcsonk-core.sh > /dev/null
45 3,7,11,15,19,23 * * * jsub -l release=trusty -once -N frissdump -mem 512m -o /data/project/ato/public_html/log/frissdump.txt -j y /data/project/ato/scripts/frissdump-core.sh > /dev/null
5 0 * * * jsub -l release=trusty -once -N tudakozo -mem 512m -o /data/project/ato/public_html/log/tudakozo.txt -j y /data/project/ato/scripts/tudakozo.sh > /dev/null
15 0 * * * jsub -l release=trusty -once -N articlecandidates -mem 512m -o /data/project/ato/public_html/log/articlecandidates.txt -j y /data/project/ato/scripts/articlecandidates.sh > /dev/null
30 0,1 * * * [ "$(TZ=:Europe/Budapest date +\%H)" = "02" ] && jsub -l release=trusty -once -N huwiki_archive -o /data/project/ato/public_html/log/archive.txt -j y python /shared/pywikipedia/core/pwb.py $HOME/scripts/archivebot_hu.py User:Cherybot/config  > /dev/null
1 22,23 * * * [ "$(TZ=:Europe/Budapest date +\%H)" = "00" ] && jsub -l release=trusty -once -N kezdolap -o /data/project/ato/public_html/log/kezdolap.txt -j y /data/project/ato/scripts/kezdolap.sh  > /dev/null
1 10,11 * * 4 [ "$(TZ=:Europe/Budapest date +\%H)" = "12" ] && jsub -l release=trusty -once -N kezdolap -o /data/project/ato/public_html/log/kezdolap.txt -j y /data/project/ato/scripts/kezdolap.sh  > /dev/null
2 22,23 * * * [ "$(TZ=:Europe/Budapest date +\%H)" = "00" ] && jsub -l release=trusty -once -N evfordulok -o /data/project/ato/public_html/log/evfordulok.txt -j y /data/project/ato/scripts/evfordulok.sh  > /dev/null
45 4 * * * jsub -l release=trusty -once -N sign-of-life -mem 512m -o /data/project/ato/public_html/log/sign-of-life.txt -j y /data/project/ato/scripts/sign-of-life-core.sh > /dev/null
5 4 5,10,15,20,25 * * jsub -l release=trusty -once -N napkepe -mem 512m -o /data/project/ato/public_html/log/napkepe.txt -j y /data/project/ato/scripts/napkepe0-core.sh > /dev/null
10 4 5,10,15,20,25,30 * * jsub -l release=trusty -once -N napkepe -mem 512m -o /data/project/ato/public_html/log/napkepe.txt -j y /data/project/ato/scripts/napkepe1-core.sh > /dev/null
20 4 5,10,15,20,25,30 * * jsub -l release=trusty -once -N napkepe -mem 512m -o /data/project/ato/public_html/log/napkepe.txt -j y /data/project/ato/scripts/napkepe2-core.sh > /dev/null
30 4 5,10,15,20,25,30 * * jsub -l release=trusty -once -N napkepe -mem 512m -o /data/project/ato/public_html/log/napkepe.txt -j y /data/project/ato/scripts/napkepe3-core.sh > /dev/null
15 1 10,26 * * jsub -l release=trusty -once -N figyelmet -mem 512m -o /data/project/ato/public_html/log/figyelmet.txt -j y /data/project/ato/scripts/figyelmet-core.sh > /dev/null
20 2 2,6 * * jsub -l release=trusty -once -N anniversary -mem 512m -o /data/project/ato/public_html/log/anniversary.txt -j y /data/project/ato/scripts/anniversary.sh > /dev/null
#10 22 11 * * jsub -l release=trusty -once -N keres -mem 512m -o /data/project/ato/public_html/log/20160111.txt -j y /data/project/ato/scripts/20160111-keres.sh > /dev/null
#0 4 1-7 * * [ "$(date '+\%a')" == "Mon" ] && jsub -l release=trusty -once -N masvita -mem 512m -o /data/project/ato/public_html/log/masvita.txt -j y /data/project/ato/scripts/pywikibot-compat/masvita.sh > /dev/null
#0 4 15-21 * * ["$(date '+\%a')" == "Mon" ] && jsub -l release=trusty -once -N tataroz -mem 512m -o /data/project/ato/public_html/log/tataroz.txt -j y /data/project/ato/scripts/pywikibot-compat/tataroz.sh > /dev/null
#0 14 * * * jsub -l release=trusty -once -N napkepe-figy -mem 512m -o /data/project/ato/public_html/log/napkepe-figy.txt -j y /data/project/ato/scripts/pywikibot-compat/napkepe-figy.sh > /dev/null
#31 18 4 * * jsub -l release=trusty -N francia-telep -mem 512m -o /data/project/ato/public_html/log/francia-telep.txt -j y /data/project/ato/scripts/1.sh > /dev/null

I can't find this file in our puppet repo, I'm not sure where it is generated from, probably somewhere in /data/project/ato. Hey @bd808 any idea?

I can't find this file in our puppet repo, I'm not sure where it is generated from, probably somewhere in /data/project/ato. Hey @bd808 any idea?

This instance has user crons meant for the grid and I believe taht's what this is. If you do

ssh tools-bastion-03.eqiad.wmflabs
sudo -i
become ato
crontab -e

I believe you'll see this as it will hit a cron handler that edits the remote file on tools-cron-01

I did this ^ and commented out to #PATH=/usr/local/bin:/usr/bin:/bin

edit: we need to notify the users

hmm comenting that out didn't work so I removed the line but it is now tripping over another file

Error: Puppet::Util::FileType::FileTypeCrontab could not read tools.zhuyifei1999-test: invalid byte sequence in UTF-8

Error: Failed to apply catalog: Puppet::Util::FileType::FileTypeCrontab could not read tools.zhuyifei1999-test: invalid byte sequence in UTF-8

I wonder if something changed in how Puppet handles cron w/ P4

tools.zhuyifei1999-test

# DO NOT EDIT THIS FILE - edit the master and reinstall.
# (- installed on Mon Dec  4 18:47:59 2017)
# (Cron version -- $Id: crontab.c,v 2.13 1994/01/17 03:20:37 vixie Exp $)
#perc     �ra       nap      h�nap  h�tnapja

keywords enconding, UTF8 and zhuyifei all pointed me to this patch: https://gerrit.wikimedia.org/r/#/c/383770/

Not sure if related.

Fixed that and now

root@tools-cron-01:/var/spool/cron# puppet agent --test
Warning: Setting configtimeout is deprecated.
   (at /usr/lib/ruby/vendor_ruby/puppet/settings.rb:1146:in `issue_deprecation_warning')
Info: Using configured environment 'future'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for tools-cron-01.tools.eqiad.wmflabs
Notice: /Stage[main]/Base::Environment/Tidy[/var/tmp/core]: Tidying 0 files
Info: Applying configuration version '1516370285'
Notice: /Stage[main]/Base::Monitoring::Host/File[/usr/local/lib/nagios/plugins/check_puppetrun]/content:
--- /usr/local/lib/nagios/plugins/check_puppetrun	2018-01-09 17:12:14.862816311 +0000
+++ /tmp/puppet-file20180119-31858-15pqys2	2018-01-19 13:58:22.102289253 +0000
@@ -112,7 +112,8 @@
 end

 if failcount == :failed
-    puts "CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues"
+    puts "CRITICAL: Catalog fetch fail. " +
+         "Either compilation failed or puppetmaster has issues"
     exit 2
 end

@@ -130,12 +131,15 @@
 human_time_since_last_run = time_ago(time_since_last_run)

 if enabled_only && enabled == false
-    puts "OK: Puppet is currently disabled, not alerting. Last run #{human_time_since_last_run} ago with #{failcount} failures"
+    puts "OK: Puppet is currently disabled, not alerting. " +
+         "Last run #{human_time_since_last_run} ago with #{failcount} failures"
     exit 0
 end

 unless enabled
-    puts "WARNING: Puppet is currently disabled, message: #{disabled_message}, last run #{human_time_since_last_run} ago with #{failcount} failures"
+    puts "WARNING: Puppet is currently disabled, " +
+         "message: #{disabled_message}, " +
+         "last run #{human_time_since_last_run} ago with #{failcount} failures"
     exit 1
 end

@@ -159,5 +163,6 @@
     exit 1
 end

-puts "OK: Puppet is currently enabled, last run #{human_time_since_last_run} ago with #{failcount} failures"
+puts "OK: Puppet is currently enabled, " +
+     "last run #{human_time_since_last_run} ago with #{failcount} failures"
 exit 0

Info: Computing checksum on file /usr/local/lib/nagios/plugins/check_puppetrun
Info: /Stage[main]/Base::Monitoring::Host/File[/usr/local/lib/nagios/plugins/check_puppetrun]: Filebucketed /usr/local/lib/nagios/plugins/check_puppetrun to puppet with sum 2b5059067646ff655caf73d4c4c777f1
Notice: /Stage[main]/Base::Monitoring::Host/File[/usr/local/lib/nagios/plugins/check_puppetrun]/content: content changed '{md5}2b5059067646ff655caf73d4c4c777f1' to '{md5}b51e8c8924ac88b2ad17c921df488d45'
Notice: /Stage[main]/Base::Puppet/Base::Puppet::Config[main]/File[/etc/puppet/puppet.conf.d/10-main.conf]/content:
--- /etc/puppet/puppet.conf.d/10-main.conf	2017-12-13 02:42:41.595856653 +0000
+++ /tmp/puppet-file20180119-31858-1s7uog7	2018-01-19 13:58:23.986323483 +0000
@@ -15,7 +15,8 @@


 daemonize = false
-configtimeout = 960
+http_connect_timeout = 60
+http_read_timeout = 960
 usecacheonfailure = false
 splay = true
 pluginsync = true

Info: Computing checksum on file /etc/puppet/puppet.conf.d/10-main.conf
Info: /Stage[main]/Base::Puppet/Base::Puppet::Config[main]/File[/etc/puppet/puppet.conf.d/10-main.conf]: Filebucketed /etc/puppet/puppet.conf.d/10-main.conf to puppet with sum 16f10c03c66808a33a32341c6afb9e2a
Notice: /Stage[main]/Base::Puppet/Base::Puppet::Config[main]/File[/etc/puppet/puppet.conf.d/10-main.conf]/content: content changed '{md5}16f10c03c66808a33a32341c6afb9e2a' to '{md5}85deab94b7466d66830e739f73773fad'
Info: /Stage[main]/Base::Puppet/Base::Puppet::Config[main]/File[/etc/puppet/puppet.conf.d/10-main.conf]: Scheduling refresh of Exec[delete master certs]
Info: /Stage[main]/Base::Puppet/Base::Puppet::Config[main]/File[/etc/puppet/puppet.conf.d/10-main.conf]: Scheduling refresh of Exec[compile puppet.conf]
Notice: /Stage[main]/Base::Puppet/Exec[delete master certs]: Triggered 'refresh' from 1 events
Notice: /Stage[main]/Base::Puppet/Exec[compile puppet.conf]: Triggered 'refresh' from 1 events
Info: Computing checksum on file /data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.ato
Info: /Stage[main]/Toollabs::Cronrunner/File[/data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.ato]: Filebucketed /data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.ato to puppet with sum bbe2a505149e7c47b62003c88af57310
Notice: /Stage[main]/Toollabs::Cronrunner/File[/data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.ato]/content: content changed '{md5}bbe2a505149e7c47b62003c88af57310' to '{md5}5539176446336a34967fa3009df46ead'
Info: Computing checksum on file /data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.eranbot
Info: /Stage[main]/Toollabs::Cronrunner/File[/data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.eranbot]: Filebucketed /data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.eranbot to puppet with sum c9ead4dc7b673f03780c435c139298b6
Notice: /Stage[main]/Toollabs::Cronrunner/File[/data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.eranbot]/content: content changed '{md5}c9ead4dc7b673f03780c435c139298b6' to '{md5}f215029b60378cce9b073f81ae3b4d90'
Info: Computing checksum on file /data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.zhuyifei1999-test
Info: /Stage[main]/Toollabs::Cronrunner/File[/data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.zhuyifei1999-test]: Filebucketed /data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.zhuyifei1999-test to puppet with sum 9fb722a871445ce1cf305df491cbcff3
Notice: /Stage[main]/Toollabs::Cronrunner/File[/data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.zhuyifei1999-test]/content: content changed '{md5}9fb722a871445ce1cf305df491cbcff3' to '{md5}14171796d9d2d006de949f66a848d187'
Notice: Applied catalog in 31.14 seconds
root@tools-cron-01:/var/spool/cron#

It seems we need more validation for our cron editing script because invalid cron files trip up the entire host. I think this host has been failing to run Puppet for awhile.

chasemp renamed this task from tools-cron-01.tools.eqiad.wmflabs: failed to apply catalog to Toolforge crontab command needs to validate output.Jan 19 2018, 2:00 PM
chasemp triaged this task as Medium priority.
chasemp updated the task description. (Show Details)

Change 405325 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[labs/toollabs@master] d/changelog: refresh entry for relase

https://gerrit.wikimedia.org/r/405325

Change 405325 merged by jenkins-bot:
[labs/toollabs@master] d/changelog: refresh entry for release

https://gerrit.wikimedia.org/r/405325

Mentioned in SAL (#wikimedia-cloud) [2018-01-19T17:32:20Z] <arturo> T185314 deploying new version of jobutils 1.27

Error: Failed to apply catalog: Puppet::Util::FileType::FileTypeCrontab could not read tools.zhuyifei1999-test: invalid byte sequence in UTF-8

Why does puppet have to know about the contents of my crontab or any tool's? It should neither have its behavior depend them, nor alter them, should it?

Also I do not understand why puppet has to assume an encoding of utf-8 in the crontab. All crontab control characters are ascii, and utf-8 is an ascii-safe encoding that "7-bit bytes (bytes where the most significant bit is 0) never appear in a multi-byte sequence", meaning that any valid utf-8 text will not contain false-positive control characters; though if a user choose to use other encodings, it is the user's responsibility to make sure their crontab is processed correctly. After control characters, the rest of the crontab arguments shall be passed into execve(2), which accepts binary and will not assume any encoding. Force decoding as utf-8 gives no benefit besides a possible size decrease when processing CJK characters; downsides are many: failure due to a user choosing to use other encodings, maintainability, and redundant encode/decodes. This is also the reason for https://gerrit.wikimedia.org/r/#/c/394923/ to use ascii+surrogates rather than utf-8.

I'm not saying utf-8 is bad, but for transparent processing, such as in the case of reading / executing crontabs, a program do do the bare minimum of processing to know what it needs to know. Dealing with text, in this case the crontab arguments, should be given to the end user, who may have need for non-utf8-data.

However, if puppet requires utf-8 for a very good reason and/or that we cannot make puppet ignore the encodings of crontabs of tool accounts, we should better enforce it in the crontab command to not break puppet (this has happened to ato's tool for not-perfectly-understood reasons in the linked ticket). Some editors may not enforce utf-8 encoding.

Change 405734 had a related patch set uploaded (by Zhuyifei1999; owner: Zhuyifei1999):
[labs/toollabs@master] crontab: Make the input file is binary when it is sys.stdin

https://gerrit.wikimedia.org/r/405734

Change 405734 merged by jenkins-bot:
[labs/toollabs@master] crontab: Make the input file is binary when it is sys.stdin

https://gerrit.wikimedia.org/r/405734

zhuyifei1999 claimed this task.

The (continued) puppet-pometheus-crontab issue is tracked in T179386. The script should not spew out unicode errors anymore.

Today, again:

aborrero@tools-cron-01:~$ sudo puppet agent -t -v
Info: Using configured environment 'future'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for tools-cron-01.tools.eqiad.wmflabs
Notice: /Stage[main]/Base::Environment/Tidy[/var/tmp/core]: Tidying 0 files
Info: Applying configuration version '1518174468'
Error: Failed to apply catalog: Could not parse line "PATH=/usr/local/bin:/usr/bin:/bin" at tools.ato:2

However:

aborrero@tools-cron-01:~$ dpkg -s misctools | grep Version
Version: 1.29

I did the steps that @chasemp did before:

aborrero@tools-bastion-03:~$ sudo -i
root@tools-bastion-03:~# become ato
tools.ato@tools-bastion-03:~$ crontab -e

I commented the offending line and now puppet is back into normal status:

aborrero@tools-cron-01:~$ sudo puppet agent -t -v
Info: Using configured environment 'future'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for tools-cron-01.tools.eqiad.wmflabs
Notice: /Stage[main]/Base::Environment/Tidy[/var/tmp/core]: Tidying 0 files
Info: Applying configuration version '1518176064'
Notice: /Stage[main]/Standard::Ntp::Client/Ntp::Daemon[client]/File[ntp.conf]/content: 
--- /etc/ntp.conf	2018-02-08 16:12:43.536668404 +0000
+++ /tmp/puppet-file20180209-31661-iif7sq	2018-02-09 11:34:46.590897183 +0000
@@ -5,13 +5,13 @@
 
 ### Servers and Peers: #########################################
 
-server chromium.wikimedia.org
+server chromium.wikimedia.org iburst
 restrict chromium.wikimedia.org notrap nomodify noquery
-server hydrogen.wikimedia.org
+server hydrogen.wikimedia.org iburst
 restrict hydrogen.wikimedia.org notrap nomodify noquery
-server acamar.wikimedia.org
+server acamar.wikimedia.org iburst
 restrict acamar.wikimedia.org notrap nomodify noquery
-server achernar.wikimedia.org
+server achernar.wikimedia.org iburst
 restrict achernar.wikimedia.org notrap nomodify noquery
 
 ### Non-Peer Access Control: ###################################
@@ -44,6 +44,7 @@
 filegen peerstats file peerstats type day enable
 filegen clockstats file clockstats type day enable
 
+
 ### Labs: ######################################################
 
 # On Labs the default "interface" configuration leads to ntpd

Info: Computing checksum on file /etc/ntp.conf
Info: /Stage[main]/Standard::Ntp::Client/Ntp::Daemon[client]/File[ntp.conf]: Filebucketed /etc/ntp.conf to puppet with sum a1750af2959e95ac17a9fbc5c24b5b29
Notice: /Stage[main]/Standard::Ntp::Client/Ntp::Daemon[client]/File[ntp.conf]/content: content changed '{md5}a1750af2959e95ac17a9fbc5c24b5b29' to '{md5}67505b22f4b2d2a196a2ffbb785c542b'
Info: /Stage[main]/Standard::Ntp::Client/Ntp::Daemon[client]/File[ntp.conf]: Scheduling refresh of Service[ntp]
Notice: /Stage[main]/Standard::Ntp::Client/Ntp::Daemon[client]/Service[ntp]: Triggered 'refresh' from 1 events
Notice: /Stage[main]/Toollabs::Exec_environ/Package[mono-vbnc]/ensure: created
Notice: /Stage[main]/Toollabs::Exec_environ/Package[php5-xhprof]/ensure: created
Notice: /Stage[main]/Packages::Python_tk/Package[python-tk]/ensure: created
Notice: /Stage[main]/Packages::Python3_tk/Package[python3-tk]/ensure: created
Info: Computing checksum on file /data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.ato
Info: /Stage[main]/Toollabs::Cronrunner/File[/data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.ato]: Filebucketed /data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.ato to puppet with sum 5539176446336a34967fa3009df46ead
Notice: /Stage[main]/Toollabs::Cronrunner/File[/data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.ato]/content: content changed '{md5}5539176446336a34967fa3009df46ead' to '{md5}4f606d00b7776794a7dccae0a72e4b0e'
Info: Computing checksum on file /data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.cewbot
Info: /Stage[main]/Toollabs::Cronrunner/File[/data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.cewbot]: Filebucketed /data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.cewbot to puppet with sum 69fcda877d44d415d5386a77435381cc
Notice: /Stage[main]/Toollabs::Cronrunner/File[/data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.cewbot]/content: content changed '{md5}69fcda877d44d415d5386a77435381cc' to '{md5}fbdfd72dd697a45ac6d98d7095d0ff0b'
Info: Computing checksum on file /data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.fist
Info: /Stage[main]/Toollabs::Cronrunner/File[/data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.fist]: Filebucketed /data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.fist to puppet with sum e8c6d277e3c0cbce473ef1c66923d337
Notice: /Stage[main]/Toollabs::Cronrunner/File[/data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.fist]/content: content changed '{md5}e8c6d277e3c0cbce473ef1c66923d337' to '{md5}2533b1f9f063e8dac77d1d2a0c908f7d'
Info: Computing checksum on file /data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.jarbot-iii
Info: /Stage[main]/Toollabs::Cronrunner/File[/data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.jarbot-iii]: Filebucketed /data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.jarbot-iii to puppet with sum ef0008d9329c042867a896c1ebc1d212
Notice: /Stage[main]/Toollabs::Cronrunner/File[/data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.jarbot-iii]/content: content changed '{md5}ef0008d9329c042867a896c1ebc1d212' to '{md5}9dc92892c02ae435d6f4c642e29818d2'
Notice: Applied catalog in 57.12 seconds

The environment variable setting is perfectly valid crontab and is supposed to work (the regex in puppet, source of error message), but somehow it isn't. No idea why.

I re-enabled that line and ran puppet agent --test --verbose --debug. Not much useful information in the output unfortunately :(

[...]
Debug: Caching connection for https://tools-puppetmaster-01.tools.eqiad.wmflabs:8140
Debug: Prefetching crontab resources for cron
Debug: looking for crontabs in /var/spool/cron/crontabs
Debug: Storing state
Debug: Stored state in 0.71 seconds
Error: Failed to apply catalog: Could not parse line "PATH=/usr/local/bin:/usr/bin:/bin" at tools.ato:2
Debug: Dynamically-bound server lookup failed, falling back to report_server setting
[...]

Actually, this is interesting:

tools.ato@tools-bastion-05:~$ crontab -l | xxd | head
0000000: efbb bf50 4154 483d 2f75 7372 2f6c 6f63  ...PATH=/usr/loc
0000010: 616c 2f62 696e 3a2f 7573 722f 6269 6e3a  al/bin:/usr/bin:
0000020: 2f62 696e 0a30 202a 202a 202a 202a 206a  /bin.0 * * * * j
0000030: 7375 6220 2d6c 2072 656c 6561 7365 3d74  sub -l release=t
0000040: 7275 7374 7920 2d6f 6e63 6520 2d4e 2073  rusty -once -N s
0000050: 7a75 6263 736f 6e6b 202d 6d65 6d20 3531  zubcsonk -mem 51
0000060: 326d 202d 6f20 2f64 6174 612f 7072 6f6a  2m -o /data/proj
0000070: 6563 742f 6174 6f2f 7075 626c 6963 5f68  ect/ato/public_h
0000080: 746d 6c2f 6c6f 672f 737a 7562 6373 6f6e  tml/log/szubcson
0000090: 6b2e 7478 7420 2d6a 2079 202f 6461 7461  k.txt -j y /data
>>> '\xef\xbb\xbf'.decode('utf-8')
u'\ufeff'

U+FEFF. BOM!

I backed up the crontab with crontab -l > crontab-T185314-backup, and had the BOM stripped:

tools.ato@tools-bastion-05:~$ crontab -l | tail -c +4 | xxd | head
0000000: 5041 5448 3d2f 7573 722f 6c6f 6361 6c2f  PATH=/usr/local/
0000010: 6269 6e3a 2f75 7372 2f62 696e 3a2f 6269  bin:/usr/bin:/bi
0000020: 6e0a 3020 2a20 2a20 2a20 2a20 6a73 7562  n.0 * * * * jsub
0000030: 202d 6c20 7265 6c65 6173 653d 7472 7573   -l release=trus
0000040: 7479 202d 6f6e 6365 202d 4e20 737a 7562  ty -once -N szub
0000050: 6373 6f6e 6b20 2d6d 656d 2035 3132 6d20  csonk -mem 512m 
0000060: 2d6f 202f 6461 7461 2f70 726f 6a65 6374  -o /data/project
0000070: 2f61 746f 2f70 7562 6c69 635f 6874 6d6c  /ato/public_html
0000080: 2f6c 6f67 2f73 7a75 6263 736f 6e6b 2e74  /log/szubcsonk.t
0000090: 7874 202d 6a20 7920 2f64 6174 612f 7072  xt -j y /data/pr
tools.ato@tools-bastion-05:~$ crontab -l | tail -c +4 | crontab

We can make the crontab command strip any BOMs found, but is there a way to stop puppet from failing from tool crontabs?

We can make the crontab command strip any BOMs found, but is there a way to stop puppet from failing from tool crontabs?

The BOM disappeared when I commented the line? Could be a thing of the text editor being used?

If puppet is validating the crontab internally, a workaround would be to modify the puppet code that handles this? I'm not sure, but it sounds ugly to me.
So, modifying the crontab command as you suggested could work by now.

I ran another puppet run:

[...]
Debug: Caching connection for https://tools-puppetmaster-01.tools.eqiad.wmflabs:8140
Debug: Executing: 'diff -u /var/lib/puppet/lib/puppet/parser/functions/get_clusters.rb /tmp/puppet-file20180209-17025-1gxah7s'
Notice: /File[/var/lib/puppet/lib/puppet/parser/functions/get_clusters.rb]/content: 
--- /var/lib/puppet/lib/puppet/parser/functions/get_clusters.rb	2017-11-15 15:39:41.957269456 +0000
+++ /tmp/puppet-file20180209-17025-1gxah7s	2018-02-09 17:06:50.756084740 +0000
@@ -24,7 +24,7 @@
   newfunction(:get_clusters, :type => :rvalue) do |args|
     all = {}
     # Ganglia config is the source of truth about clusters/site
-    cluster_config = call_function(:hiera, ['ganglia_clusters', {}])
+    cluster_config = call_function(:hiera, ['wikimedia_clusters', {}])
 
     # Arguments are an hash of selectors
     selector ||= {}

Debug: Using cached connection for https://tools-puppetmaster-01.tools.eqiad.wmflabs:8140
Debug: Caching connection for https://tools-puppetmaster-01.tools.eqiad.wmflabs:8140
Notice: /File[/var/lib/puppet/lib/puppet/parser/functions/get_clusters.rb]/content: content changed '{md5}f5f3a1af2f4b8f16fc55f22f3516f355' to '{md5}f72da1632577c875bc9d4e5bbebcf29e'
Debug: /File[/var/lib/puppet/lib/puppet/parser/functions/get_clusters.rb]: The container /var/lib/puppet/lib will propagate my refresh event
Debug: Finishing transaction 35770560
Debug: Loading external facts from /var/lib/puppet/facts.d
Info: Loading facts
[...]
Debug: file_metadata supports formats: pson yaml binary
Debug: Using cached connection for https://tools-puppetmaster-01.tools.eqiad.wmflabs:8140
Debug: Caching connection for https://tools-puppetmaster-01.tools.eqiad.wmflabs:8140
Debug: Prefetching crontab resources for cron
Debug: looking for crontabs in /var/spool/cron/crontabs
Debug: Dynamically-bound server lookup failed, falling back to server setting
Debug: Dynamically-bound port lookup failed; falling back to masterport setting
Debug: Failed to load library 'msgpack' for feature 'msgpack'
[...]
Debug: Caching connection for https://tools-puppetmaster-01.tools.eqiad.wmflabs:8140
Info: Computing checksum on file /data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.ato
Info: /Stage[main]/Toollabs::Cronrunner/File[/data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.ato]: Filebucketed /data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.ato to puppet with sum 4f606d00b7776794a7dccae0a72e4b0e
Notice: /Stage[main]/Toollabs::Cronrunner/File[/data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.ato]/content: content changed '{md5}4f606d00b7776794a7dccae0a72e4b0e' to '{md5}4bbff9262fcd31461d7c4653fcd41c29'
Debug: /Stage[main]/Toollabs::Cronrunner/File[/data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs/tools.ato]: The container /data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs will propagate my refresh event
Debug: /data/project/.system/crontabs/tools-cron-01.tools.eqiad.wmflabs: The container Class[Toollabs::Cronrunner] will propagate my refresh event
Debug: Dynamically-bound server lookup failed, falling back to server setting
[...]

Puppet succeeded in running, but it seemed to have modified the crontab (unless I'm understanding it wrong?), but nether diff nor md5sum can spot the difference:

tools.ato@tools-bastion-05:~$ md5sum <(tail -c +4 crontab-T185314-backup) <(crontab -l)
bc9ba95079baf7cc18d1d487532ad029  /dev/fd/63
bc9ba95079baf7cc18d1d487532ad029  /dev/fd/62

EDIT: It's irrelevant,

Sorry guys. I was it again yesterday with my f* crontab file what I updated but I can not find and kill the unicode characters in it.
Could someone help me in my actual crontab file below, please?
https://tools.wmflabs.org/ato/scripts/crontab.v3.05.txt

If it works I will delete and hopefully forget my previous versions.

Sorry guys. I was it again yesterday with my f* crontab file what I updated but I can not find and kill the unicode characters in it.
Could someone help me in my actual crontab file below, please?
https://tools.wmflabs.org/ato/scripts/crontab.v3.05.txt

If it works I will delete and hopefully forget my previous versions.

Yeah, you have BOM inside:

zhuyifei1999@zhuyifei1999-ThinkPad-X260:~$ curl -s https://tools.wmflabs.org/ato/scripts/crontab.v3.05.txt | python -c 'print(repr(__import__("sys").stdin.read().decode("utf-8")))' | grep -oP '\\u....' -o
\ufeff

@zhuyifei1999: Sorry, your codes above are chinese to me.
I made a small change in my crontab file with vm editor. It was the following:
:set nobomb
Please, someone check my latest crontab file if it really BOM-less now:
https://tools.wmflabs.org/ato/scripts/crontab.v3.06.txt
I try to avoid any further troubles.

Please, someone check my latest crontab file if it really BOM-less now:
https://tools.wmflabs.org/ato/scripts/crontab.v3.06.txt
I try to avoid any further troubles.

LGTM.

Thanks @zhuyifei1999 and @Ato_01. Closing this task now.