
Re-consider ` >/dev/null 2>&1` as output of many cron'd MW maintenance scripts
Open, Medium, Public

Description

Many cron'd MW maintenance scripts redirect their output with `>/dev/null 2>&1`, which means we have no idea when they break unless someone checks them by hand. See T187053 and T179131.

Although it's trivial to run most of them manually to find out why they failed, we should be more proactive about noticing the failures in the first place.

So, should we make sure all scripts log to a file (with log rotation etc.)? And is it worth doing some post-processing to look for known error conditions?
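As a rough illustration of the "log to a file, then post-process" idea, here is a minimal sketch of a wrapper. The function name `run_and_scan`, the log path argument and the error pattern are all hypothetical and not taken from any existing puppet manifest; which strings count as "known error conditions" would have to be decided per script.

```shell
#!/bin/sh
# Hypothetical wrapper (not an existing MediaWiki or puppet helper):
# run a command, append its output to a log file, then scan that run's
# output for known error patterns.
# Usage: run_and_scan LOGFILE PATTERN COMMAND [ARGS...]
run_and_scan() {
    log_file=$1
    pattern=$2
    shift 2

    run_output=$(mktemp) || return 1

    # Capture stdout and stderr together instead of discarding them.
    status=0
    "$@" >"$run_output" 2>&1 || status=$?

    # Keep a persistent record; rotation is left to logrotate or similar.
    cat "$run_output" >>"$log_file"

    # Flag the run if it exited non-zero or printed a known error string.
    if [ "$status" -ne 0 ] || grep -E -q -- "$pattern" "$run_output"; then
        echo "FAILED (exit $status): $*" >&2
        rm -f "$run_output"
        return 1
    fi

    rm -f "$run_output"
    return 0
}
```

A cron entry could then call something like `run_and_scan /var/log/mediawiki/some_script.log 'Fatal error|Exception' <command>`, where both the log path and the pattern are placeholders. Since cron mails anything a job prints, the single `FAILED` line on stderr would be enough to raise attention.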

Event Timeline

Somewhat related: T187101. When a script breaks and that script is critical (i.e. it deals with private data to comply with the law), someone should get an email. This already happens with cron jobs on Labs scheduled with the jsub/jstart commands: you can add a -M/-m option so you get notified in case the job failed, was aborted, etc. In addition, logs could be kept for performance, reviews, etc. I guess that'd be possible in the Wikimedia Puppet. It is actually a bit sad that the AbuseFilter private data cleaner script broke in 2016 and that we only noticed a few days ago (cf. T187053#3963588).

17:30 < Hauskatze> I wonder if for the next round of https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/mediawiki/manifests/maintenance/purge_abusefilter.pp;cacc0b5224994e40df6c23bc7b8781305061ed57$4 we could actually store the log ?
17:31 < Hauskatze> re-running after fixing it, not running since 2016

17:34 < mutante> Hauskatze: sure, we can just change the command line and replace the /dev/null ..
..
17:36 < mutante> Hauskatze: > /var/log/mediawiki/purge_abusefilter.log please

Change 410072 had a related patch set uploaded (by MarcoAurelio; owner: MarcoAurelio):
[operations/puppet@production] mediawiki: log next run of purge_abusefilter.pp

https://gerrit.wikimedia.org/r/410072

Change 410072 merged by Dzahn:
[operations/puppet@production] mediawiki: log next run of purge_abusefilter.pp

https://gerrit.wikimedia.org/r/410072

fgiunchedi added a subscriber: fgiunchedi.

I'm +1 on logging output from cron scripts, at least stdout, whereas stderr might stay as is if it is relevant. I would suggest logging to syslog, though, as that avoids concerns like rotation, and syslog is already centralized and stored for 90 days on the syslog servers.
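For reference, sending a script's output to syslog can be done by piping it into `logger -t TAG`. The helper below is only a sketch: the name `run_to_syslog` and the tag are made up for illustration, and this is not necessarily how the puppet manifests would do it.

```shell
# Hypothetical helper: run a command and forward each line of its output
# to syslog via logger(1), tagged so it can be filtered later.
# Usage: run_to_syslog TAG COMMAND [ARGS...]
run_to_syslog() {
    tag=$1
    shift
    # Merge stderr into stdout so both streams reach syslog.
    # Note: the pipeline's exit status is logger's, not the command's.
    "$@" 2>&1 | logger -t "$tag"
}
```

In a crontab the same effect is just `<command> 2>&1 | logger -t <tag>` in place of `>/dev/null 2>&1`. Because the command's own exit status is lost in the pipe, failure detection would still need to be handled separately.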

Other things to consider: use chronic (from the moreutils package) or something to the same effect, so that cron scripts only produce mail/output if the script exits non-zero.
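To make the "something to the same effect" option concrete, here is a rough imitation of that behaviour in plain shell. It is a sketch, not chronic itself: the function name `quiet_unless_failed` is hypothetical, and unlike chronic it merges stdout and stderr into one buffer.

```shell
# Rough imitation of the chronic idea: buffer the output and only
# print it when the command fails.
# Usage: quiet_unless_failed COMMAND [ARGS...]
quiet_unless_failed() {
    buffer=$(mktemp) || return 1
    status=0
    "$@" >"$buffer" 2>&1 || status=$?
    if [ "$status" -ne 0 ]; then
        # cron mails whatever a job prints, so printing here triggers a mail.
        cat "$buffer"
    fi
    rm -f "$buffer"
    # Pass the command's exit status through to the caller.
    return "$status"
}
```

With the real tool, the crontab entry would simply be prefixed with `chronic`, and successful runs would stay silent without any redirection to /dev/null.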

Change 410349 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] mediawiki: reduce frequency of purge_abusefilter to weekly

https://gerrit.wikimedia.org/r/410349

Change 410349 merged by Dzahn:
[operations/puppet@production] mediawiki: reduce frequency of purge_abusefilter to weekly

https://gerrit.wikimedia.org/r/410349