The hourly jobs that update https://tools.wmflabs.org/snapshots/ seem to get killed before they finish quite often.
About once a month a run gets killed, and from that point on every subsequent run also dies before it finishes, which means the snapshots stop updating.
I then run the script manually (once) from tools-login, and after that the hourly updates work again.
Crontab entry:
0 * * * * /usr/bin/jsub -N snapshots-updateSnaphots -once -quiet -release trusty -mem 2048m ~/update.sh
Contents of ~/update.sh:
php /data/project/snapshots/src/mwSnapshots/scripts/updateSnaphots.php > /data/project/snapshots/src/mwSnapshots/logs/updateSnaphots.log 2>&1
Whenever I find that snapshots are stale, the log file for the last run is incomplete and qstat shows no running or queued jobs. When the crontab next triggers (or I run jsub manually), I can reproduce the same thing: the job starts, the log file begins to fill, and then after a few minutes the job stops.
I'm aware of qacct but I have been unable to get a response to any of the queries I send it. I'm unable to get a result for recent jobs by name. And after submitting a new one and trying to query it by job ID, I still don't get a response. The command just hangs indefinitely.
Tried:
$ qacct -d 1 -j snapshots-updateSnaphots
$ qacct -j 2495263
$ qacct -j 2495263 -o tools.snapshots
Tracking the job manually via https://tools.wmflabs.org/?status, I see it appear on that page as well, and then disappear.
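Since qacct is not giving me anything, one way to narrow this down might be to have update.sh record how the PHP process ended. Below is a minimal sketch, not what the script currently does. `run_logged` is a hypothetical helper name. It assumes the wrapping shell survives long enough to write the final line; if the grid engine kills the whole job at once, nothing would be logged. It also appends to the log rather than truncating it, so that entries from earlier runs are kept.

```shell
#!/bin/bash
# Hypothetical helper: run a command, append its output to a log file,
# and record whether it exited normally or was terminated by a signal.
run_logged() {
    local log="$1"
    shift
    echo "[$(date -u +%FT%TZ)] start: $*" >> "$log"
    "$@" >> "$log" 2>&1
    local status=$?
    if [ "$status" -gt 128 ]; then
        # By shell convention, 128+N means the command died from signal N.
        echo "[$(date -u +%FT%TZ)] killed by signal $((status - 128))" >> "$log"
    else
        echo "[$(date -u +%FT%TZ)] exit status $status" >> "$log"
    fi
    return "$status"
}

# Possible use in update.sh (paths taken from the current script):
# run_logged /data/project/snapshots/src/mwSnapshots/logs/updateSnaphots.log \
#     php /data/project/snapshots/src/mwSnapshots/scripts/updateSnaphots.php
```

A final line such as "killed by signal 9" versus "exit status 255" would at least show whether PHP was terminated from outside or failed on its own.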