Hi all,
I'm having trouble with a job that is not running (even though SGE thinks it is) and that does not disappear after running qdel. Because the job is started from a crontab using -once, the stale entry is blocking new runs of the job.
Specifically:
Job 9999704 (gerrit_reviewer_bot) is supposed to be short running (~1 minute, explicitly capped at 1 hour). Yet it has been running for several days:
tools.gerrit-reviewer-bot@tools-sgebastion-07:~$ date
Sun Mar 28 13:28:27 UTC 2021
tools.gerrit-reviewer-bot@tools-sgebastion-07:~$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
9999704 0.25753 gerrit_rev tools.gerrit dr    03/25/2021 17:49:16 task@tools-sgeexec-0920.tools. 1
   1606 0.25729 lighttpd-g tools.gerrit r     03/25/2021 18:16:16 webgrid-lighttpd@tools-sgewebg 1
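In case it helps anyone hitting the same thing, here is a minimal sketch of how an entry like this could be spotted automatically. It assumes the default qstat column layout shown above, with no spaces inside the name or user fields. The function name find_stuck_jobs and the one-hour threshold are only illustrative and not part of any SGE tooling.

```python
from datetime import datetime, timedelta

# Sample text copied from the qstat output above (default column layout assumed).
QSTAT_OUTPUT = """\
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
9999704 0.25753 gerrit_rev tools.gerrit dr    03/25/2021 17:49:16 task@tools-sgeexec-0920.tools. 1
   1606 0.25729 lighttpd-g tools.gerrit r     03/25/2021 18:16:16 webgrid-lighttpd@tools-sgewebg 1
"""


def find_stuck_jobs(qstat_text, now, max_runtime):
    """Return (job_id, name, state, age) for jobs flagged for deletion
    that have been listed for longer than max_runtime.

    Hypothetical helper: it only parses text and does not talk to SGE.
    """
    stuck = []
    for line in qstat_text.splitlines():
        fields = line.split()
        # Job rows start with a numeric job ID; skip the header and separator.
        if len(fields) < 7 or not fields[0].isdigit():
            continue
        job_id, _prior, name, _user, state, date_s, time_s = fields[:7]
        started = datetime.strptime(f"{date_s} {time_s}", "%m/%d/%Y %H:%M:%S")
        age = now - started
        # A "d" in the state column means deletion was requested for the job.
        if "d" in state and age > max_runtime:
            stuck.append((job_id, name, state, age))
    return stuck


if __name__ == "__main__":
    # Timestamp taken from the `date` call above.
    now = datetime(2021, 3, 28, 13, 28, 27)
    for job_id, name, state, age in find_stuck_jobs(
        QSTAT_OUTPUT, now, timedelta(hours=1)
    ):
        print(f"job {job_id} ({name}) in state {state} for {age}")
```

This only reads qstat text, so it can flag the job but cannot clear it.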
I have tried the following to clear the job:
- qdel 9999704: this added the d flag to the job state (dr in the qstat output above) but did not actually remove the job
- qdel -f 9999704: no (additional) effect
- logging into tools-sgeexec-0920 to kill the job: nothing seems to be running there
The job ID (9999704, just below 10,000,000, while the job submitted shortly afterwards got ID 1606) makes me wonder if this is some sort of job ID rollover issue.
For now I've changed the job name in crontab so -once is not blocking a new run from starting.