
Manual page of jsub is unclear regarding what -once means
Closed, Resolved, Public

Description

On my shell account on tools-sgebastion-07 (or tools-bastion-02), when I type a simple

man jstart

I read the following:

-once  Only start one job with that name, fail if another is already started or queued (default if invoked as jstart or
       qcronsub).
-continuous
       Start a self-restarting job on the continuous queue (default if invoked as jstart).

So both -once and -continuous seem to be the default when I start a job with jstart.

I doubt that. My understanding is that only one of -once and -continuous can be active, not both?

Event Timeline

Wurgl created this task. Feb 27 2019, 10:44 PM
Restricted Application added a subscriber: Aklapper. Feb 27 2019, 10:44 PM
Reedy renamed this task from "Manual page of jsub is has two defaults, but only one can be default" to "Manual page of jsub has two defaults, but only one can be default". Feb 27 2019, 11:24 PM
Reedy added projects: Tools, Documentation.
JJMC89 edited projects, added Toolforge; removed Tools. Feb 27 2019, 11:38 PM

> So both -once and -continuous seem to be the default when I start a job with jstart.

Yes.

Could you explain why you doubt that? Maybe we can clarify the docs a bit.

Wurgl added a comment. Feb 28 2019, 6:51 AM

Well, for historical reasons I started all my jobs from crontab with -once. I have rewritten a job so that it does its work, sleeps for a few minutes, and then runs again, so I do not have to start it over and over.

Then I wondered what happens if it crashes, so I read the man page and saw the -continuous option, which reads "Start a self-restarting job on the continuous queue (default if invoked as jstart)."

Fine! Just remove -once, that's all.

Some time later I checked whether it was still running, how much memory and CPU time it used, etc. I opened https://tools.wmflabs.org/admin/oge/status and searched for "process_templatedata". There I found my job. Fine. But another job, "cron-bnwb" (not mine!), showed "Continuous / Running" in the "State" column, while my job had "Task / Running". So I wondered about the difference and looked at the man page again.

It seems that -once is the default, whereas -continuous is not the default, at least not when using jstart to start jobs.

> My job had "Task / Running".

Can't reproduce:

tools.zhuyifei1999-test@tools-sgebastion-08:~$ jsub -N test sleep 2; sleep 1; while ! qstat | awk '{ print $5 }' | grep r > /dev/null; do sleep 1; done; qstat; while ! [[ -z `qstat` ]]; do sleep 1; done
Your job 417524 ("test") has been submitted
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
 417524 0.25000 test       tools.zhuyif r     02/28/2019 07:41:52 task@tools-sgeexec-0935.tools.     1        
tools.zhuyifei1999-test@tools-sgebastion-08:~$ jsub -continuous -N test sleep 2; sleep 1; while ! qstat | awk '{ print $5 }' | grep r > /dev/null; do sleep 1; done; qstat; while ! [[ -z `qstat` ]]; do sleep 1; done
Your job 417543 ("test") has been submitted
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
 417543 0.25000 test       tools.zhuyif r     02/28/2019 07:42:07 continuous@tools-sgeexec-0905.     1        
tools.zhuyifei1999-test@tools-sgebastion-08:~$ jstart -N test sleep 2; sleep 1; while ! qstat | awk '{ print $5 }' | grep r > /dev/null; do sleep 1; done; qstat; while ! [[ -z `qstat` ]]; do sleep 1; done
Your job 417544 ("test") has been submitted
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
 417544 0.25000 test       tools.zhuyif r     02/28/2019 07:42:37 continuous@tools-sgeexec-0905.     1

Are you sure you submitted a new job (that is, no job with that name was listed in qstat when you submitted it) and that you used jstart?

I also could not find the command that submits the job in tools.persondata's crontab.
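The check done in the transcript above (whether the queue column starts with task@ or continuous@) can also be scripted. Below is a minimal bash sketch, assuming qstat's default column layout exactly as printed above; job_queues is a hypothetical helper name, not an existing jobutils command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of jobutils): reads qstat output on stdin and
# prints "<job name> <queue>" for every running job, e.g. "test continuous".
# Assumes the default qstat column layout shown in the transcript:
#   $3 = name, $5 = state, $8 = queue instance such as
#   "continuous@tools-sgeexec-0905." or "task@tools-sgeexec-0935.tools.".
# The first two lines (header and dashes) are skipped. Pending jobs ("qw")
# have no queue column yet, so only jobs in state "r" are reported.
job_queues() {
  awk 'NR > 2 && $5 == "r" { split($8, q, "@"); print $3, q[1] }'
}

# Usage on a bastion:  qstat | job_queues
```

A job started with plain jsub should show up as "task", one started with jstart or jsub -continuous as "continuous", mirroring the "Task / Running" and "Continuous / Running" states on the status page.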

Wurgl added a comment. Edited Feb 28 2019, 9:16 AM

Sorry, my mistake.

I started the job with jsub, not with jstart.

With jstart, -continuous is the default, but that means that in

"-once Only start one job with that name, fail if another is already started or queued (default if invoked as jstart or qcronsub)."

the part "default if invoked as jstart" is wrong, since continuous and non-continuous are surely mutually exclusive.

As for the missing crontab entry: my database crashed and I am porting from PHP 5 to PHP 7, so the crontab is pretty empty at the moment.

> since continuous and non-continuous are surely mutually exclusive.

Being 'once' does not mean it's not continuous. With -once, the job will not be submitted again if a job with the same name is already running or queued; with -continuous, it will restart itself after it crashes (in some cases).
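To illustrate that the two switches are independent, here is a minimal bash sketch. effective_flags is a hypothetical helper, not part of jsub or jobutils; it only models the defaults stated in the man page (-once by default when invoked as jstart or qcronsub, -continuous by default when invoked as jstart) and does not submit anything.

```shell
#!/usr/bin/env bash
# Hypothetical model of the documented defaults (not jsub's actual code).
# -once and -continuous are tracked as two separate yes/no switches, so
# turning one on never turns the other off.
effective_flags() {
  local invoked_as=$1; shift
  local once=no continuous=no
  # Defaults that depend on the name the command was invoked as.
  case "$invoked_as" in
    jstart)   once=yes; continuous=yes ;;
    qcronsub) once=yes ;;
  esac
  # Explicit options can only add to the defaults in this model.
  for arg in "$@"; do
    case "$arg" in
      -once)       once=yes ;;
      -continuous) continuous=yes ;;
    esac
  done
  echo "once=$once continuous=$continuous"
}

effective_flags jsub -N test sleep 2             # once=no continuous=no
effective_flags jsub -continuous -N test sleep 2 # once=no continuous=yes
effective_flags jstart -N test sleep 2           # once=yes continuous=yes
```

Under this model, jstart behaves like jsub -once -continuous, while a plain jsub job is neither, which is consistent with the task@ queue seen for the plain jsub run in the transcript above.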

Wurgl added a comment. Feb 28 2019, 3:31 PM

Hmm … it seems to be my error.

Maybe changing "fail if another is already started or queued" to "fail if another with the same name is already started or queued" would clarify it a bit?

Change 493462 had a related patch set uploaded (by Zhuyifei1999; owner: Zhuyifei1999):
[labs/toollabs@master] jsub: Clarify help & manual regarding -once

https://gerrit.wikimedia.org/r/493462

Change 493462 merged by jenkins-bot:
[labs/toollabs@master] jsub: Clarify help & manual regarding -once

https://gerrit.wikimedia.org/r/493462

Mentioned in SAL (#wikimedia-cloud) [2019-02-28T18:55:14Z] <zhuyifei1999_> start building jobutils 1.36 T217297

How did this build before? Now the tests fail...

## ------------------------- ##
## toollabs 1.36 test suite. ##
## ------------------------- ##
  1: Normal call                                     FAILED (testsuite.at:64)
  2: Quiet call                                      FAILED (testsuite.at:68)
  3: -o points to a non-existing file                FAILED (testsuite.at:74)
  4: -o points to a existing file                    FAILED (testsuite.at:84)
  5: -o points to a non-existing file and -umask is used FAILED (testsuite.at:92)
  6: -o points to a existing file and -umask is used FAILED (testsuite.at:102)
  7: -o points to a existing directory               FAILED (testsuite.at:111)
  8: .jsubrc is honoured                             FAILED (testsuite.at:120)
  9: .jsubrc options are overwritten by command line arguments FAILED (testsuite.at:133)
 10: -l is exploded                                  FAILED (testsuite.at:144)
 11: -l h_vmem is processed                          FAILED (testsuite.at:148)
 12: -l largest wins (virtual_free)                  FAILED (testsuite.at:152)
 13: -l largest wins (h_vmem)                        FAILED (testsuite.at:156)
 14: -l largest wins (default)                       FAILED (testsuite.at:160)

If I run pdebuild as a normal user account, I get

[Thu Feb 28 19:03:51 2019] Failed to touch '/home/zhuyifei1999/true.out': [Errno 2] No such file or directory: '/home/zhuyifei1999/true.out'

And if I run pdebuild as root, I get

[Thu Feb 28 19:04:36 2019] Failed to execute '/usr/bin/qsub -hard -j no -e /root/true.err -o /root/true.out -M root@tools.wmflabs.org -N true -hard -l h_vmem=524288k -q task -b yes /bin/true': [Errno 2] No such file or directory: '/usr/bin/qsub'

Change 493493 had a related patch set uploaded (by Zhuyifei1999; owner: Zhuyifei1999):
[labs/toollabs@master] debian/changelog: Add a space to fix trailer line format

https://gerrit.wikimedia.org/r/493493

Change 493493 merged by jenkins-bot:
[labs/toollabs@master] debian/changelog: Add a space to fix trailer line format

https://gerrit.wikimedia.org/r/493493

Mentioned in SAL (#wikimedia-cloud) [2019-02-28T19:36:16Z] <zhuyifei1999_> built with debuild instead T217297

zhuyifei1999 closed this task as Resolved. Feb 28 2019, 7:37 PM
zhuyifei1999 claimed this task.
07:35:31 0 ✓ zhuyifei1999@tools-bastion-02: ~$ man jsub | grep once
       -once  Only start one job with that name, fail if another job with the same name is already started or queued (default  if  invoked
07:35:38 0 ✓ zhuyifei1999@tools-bastion-02: ~$ jsub --help | grep once
  -once         Only start one job, fail if another job with the same name is
zhuyifei1999 renamed this task from "Manual page of jsub has two defaults, but only one can be default" to "Manual page of jsub is unclear regarding what -once means". Feb 28 2019, 7:37 PM