jstart doesn't signal out-of-memory kills to the user
Closed, Declined (Public)

Description

When the client script is killed with -9, the wrapper script terminates as well.

While this is probably a prudent choice, the user isn't informed about this at all.

As a minimal courtesy, we should add "-m ae" to the qsub call, so that the user at least gets a mail, even if he probably won't understand it :-). Of course, even better would be to use an SGE hook that fires after a job terminates.
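For illustration only, here is a minimal Python sketch of what adding "-m ae" to the qsub call amounts to. It is not the actual jstart code: `build_qsub_command` and the script path `./mybot.sh` are hypothetical names. The only thing taken from gridengine itself is qsub's `-m` option, where `a` requests mail when the job is aborted and `e` when it ends.

```python
import shlex


def build_qsub_command(script, mail_events="ae", extra_args=()):
    """Return a qsub argument list that requests mail on the given events.

    Hypothetical helper for illustration; not part of jsub/jstart.
    """
    # qsub -m takes a string of event letters:
    #   a = mail when the job is aborted, e = mail when the job ends.
    return ["qsub", "-m", mail_events, *extra_args, script]


if __name__ == "__main__":
    # Show the command line a wrapper would end up running.
    print(shlex.join(build_qsub_command("./mybot.sh")))
```

Keeping the event letters in a parameter would let a wrapper default to "ae" while still allowing a user to override or disable the mail.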


Version: unspecified
Severity: normal

Details

Reference
bz50053

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:54 AM
bzimport added a project: Toolforge.
bzimport set Reference to bz50053.

Users may request -m from jsub/jstart; it is passed through to qsub and behaves as expected. I'd rather not increase the default amount of cron/gridengine spam.

My suggestion would send one (1) message when a job that the user expects to be running continuously terminates.
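A rough sketch of that idea in Python, under loud assumptions: the wrapper is assumed to survive long enough to see the client's exit status, and `describe_exit` and `run_and_report` are hypothetical names, not anything in jstart. It relies only on the documented behaviour of `subprocess`, which on POSIX reports a child killed by signal N as return code -N. Note that a SIGKILL does not prove an out-of-memory kill; it is only the most the wrapper can observe.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Return a one-line notice for an abnormal exit, or None for a clean one."""
    if returncode == 0:
        return None
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as returncode -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # Signal number without a symbolic name on this platform.
            name = f"signal {-returncode}"
        return f"job was killed by {name}"
    return f"job exited with status {returncode}"


def run_and_report(argv):
    """Run the client command and emit at most one notice when it stops."""
    result = subprocess.run(argv)
    notice = describe_exit(result.returncode)
    if notice is not None:
        # A real wrapper might mail this; printing keeps the sketch simple.
        print(notice, file=sys.stderr)
    return result.returncode
```

The point of the design is that the notice is produced once per termination, not once per log line or retry, which is what keeps it from becoming spam.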

Wouldn't the default "Hey, I had to restart your job" from bigbrother fill that function?

That requires the job being managed by bigbrother. I just wanted to point out that IMHO one message per interaction is not spam, especially if it conveys important information for the recipient.