
jsub/jstart inconsistency: non-continuous jobs spawn a login bash shell that loads .bash_profile, but continuous jobs load neither .bash_profile nor .bashrc
Open, Low, Public

Description

(See T164191 for an unexpected impact)

Setup:

tools.zhuyifei1999-test@tools-bastion-02:~$ cat > env.sh << EOF
> #!/bin/bash
> env
> EOF
tools.zhuyifei1999-test@tools-bastion-02:~$ chmod a+x env.sh
tools.zhuyifei1999-test@tools-bastion-02:~$ echo 'export BASHTYPE=login' > .bash_profile
tools.zhuyifei1999-test@tools-bastion-02:~$ echo 'export BASHTYPE=nonlogin' > .bashrc
tools.zhuyifei1999-test@tools-bastion-02:~$ jsub -once -N env env.sh; sleep 10; grep BASHTYPE env.{out,err}; rm env.{out,err}
Your job 4557281 ("env") has been submitted
env.out:BASHTYPE=login
tools.zhuyifei1999-test@tools-bastion-02:~$ jsub -once -continuous -N env env.sh; sleep 10; grep BASHTYPE env.{out,err}; rm env.{out,err}
Your job 4557293 ("env") has been submitted
tools.zhuyifei1999-test@tools-bastion-02:~$ jstart -N env env.sh; sleep 10; grep BASHTYPE env.{out,err}; rm env.{out,err}
Your job 4557301 ("env") has been submitted
tools.zhuyifei1999-test@tools-bastion-02:~$

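Expected: the three submissions above (jsub -once, jsub -once -continuous, and jstart) should behave consistently: all of them report BASHTYPE=login, all of them report BASHTYPE=nonlogin, or none of them report either.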
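
The login/non-login difference itself can be reproduced without the grid. The following is a minimal local sketch, assuming only that bash is installed. The throwaway HOME directory and the `show_bashtype` helper are illustrative names, not part of jsub or jstart. It relies on documented bash behaviour: a login shell reads ~/.bash_profile, while a non-interactive, non-login shell reads neither ~/.bash_profile nor ~/.bashrc.

```shell
#!/bin/bash
# Sketch: which startup file does bash read? A throwaway HOME keeps the
# real dotfiles untouched. show_bashtype is a hypothetical helper.
demo_home=$(mktemp -d)
echo 'export BASHTYPE=login' > "$demo_home/.bash_profile"
echo 'export BASHTYPE=nonlogin' > "$demo_home/.bashrc"

show_bashtype() {
    # "$@": extra bash options, e.g. -l to request a login shell.
    # env -i drops BASH_ENV and SSH_* so only the startup files matter;
    # stdin is /dev/null so bash never mistakes it for a network session.
    env -i HOME="$demo_home" PATH="$PATH" \
        bash "$@" -c 'echo "BASHTYPE=${BASHTYPE:-unset}"' </dev/null 2>/dev/null \
        | grep '^BASHTYPE='
}

login_result=$(show_bashtype -l)   # login shell
nonlogin_result=$(show_bashtype)   # non-interactive, non-login shell
echo "login shell:     $login_result"
echo "non-login shell: $nonlogin_result"
rm -rf "$demo_home"
```

If the first line reports `login` and the second reports `unset`, the local bash behaves like the two grid cases above: the non-continuous job looks like the login shell, the continuous job like a shell that read no startup file at all.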

Event Timeline

tools.zhuyifei1999-test@tools-bastion-02:~$ cat > ps-ux.sh << EOF
> #!/bin/bash
> ps ux
> EOF
tools.zhuyifei1999-test@tools-bastion-02:~$ chmod a+x ps-ux.sh
tools.zhuyifei1999-test@tools-bastion-02:~$ jsub -once -N ps ps-ux.sh; sleep 5; cat ps.{out,err}; rm ps.{out,err}
Your job 4557501 ("ps") has been submitted
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
53383    14983  0.0  0.0   9548  1128 ?        Ss   16:42   0:00 /bin/bash /mnt/nfs/labstore-secondary-tools-project/zhuyifei1999-test/ps-ux.sh
53383    14986  0.0  0.0  11180  1012 ?        R    16:42   0:00 ps ux
tools.zhuyifei1999-test@tools-bastion-02:~$ jstart -once -N ps ps-ux.sh; sleep 5; cat ps.{out,err}; rm ps.{out,err}
Your job 4557502 ("ps") has been submitted
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
53383    14991  0.0  0.0   4452   648 ?        SNs  16:42   0:00 /bin/sh /var/spool/gridengine/execd/tools-exec-1433/job_scripts/4557502
53383    14992  0.0  0.0   9548  1136 ?        SN   16:42   0:00 /bin/bash /mnt/nfs/labstore-secondary-tools-project/zhuyifei1999-test/ps-ux.sh
53383    14993  0.0  0.0  11180  1012 ?        RN   16:42   0:00 ps ux

Aside from the queue difference, the wrapper for continuous jobs is executed with [[https://github.com/wikimedia/labs-toollabs/blob/master/jobutils/bin/jsub#L643|/bin/sh]]. Another check shows that jstart does not load .profile either:

tools.zhuyifei1999-test@tools-bastion-02:~$ echo 'export SHTYPE=login' > .profile
tools.zhuyifei1999-test@tools-bastion-02:~$ jsub -once -N env env.sh; sleep 10; grep SHTYPE env.{out,err}; rm env.{out,err}
Your job 4557712 ("env") has been submitted
env.out:BASHTYPE=login
tools.zhuyifei1999-test@tools-bastion-02:~$ jstart -N env env.sh; sleep 10; grep SHTYPE env.{out,err}; rm env.{out,err}
Your job 4557713 ("env") has been submitted
tools.zhuyifei1999-test@tools-bastion-02:~$
tools.zhuyifei1999-test@tools-bastion-02:~$ cat > environ.py << EOF
> #!/usr/bin/python
> import os
> 
> for k, v in os.environ.items():
>     print k, v
> EOF
tools.zhuyifei1999-test@tools-bastion-02:~$ chmod a+x environ.py 
tools.zhuyifei1999-test@tools-bastion-02:~$ jsub -once -N env environ.py; sleep 10; grep SHTYPE env.{out,err}; rm env.{out,err}
Your job 4557795 ("env") has been submitted
env.out:BASHTYPE login
tools.zhuyifei1999-test@tools-bastion-02:~$

The interesting thing is that it is not the executed bash script that loads .bash_profile: the file is loaded even when a Python script is submitted instead of a bash script.

qsub(1) contains this:

-noshell
       Available only for qrsh with a command line.

       Do  not  start  the command line given to qrsh in a user's login
       shell, i.e.  execute it without the wrapping shell.

       This option can be used to speed up execution as some  overhead,
       like the shell startup and sourcing the shell resource files, is
       avoided.

       This option can only be used if no shell-specific  command  line
       parsing  is  required. If the command line contains shell syntax
       like environment variable  substitution  or  (back)  quoting,  a
       shell  must  be  started.   In  this case, either do not use the
       -noshell option or include the shell call in the command line.

       Example:
       qrsh echo '$HOSTNAME'
       Alternative call with the -noshell option
       qrsh -noshell /bin/tcsh -f -c 'echo $HOSTNAME'

My guess: the grid starts a login bash shell in order to parse the command-line arguments. The wrapper script for continuous jobs, however, is passed in via stdin, so there are no arguments to parse and no login shell is started.
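
The guess can be modelled locally. This is a sketch under stated assumptions, not a description of what the grid really does: it assumes that the non-continuous case resembles `bash -l -c '<command>'` and that the continuous case resembles feeding the wrapper text to a non-login /bin/sh on stdin. `guess_home` and `wrapper` are illustrative names.

```shell
#!/bin/bash
# Sketch of the guess above. ASSUMPTION: the two launch styles below only
# approximate how the grid starts non-continuous and continuous jobs.
guess_home=$(mktemp -d)
echo 'export BASHTYPE=login' > "$guess_home/.bash_profile"
echo 'export SHTYPE=login' > "$guess_home/.profile"
wrapper='echo "BASHTYPE=${BASHTYPE:-unset} SHTYPE=${SHTYPE:-unset}"'

# Continuous style (assumed): script text arrives on stdin of a non-login sh,
# which reads no startup files.
via_stdin=$(printf '%s\n' "$wrapper" \
    | env -i HOME="$guess_home" PATH="$PATH" /bin/sh 2>/dev/null)

# Non-continuous style (assumed): the command line is handed to a bash login
# shell, which reads .bash_profile before running it.
via_login=$(env -i HOME="$guess_home" PATH="$PATH" \
    bash -l -c "$wrapper" </dev/null 2>/dev/null | grep '^BASHTYPE=')

echo "stdin-fed /bin/sh: $via_stdin"
echo "bash login shell:  $via_login"
rm -rf "$guess_home"
```

If the stdin-fed run reports both variables as unset while the login-shell run reports BASHTYPE=login, the model reproduces the jsub/jstart results in this task.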