
Database-reports can't see packages in its virtualenv on the grid
Closed, Resolved, Public

Description

After being ported to Python 3, the database-reports tool stopped seeing the packages in its virtualenv when run as a job on the compute grid:

Traceback (most recent call last):
  File "/data/project/database-reports/reports/database-reports/main.py", line 1, in <module>
    import mwclient
ImportError: No module named 'mwclient'

Running the same command locally on the bastion doesn't produce this error. The reports are invoked using the virtualenv's python: /data/project/database-reports/reports/database-reports/venv/bin/python /data/project/database-reports/reports/database-reports/main.py en forgotten_articles.

Impact

The tool is completely non-functional.

Event Timeline

MaxSem created this task. Aug 1 2019, 7:47 AM
Restricted Application added subscribers: Danmichaelo, Aklapper. Aug 1 2019, 7:47 AM
bd808 renamed this task from Database reports can't see packages in its virtualenv on the grid to Database-reports can't see packages in its virtualenv on the grid. Aug 8 2019, 3:25 AM
bd808 added a project: Tools.
ifried added a subscriber: ifried. Aug 8 2019, 10:54 PM

@MaxSem Can you provide some details on the impact of this? Thanks.

MaxSem updated the task description. Aug 13 2019, 9:49 PM
MaxSem updated the task description.
Bstorm added a subscriber: Bstorm. Aug 13 2019, 10:19 PM

The virtualenv is clearly well-formed, but the environment of the grid can be a bit weird. In my own tools I have to use a shell wrapper around python that sets a few things, similar to what is described here: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid#An_error_with_%22ascii%22_codepage,_%22file_not_found%22,_or_UnicodeEncodeError

Here is mine from the cdnjs project:

#!/usr/bin/env bash

export LANG="en_US.UTF-8"
export LC_ALL="en_US.UTF-8"
export LC_CTYPE="en_US.UTF-8"
source $HOME/venv/bin/activate
cd $HOME/cdnjs-index
$HOME/venv/bin/python generate.py \
  --token $HOME/cdnjs-index/tokenfile \
  $HOME/public_html/

My script is also python3. I don't know if that will help you here, but it cannot hurt to try, since python3 introduces different encoding issues (which don't usually interfere with the import path, but could).

I will say that I can import that when I run this on an exec node directly, so this isn't a difference in the nodes. It could be a difference in the environment, though, which is what a wrapper might fix.

Running it with an empty environment doesn't reproduce the problem. env -i /data/project/database-reports/reports/database-reports/venv/bin/python -c 'import mwclient' works fine on bastion.

tools.database-reports@tools-sgebastion-08:~$ env -i /data/project/database-reports/reports/database-reports/venv/bin/python -c 'print(__import__("sys").path);import mwclient'
['', '/usr/lib/python35.zip', '/usr/lib/python3.5', '/usr/lib/python3.5/plat-x86_64-linux-gnu', '/usr/lib/python3.5/lib-dynload', '/data/project/database-reports/reports/database-reports/venv/lib/python3.5/site-packages']
tools.database-reports@tools-sgebastion-08:~$ jsub -N T229551-zhuyifei1999-test /data/project/database-reports/reports/database-reports/venv/bin/python -c "'"'print(__import__("sys").path);import mwclient'"'"
Your job 7133608 ("T229551-zhuyifei1999-test") has been submitted
tools.database-reports@tools-sgebastion-08:~$ cat T229551-zhuyifei1999-test.*
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named 'mwclient'
['', '/usr/lib/python35.zip', '/usr/lib/python3.5', '/usr/lib/python3.5/plat-x86_64-linux-gnu', '/usr/lib/python3.5/lib-dynload', '/usr/local/lib/python3.5/dist-packages', '/usr/lib/python3/dist-packages']
tools.database-reports@tools-sgebastion-08:~$ jsub -N T229551-zhuyifei1999-test strace /data/project/database-reports/reports/database-reports/venv/bin/python -c "'"'print(__import__("sys").path);import mwclient'"'"
Your job 7133675 ("T229551-zhuyifei1999-test") has been submitted
tools.database-reports@tools-sgebastion-08:~$ cat T229551-zhuyifei1999-test.*
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named 'mwclient'
execve("/data/project/database-reports/reports/database-reports/venv/bin/python", ["/data/project/database-reports/r"..., "-c", "print(__import__(\"sys\").path);im"...], [/* 49 vars */]) = 0
[...]
write(1, "['', '/usr/lib/python35.zip', '/"..., 222) = 222
[...]
exit_group(0)                           = ?
+++ exited with 0 +++
['', '/usr/lib/python35.zip', '/usr/lib/python3.5', '/usr/lib/python3.5/plat-x86_64-linux-gnu', '/usr/lib/python3.5/lib-dynload', '/usr/local/lib/python3.5/dist-packages', '/usr/lib/python3/dist-packages']
['', '/usr/lib/python35.zip', '/usr/lib/python3.5', '/usr/lib/python3.5/plat-x86_64-linux-gnu', '/usr/lib/python3.5/lib-dynload', '/data/project/database-reports/reports/database-reports/venv/lib/python3.5/site-packages']
tools.database-reports@tools-sgebastion-08:~$ (exec -a 'python' /data/project/database-reports/reports/database-reports/venv/bin/python -c 'print(__import__("sys").path);import mwclient')
['', '/usr/lib/python35.zip', '/usr/lib/python3.5', '/usr/lib/python3.5/plat-x86_64-linux-gnu', '/usr/lib/python3.5/lib-dynload', '/usr/local/lib/python3.5/dist-packages', '/usr/lib/python3/dist-packages']
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named 'mwclient'
tools.database-reports@tools-sgebastion-08:~$ (exec -a 'python' /data/project/database-reports/reports/database-reports/venv/bin/python -c 'print(__import__("sys").executable)')
/usr/bin/python
tools.database-reports@tools-sgebastion-08:~$ /data/project/database-reports/reports/database-reports/venv/bin/python -c 'print(__import__("sys").executable)'
/data/project/database-reports/reports/database-reports/venv/bin/python

It's argv[0].
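
The exec -a experiment above can also be reproduced from Python. This is a minimal sketch that assumes only the standard subprocess module; run_with_argv0 is a hypothetical helper name, not something jsub or SGE provides.

```python
import subprocess


def run_with_argv0(real_executable, argv0, args):
    """Run real_executable while reporting argv0 as its argv[0] (like `exec -a`)."""
    # subprocess executes `executable` but still passes the whole list as argv,
    # so the first list element is what the child sees as argv[0].
    return subprocess.run(
        [argv0, *args],
        executable=real_executable,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=False,
    )
```

Calling run_with_argv0("/data/project/database-reports/reports/database-reports/venv/bin/python", "python", ["-c", "import sys; print(sys.executable)"]) should correspond to the exec -a 'python' command in the transcript.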

Is SGE nuts?

tools.database-reports@tools-sgebastion-08:~$ truncate -s 0 T229551-zhuyifei1999-test.*
tools.database-reports@tools-sgebastion-08:~$ (exec -a 'python' /data/project/database-reports/reports/database-reports/venv/bin/python -c 'print(open("/proc/self/cmdline").read().replace("\0", "\n"))')
python
-c
print(open("/proc/self/cmdline").read().replace("\0", "\n"))

tools.database-reports@tools-sgebastion-08:~$ /data/project/database-reports/reports/database-reports/venv/bin/python -c 'print(open("/proc/self/cmdline").read().replace("\0", "\n"))'
/data/project/database-reports/reports/database-reports/venv/bin/python
-c
print(open("/proc/self/cmdline").read().replace("\0", "\n"))

tools.database-reports@tools-sgebastion-08:~$ (exec -a 'python' /data/project/database-reports/reports/database-reports/venv/bin/python -c 'print(open("/proc/self/cmdline").read().replace("\0", "\n"))')
python
-c
print(open("/proc/self/cmdline").read().replace("\0", "\n"))

tools.database-reports@tools-sgebastion-08:~$ jsub -N T229551-zhuyifei1999-test /data/project/database-reports/reports/database-reports/venv/bin/python -c "'"'print(open("/proc/self/cmdline").read().replace("\0", "\n"))'"'"
Your job 7134525 ("T229551-zhuyifei1999-test") has been submitted
tools.database-reports@tools-sgebastion-08:~$ cat T229551-zhuyifei1999-test.*
tools.database-reports@tools-sgebastion-08:~$ cat T229551-zhuyifei1999-test.*
/usr/bin/python3.5
-c
print(open("/proc/self/cmdline").read().replace("\0", "\n"))
tools.database-reports@tools-sgebastion-08:~$ ls -l /data/project/database-reports/reports/database-reports/venv/bin/python*
lrwxrwxrwx 1 tools.database-reports tools.database-reports  7 Jul 17 02:45 /data/project/database-reports/reports/database-reports/venv/bin/python -> python3
lrwxrwxrwx 1 tools.database-reports tools.database-reports 16 Jul 17 02:45 /data/project/database-reports/reports/database-reports/venv/bin/python3 -> /usr/bin/python3
tools.database-reports@tools-sgebastion-08:~$ mv /data/project/database-reports/reports/database-reports/venv/bin/python /data/project/database-reports/reports/database-reports/venv/bin/python.T229551-zhuyifei1999-test
tools.database-reports@tools-sgebastion-08:~$ cp /usr/bin/python3 /data/project/database-reports/reports/database-reports/venv/bin/python
tools.database-reports@tools-sgebastion-08:~$ truncate -s 0 T229551-zhuyifei1999-test.*; jsub -N T229551-zhuyifei1999-test /data/project/database-reports/reports/database-reports/venv/bin/python -c "'"'print(open("/proc/self/cmdline").read().replace("\0", "\n"))'"'"
Your job 7134605 ("T229551-zhuyifei1999-test") has been submitted
tools.database-reports@tools-sgebastion-08:~$ cat T229551-zhuyifei1999-test.*
/mnt/nfs/labstore-secondary-tools-project/database-reports/reports/database-reports/venv/bin/python
-c
print(open("/proc/self/cmdline").read().replace("\0", "\n"))

tools.database-reports@tools-sgebastion-08:~$ truncate -s 0 T229551-zhuyifei1999-test.*; jsub -N T229551-zhuyifei1999-test /data/project/database-reports/reports/database-reports/venv/bin/python -c "'"'print(__import__("sys").path);import mwclient'"'"
Your job 7134733 ("T229551-zhuyifei1999-test") has been submitted
tools.database-reports@tools-sgebastion-08:~$ cat T229551-zhuyifei1999-test.*
['', '/usr/lib/python35.zip', '/usr/lib/python3.5', '/usr/lib/python3.5/plat-x86_64-linux-gnu', '/usr/lib/python3.5/lib-dynload', '/mnt/nfs/labstore-secondary-tools-project/database-reports/reports/database-reports/venv/lib/python3.5/site-packages']

The root cause: jsub resolves symbolic links for both the file to execute and argv[0], so Python is started as /usr/bin/python3.5 and has no way of knowing that it is supposed to run inside the venv.
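
Why the resolved path loses the venv can be illustrated with a rough sketch of the PEP 405 lookup: Python looks for a pyvenv.cfg file next to the executable it was started as, or one directory above it. The helper names and the throwaway directory layout below are assumptions made for illustration; this is not the interpreter's actual startup code.

```python
import os
import tempfile


def find_pyvenv_cfg(executable):
    """Return the pyvenv.cfg a PEP 405 style lookup would find, or None.

    Hypothetical helper: it only mimics the lookup order (next to the
    executable, then one directory up) and deliberately does not resolve
    symlinks, matching the behaviour observed in this task.
    """
    exe_dir = os.path.dirname(os.path.abspath(executable))
    for candidate_dir in (exe_dir, os.path.dirname(exe_dir)):
        candidate = os.path.join(candidate_dir, "pyvenv.cfg")
        if os.path.isfile(candidate):
            return candidate
    return None


def demo():
    """Build a symlinked venv layout and look up pyvenv.cfg both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.realpath(tmp)
        system_python = os.path.join(base, "usr", "bin", "python3")
        venv_python = os.path.join(base, "venv", "bin", "python")
        os.makedirs(os.path.dirname(system_python))
        os.makedirs(os.path.dirname(venv_python))
        open(system_python, "w").close()
        open(os.path.join(base, "venv", "pyvenv.cfg"), "w").close()
        os.symlink(system_python, venv_python)
        # Lookup from the venv path versus the symlink-resolved path.
        via_symlink = find_pyvenv_cfg(venv_python)
        via_resolved = find_pyvenv_cfg(os.path.realpath(venv_python))
        return via_symlink is not None, via_resolved is not None
```

In this sketch the lookup succeeds only when it starts from the venv's own bin/python; once the path has been resolved to the system interpreter, nothing points back at the venv.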

This tells me that you should definitely use a wrapper script.

Root cause is the character set @zhuyifei1999

jsub -v LC_ALL=en_US.UTF-8 -N T229551-bstorm-test /data/project/database-reports/reports/database-reports/venv/bin/python -c "'"'print(__import__("sys").path);import mwclient'"'"
tools.database-reports@tools-sgebastion-07:~$ cat T229551-bstorm-test.out
['', '/usr/lib/python35.zip', '/usr/lib/python3.5', '/usr/lib/python3.5/plat-x86_64-linux-gnu', '/usr/lib/python3.5/lib-dynload', '/mnt/nfs/labstore-secondary-tools-project/database-reports/reports/database-reports/venv/lib/python3.5/site-packages']
01:34:55 0 ✓ zhuyifei1999@tools-sgebastion-08: ~$ virtualenv -p python3 T229551-test
Already using interpreter /usr/bin/python3
Using base prefix '/usr'
New python executable in /mnt/nfs/labstore-secondary-tools-home/zhuyifei1999/T229551-test/bin/python3
Also creating executable in /mnt/nfs/labstore-secondary-tools-home/zhuyifei1999/T229551-test/bin/python
Installing setuptools, pkg_resources, pip, wheel...done.
01:35:54 0 ✓ zhuyifei1999@tools-sgebastion-08: ~$ python3 -m venv T229551-test-2
01:36:44 0 ✓ zhuyifei1999@tools-sgebastion-08: ~$ ls -l /mnt/nfs/labstore-secondary-tools-home/zhuyifei1999/T229551-test{,-2}/bin/python*
lrwxrwxrwx 1 zhuyifei1999 wikidev       7 Aug 14 01:36 /mnt/nfs/labstore-secondary-tools-home/zhuyifei1999/T229551-test-2/bin/python -> python3
lrwxrwxrwx 1 zhuyifei1999 wikidev      16 Aug 14 01:36 /mnt/nfs/labstore-secondary-tools-home/zhuyifei1999/T229551-test-2/bin/python3 -> /usr/bin/python3
lrwxrwxrwx 1 zhuyifei1999 wikidev       7 Aug 14 01:35 /mnt/nfs/labstore-secondary-tools-home/zhuyifei1999/T229551-test/bin/python -> python3
-rwxr-xr-x 1 zhuyifei1999 wikidev 4751184 Aug 14 01:35 /mnt/nfs/labstore-secondary-tools-home/zhuyifei1999/T229551-test/bin/python3
lrwxrwxrwx 1 zhuyifei1999 wikidev       7 Aug 14 01:35 /mnt/nfs/labstore-secondary-tools-home/zhuyifei1999/T229551-test/bin/python3.5 -> python3
-rwxr-xr-x 1 zhuyifei1999 wikidev    2382 Aug 14 01:35 /mnt/nfs/labstore-secondary-tools-home/zhuyifei1999/T229551-test/bin/python-config

Venvs built with the virtualenv command do not symlink to the system python3 (they get a copy of the binary), while venvs built with python3 -m venv do. This is probably why this did not affect python 2 venvs.
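
Whether a given venv is of the symlinked kind can also be checked from Python rather than with ls -l. This is a small sketch; interpreter_kind and demo are hypothetical names, and the demo uses a throwaway directory rather than a real venv.

```python
import os
import tempfile


def interpreter_kind(path):
    """Describe whether a venv interpreter is a symlink or a real file."""
    if os.path.islink(path):
        return "symlink -> " + os.readlink(path)
    if os.path.isfile(path):
        return "regular file"
    return "missing"


def demo():
    """Classify one copied and one symlinked stand-in interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        copied = os.path.join(tmp, "copied-python3")
        linked = os.path.join(tmp, "linked-python3")
        open(copied, "w").close()  # stands in for virtualenv's copied binary
        os.symlink("/usr/bin/python3", linked)  # what python3 -m venv creates
        return interpreter_kind(copied), interpreter_kind(linked)
```

A venv whose bin/python3 reports a symlink to /usr/bin/python3 would be the kind affected by the path resolution described in this task.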

The character set changes on the grid seem to affect the resolution of the python search path.

Root cause is the character set @zhuyifei1999

See above. I have already changed the env by executing tools.database-reports@tools-sgebastion-08:~$ cp /usr/bin/python3 /data/project/database-reports/reports/database-reports/venv/bin/python

Ah ok. But that doesn't make sense. My venv works fine. It also is a symlink.

@zhuyifei1999 was kind enough to put things back so I could prove myself good and solidly wrong about the character set interfering. It is definitely the resolving of symlinks...and that's why a bash wrapper is a good idea here. Thanks @zhuyifei1999 :)

jstart / jsub -continuous uses an implicit bash wrapper in order to restart the job when it exits with an error. It has the side effect of not needing those double escapes (T50811), and I thought it would work, but I proved myself wrong:

tools.database-reports@tools-sgebastion-08:~$ truncate -s 0 T229551-zhuyifei1999-test.*; jsub -continuous -N T229551-zhuyifei1999-test /data/project/database-reports/reports/database-reports/venv/bin/python -c 'print(open("/proc/self/cmdline").read().replace("\0", "\n"))'
Your job 7135679 ("T229551-zhuyifei1999-test") has been submitted
tools.database-reports@tools-sgebastion-08:~$ cat T229551-zhuyifei1999-test.*
/usr/bin/python3.5
-c
print(open("/proc/self/cmdline").read().replace("\0", "\n"))

I fixed it!

tools.zhuyifei1999-test@tools-sgebastion-08:~$ truncate -s 0 T229551-zhuyifei1999-test.*; jsub -continuous -N T229551-zhuyifei1999-test /data/project/zhuyifei1999-test/venv/bin/python -c 'print(__import__("sys").path)'
Your job 7135937 ("T229551-zhuyifei1999-test") has been submitted
tools.zhuyifei1999-test@tools-sgebastion-08:~$ cat T229551-zhuyifei1999-test.*
['', '/usr/lib/python35.zip', '/usr/lib/python3.5', '/usr/lib/python3.5/plat-x86_64-linux-gnu', '/usr/lib/python3.5/lib-dynload', '/usr/local/lib/python3.5/dist-packages', '/usr/lib/python3/dist-packages']
tools.zhuyifei1999-test@tools-sgebastion-08:~$ truncate -s 0 T229551-zhuyifei1999-test.*; ./jsub -continuous -N T229551-zhuyifei1999-test /data/project/zhuyifei1999-test/venv/bin/python -c 'print(__import__("sys").path)'
Your job 7136067 ("T229551-zhuyifei1999-test") has been submitted
tools.zhuyifei1999-test@tools-sgebastion-08:~$ cat T229551-zhuyifei1999-test.*
['', '/usr/lib/python35.zip', '/usr/lib/python3.5', '/usr/lib/python3.5/plat-x86_64-linux-gnu', '/usr/lib/python3.5/lib-dynload', '/data/project/zhuyifei1999-test/venv/lib/python3.5/site-packages']
tools.zhuyifei1999-test@tools-sgebastion-08:~$ diff `which jsub` ./jsub -u
--- /usr/bin/jsub	2018-11-29 19:39:44.000000000 +0000
+++ ./jsub	2019-08-14 02:17:49.057655606 +0000
@@ -162,15 +162,15 @@
     """
     # Already a full path?
     if prog[0] == os.sep and os.path.exists(prog):
-        return os.path.realpath(prog)
+        return os.path.normpath(prog)
     if prog[0] != os.curdir:
         # Look in each dir of $PATH
         for path in os.environ.get('PATH', '').split(os.pathsep):
             if os.path.exists(os.path.join(path, prog)):
-                return os.path.realpath(os.path.join(path, prog))
+                return os.path.normpath(os.path.join(path, prog))
     # Not found in $PATH so try looking in $PWD
     if os.path.exists(os.path.join(os.getcwd(), prog)):
-        return os.path.realpath(os.path.join(os.getcwd(), prog))
+        return os.path.normpath(os.path.join(os.getcwd(), prog))
     raise argparse.ArgumentTypeError("Program '%s' not found." % prog)
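
The difference the patch relies on can be seen in isolation. This is a minimal sketch using a throwaway directory that mirrors the python -> python3 -> system python3 chain from the ls -l output; compare_resolution is a hypothetical name and not part of jsub.

```python
import os
import tempfile


def compare_resolution():
    """Return (realpath, normpath) of a venv python, relative to a temp base."""
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.realpath(tmp)
        system_python = os.path.join(base, "usr", "bin", "python3")
        venv_bin = os.path.join(base, "venv", "bin")
        os.makedirs(os.path.dirname(system_python))
        os.makedirs(venv_bin)
        open(system_python, "w").close()
        # Two-step symlink chain: python -> python3 -> system python3
        os.symlink("python3", os.path.join(venv_bin, "python"))
        os.symlink(system_python, os.path.join(venv_bin, "python3"))
        requested = os.path.join(venv_bin, ".", "python")
        resolved = os.path.realpath(requested)    # follows the symlinks
        normalized = os.path.normpath(requested)  # only tidies the string
        return os.path.relpath(resolved, base), os.path.relpath(normalized, base)
```

realpath ends up at the system interpreter, while normpath only removes the redundant "." component and keeps the venv path, which is the path Python needs to be started with in order to find the venv.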

Change 530020 had a related patch set uploaded (by Zhuyifei1999; owner: Zhuyifei1999):
[labs/toollabs@master] jsub: use normpath instead of realpath to resolve executable path

https://gerrit.wikimedia.org/r/530020

Bstorm added a comment. Edited Aug 14 2019, 2:31 AM

In the meantime, I did confirm separately that what I documented about using a wrapper explicitly with an activate does work with jsub. I very much like the idea of having it fixed so that isn't necessary, though :)

I wanted to make sure what I put on wikitech was true, and it is if done just right (activating and then explicitly using that python worked from inside a bash wrapper). Will look into that patch in the morning. It looks like just the thing!

Change 530020 merged by Jhedden:
[labs/toollabs@master] jsub: use normpath instead of realpath to resolve executable path

https://gerrit.wikimedia.org/r/530020

This comment was removed by JHedden.

Mentioned in SAL (#wikimedia-cloud) [2019-08-15T15:32:52Z] <jeh> upgraded jobutils debian package to 1.38 T229551

This seems like it is fixed now. I don't need a bash wrapper in my test case.

Bstorm closed this task as Resolved. Aug 15 2019, 6:08 PM
Bstorm claimed this task.

Closing since I was able to test it with @MaxSem's tool account/venv.