No Grid-based updates since 2020-05-10, cannot find dependencies
Closed, Resolved, Public

Description

Integraality runs are invoked either via the webserver or via a shell script running on the Grid.

Since 2020-05-10, the Grid runs crash with a failure to find some modules which are installed in the virtual environment [1].

This is likely linked to the change in Python runtime: see 9d5e608e074e and the bash history on that day [2].

[1]

Traceback (most recent call last):
  File "integraality/pages_processor.py", line 9, in <module>
    from ww import f
ImportError: No module named 'ww'

[2] extract from bash history:

400  2020-05-09 18:43:22 webservice --backend=kubernetes python3.5 shell
401  2020-05-09 18:47:24 webservice --backend=kubernetes python3.7 shell
402  2020-05-09 18:47:52 webservice --backend=kubernetes python3.5 shell
403  2020-05-09 18:48:35 webservice --backend=kubernetes python3.7 shell
404  2020-05-09 18:48:41 webservice --backend=kubernetes python3.7 shell
405  2020-05-09 18:50:16 webservice --backend=kubernetes python3.5 shell
406  2020-05-09 18:50:37 webservice --backend=kubernetes python3.6 shell
407  2020-05-09 18:50:48 webservice --backend=kubernetes python3.7 shell
408  2020-05-09 18:52:03 webservice --backend=kubernetes python3.5 shell
409  2020-05-09 18:52:16 webservice --backend=kubernetes python3.5 shell
410  2020-05-09 19:40:16 webservice-bootstrap-python
411  2020-05-09 19:40:26 webservice-python-bootstrap
412  2020-05-09 19:40:34 webservice --backend=kubernetes python3.7 shell
413  2020-05-09 19:43:45 webservice restart
414  2020-05-09 19:43:48 webservice stop
415  2020-05-09 19:44:05 webservice --backend=kubernetes python3.7 restart

Event Timeline

$ sudo become integraality
$ ls -1 www/python/venv/bin/ | grep '3.'
easy_install-3.7
pip3.7

This python3.7 venv can only be used from a runtime that has python3.7 installed. In Toolforge, that means you must be operating from inside a python3.7 container on the Kubernetes cluster.
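
One way to check this from inside a job is to compare the running interpreter with the Python versions the venv actually holds packages for. The following is only a minimal sketch: the helper names are hypothetical (not part of integraality or Toolforge), and it assumes the usual venv layout where packages live under `lib/pythonX.Y/site-packages`.

```python
import sys
from pathlib import Path


def venv_python_versions(venv_path):
    """List the 'X.Y' versions that have a site-packages directory in the venv."""
    lib_dir = Path(venv_path) / "lib"
    return sorted(
        entry.name[len("python"):]
        for entry in lib_dir.glob("python*")
        if (entry / "site-packages").is_dir()
    )


def interpreter_matches(venv_path, version_info=sys.version_info):
    """True if the given interpreter version has packages installed in the venv."""
    running = "{}.{}".format(version_info[0], version_info[1])
    return running in venv_python_versions(venv_path)
```

Given the `ls` output above, calling `interpreter_matches("/data/project/integraality/www/python/venv")` from a python3.5 runtime would presumably return `False`, which would be consistent with the `ImportError` in the task description.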

Your options are:

  1. Roll back to python3.5 which is available on both the Kubernetes cluster and the Grid Engine cluster
  2. Replace your crontab jobs running on the Grid Engine with Kubernetes CronJob objects
  3. Build a second venv using python3.5 just for use on the Grid Engine cluster
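
As a rough illustration of the "second venv" option, the sketch below builds a venv named after the interpreter that runs it, so that each runtime gets its own directory. The `venv-pyXY` naming and both function names are assumptions made for this example, not an existing Toolforge convention. It uses only the standard-library `venv` module.

```python
import sys
import venv
from pathlib import Path


def venv_dir_for(base_dir, version_info=sys.version_info):
    """Return a per-version venv path, e.g. <base_dir>/venv-py35 (hypothetical naming)."""
    return Path(base_dir) / "venv-py{}{}".format(version_info[0], version_info[1])


def build_venv(base_dir):
    """Create a venv for the running interpreter if it does not exist yet."""
    target = venv_dir_for(base_dir)
    if not target.exists():
        # with_pip=True bootstraps pip so requirements can be installed afterwards.
        venv.create(str(target), with_pip=True)
    return target
```

Run with the Grid's python3.5, `build_venv` would create a `venv-py35` directory next to the existing venv. The tool's requirements would still need to be installed with that venv's own pip, and the Grid script would need to source the new path.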

So:

  • crontab invokes jsub with the run.sh script (jsub -mem 1000m -once -j y -o /data/project/integraality/logs/update.log -N update /data/project/integraality/integraality/bin/run.sh)
  • run.sh:
    • sources the virtual environment (via /data/project/integraality/www/python/venv/bin/activate)
    • runs the python script (via python integraality/pages_processor.py)

I tried changing to run VIRTUAL_ENV_PATH/bin/python3 integraality/pages_processor.py (as stated in Help:Toolforge/Python#Use_venv_with_scheduled_tasks) but the same happens.

I tried slapping a pip freeze in run.sh (after sourcing the venv) to see what happens:

+ pip freeze
Traceback (most recent call last):
  File "/data/project/integraality/www/python/venv/bin/pip", line 6, in <module>
    from pip._internal.cli.main import main
ImportError: No module named 'pip'
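
Since `pip` itself cannot be imported here, a diagnostic that relies only on the standard library may tell more than `pip freeze`. The sketch below is a hypothetical helper (not part of integraality) that reports which interpreter is running and which `site-packages` directories it searches.

```python
import sys


def runtime_report():
    """Collect the interpreter details that decide which packages are importable."""
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "prefix": sys.prefix,
        "in_venv": sys.prefix != base_prefix,
        "site_packages": [p for p in sys.path if p.endswith("site-packages")],
    }


def format_report(report):
    """Render the report as one 'key: value' line per entry, sorted by key."""
    return "\n".join("{}: {}".format(k, report[k]) for k in sorted(report))
```

Printing `format_report(runtime_report())` from the job, after the venv is sourced, should show whether the interpreter version matches the one the venv's packages were installed for.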

I tried changing to run VIRTUAL_ENV_PATH/bin/python3 integraality/pages_processor.py (as stated in Help:Toolforge/Python#Use_venv_with_scheduled_tasks) but the same happens.

That documentation is badly written. It was true long ago, when there was no Kubernetes cluster and all venvs were built with the bastion/Grid Engine Python interpreter. It is not true when you are mixing Kubernetes and Grid Engine. You need to do one of the 3 things from T257942#6305239.

Thanks @bd808 for the investigation and hints!

This python3.7 venv can only be used from a runtime that has python3.7 installed. In Toolforge, that means you must be operating from inside a python3.7 container on the Kubernetes cluster.

Hmm, clearly I misunderstood, but when I read the announcements about py37 and the 2020 k8s cluster, it was not clear to me that this would break my scheduled jobs as well. Of course, that was stupid of me, because jsub jobs are by definition Grid Engine-based, not k8s-based, but still…

  3. Build a second venv using python3.5 just for use on the Grid Engine cluster

I already went for option 3 as a stopgap measure, and can confirm that the update ran overnight. I will look into option 2 (k8s CronJob objects) at a later point.

Mentioned in SAL (#wikimedia-cloud) [2021-04-03T16:44:58Z] <wm-bot> <jeanfred> Deploy latest from Git master: eac4d2c (T257942)

Mentioned in SAL (#wikimedia-cloud) [2021-04-03T21:33:33Z] <wm-bot> <jeanfred> Deploy latest from Git master: eac4d2c (T257942)

JeanFred claimed this task.

This was resolved a while back with the use of a py35 venv; with 8bd793d and the venv unification, we can for sure close it.