Toolforge grid: uwsgi in buster fails to load python3 venvs
Closed, Resolved (Public)

Description

While working on the stretch->buster grid migration, I discovered that the uwsgi-based Python setup for webservices has an issue: uwsgi segfaults, which prevents webservices from working.

Summary

Fails (--plugin python,python3):
root@toolsbeta-sgewebgen-10-1:~# /usr/bin/uwsgi --plugin python,python3 --http-socket :34855 --chdir /data/project/automated-toolforge-tests/www/python/src --callable app --manage-script-name --workers 4 --mount /automated-toolforge-tests=/data/project/automated-toolforge-tests/www/python/src/app.py --die-on-term --strict --master --venv /data/project/automated-toolforge-tests/www/python/venv
*** Starting uWSGI 2.0.18-debian (64bit) on [Mon Jan 31 12:43:13 2022] ***
compiled with version: 8.2.0 on 10 February 2019 02:42:46
os: Linux-4.19.0-17-cloud-amd64 #1 SMP Debian 4.19.194-3 (2021-07-18)
nodename: toolsbeta-sgewebgen-10-1
machine: x86_64
clock source: unix
pcre jit disabled
detected number of CPU cores: 4
current working directory: /root
detected binary path: /usr/bin/uwsgi-core
uWSGI running as root, you can use --uid/--gid/--chroot options
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) *** 
chdir() to /data/project/automated-toolforge-tests/www/python/src
your processes number limit is 31854
your memory page size is 4096 bytes
detected max file descriptor number: 1024
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uwsgi socket 0 bound to TCP address :34855 fd 3
uWSGI running as root, you can use --uid/--gid/--chroot options
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) *** 
Python version: 2.7.16 (default, Oct 10 2019, 22:02:15)  [GCC 8.3.0]
Set PythonHome to /data/project/automated-toolforge-tests/www/python/venv
ImportError: No module named site
Works (--plugin python3 only):
root@toolsbeta-sgewebgen-10-1:~# /usr/bin/uwsgi --plugin python3 --http-socket :34855 --chdir /data/project/automated-toolforge-tests/www/python/src --callable app --manage-script-name --workers 4 --mount /automated-toolforge-tests=/data/project/automated-toolforge-tests/www/python/src/app.py --die-on-term --strict --master --venv /data/project/automated-toolforge-tests/www/python/venv
*** Starting uWSGI 2.0.18-debian (64bit) on [Mon Jan 31 12:43:40 2022] ***
compiled with version: 8.2.0 on 10 February 2019 02:42:46
os: Linux-4.19.0-17-cloud-amd64 #1 SMP Debian 4.19.194-3 (2021-07-18)
nodename: toolsbeta-sgewebgen-10-1
machine: x86_64
clock source: unix
pcre jit disabled
detected number of CPU cores: 4
current working directory: /root
detected binary path: /usr/bin/uwsgi-core
uWSGI running as root, you can use --uid/--gid/--chroot options
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) *** 
chdir() to /data/project/automated-toolforge-tests/www/python/src
your processes number limit is 31854
your memory page size is 4096 bytes
detected max file descriptor number: 1024
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uwsgi socket 0 bound to TCP address :34855 fd 3
uWSGI running as root, you can use --uid/--gid/--chroot options
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) *** 
Python version: 3.7.3 (default, Jan 22 2021, 20:04:44)  [GCC 8.3.0]
PEP 405 virtualenv detected: /data/project/automated-toolforge-tests/www/python/venv
Set PythonHome to /data/project/automated-toolforge-tests/www/python/venv
*** Python threads support is disabled. You can enable it with --enable-threads ***
Python main interpreter initialized at 0x5615abc2ccb0
uWSGI running as root, you can use --uid/--gid/--chroot options
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) *** 
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 364600 bytes (356 KB) for 4 cores
*** Operational MODE: preforking ***
mounting /data/project/automated-toolforge-tests/www/python/src/app.py on /automated-toolforge-tests
 * Serving Flask app 'uwsgi_file__data_project_automated-toolforge-tests_www_python_src_app' (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

Elaboration

On the grid, the webservice command results in a backend call that runs uwsgi with the tool's code. This assumes that a Python venv exists with the libraries the webservice requires; the venv is passed to the uwsgi call.

This session reproduces the problem:

user@toolsbeta-sgebastion-05:~$ become automated-toolforge-tests
toolsbeta.automated-toolforge-tests@toolsbeta-sgebastion-05:~$ cat www/python/src/app.py 
from flask import Flask, Response
app = Flask(__name__)
@app.route('/')
def home():
  with open('/etc/debian_version', 'r') as f:
    return Response(f.read(), content_type="text/plain")
app.run()
toolsbeta.automated-toolforge-tests@toolsbeta-sgebastion-05:~$ bash -c -- 'python3 -m venv /www/python/venv  ; source /www/python/venv/bin/activate ; pip install flask'
toolsbeta.automated-toolforge-tests@toolsbeta-sgebastion-05:~$ webservice --backend gridengine --release stretch uwsgi-python start
Starting webservice ......

The webservice never really starts. The uwsgi.log file contains:

toolsbeta.automated-toolforge-tests@toolsbeta-sgebastion-05:~$ cat uwsgi.log
*** Starting uWSGI 2.0.14-debian (64bit) on [Mon Jan 31 12:56:09 2022] ***
compiled with version: 6.3.0 20170516 on 08 May 2019 07:32:58
os: Linux-4.19.0-0.bpo.14-amd64 #1 SMP Debian 4.19.171-2~deb9u1 (2021-02-08)
nodename: toolsbeta-sgewebgen-09-1
machine: x86_64
clock source: unix
pcre jit disabled
detected number of CPU cores: 4
current working directory: /mnt/nfs/nfs-01-toolsbeta-project/automated-toolforge-tests
detected binary path: /usr/bin/uwsgi-core
chdir() to /data/project/automated-toolforge-tests/www/python/src
your processes number limit is 31805
your process address space limit is 4294967296 bytes (4096 MB)
your memory page size is 4096 bytes
detected max file descriptor number: 1024
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uwsgi socket 0 bound to TCP address :45545 fd 3
Python version: 2.7.13 (default, Apr 16 2021, 14:02:03)  [GCC 6.3.0 20170516]
Set PythonHome to /data/project/automated-toolforge-tests/www/python/venv
ImportError: No module named site

Note that it loads *python 2.7.13*. The webservice command generates this command line, loading both Python plugins (2.x and 3.x): https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/tools-webservice/+/refs/heads/master/toolsws/wstypes/python.py#20
If the plugin list contains only python3, as in the working run in the summary, this doesn't fail.
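
To spot this quickly in other tools' logs, the interpreter that uwsgi actually loaded can be pulled out of uwsgi.log. The following is only a minimal sketch: it assumes the "Python version: X.Y.Z" line format seen in the logs above, and the function name python_version_in_log is made up for illustration.

```python
import re

# uWSGI prints a startup line such as
# "Python version: 2.7.13 (default, Apr 16 2021, 14:02:03)  [GCC 6.3.0 20170516]".
# Assumption: the line always starts with "Python version: X.Y.Z".
_VERSION_RE = re.compile(r"^Python version: (\d+)\.(\d+)\.(\d+)", re.MULTILINE)


def python_version_in_log(log_text):
    """Return (major, minor, micro) from the first 'Python version:' line, or None."""
    match = _VERSION_RE.search(log_text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    # Sample lines copied from the uwsgi.log shown in this task.
    sample = (
        "uwsgi socket 0 bound to TCP address :45545 fd 3\n"
        "Python version: 2.7.13 (default, Apr 16 2021, 14:02:03)  [GCC 6.3.0 20170516]\n"
        "Set PythonHome to /data/project/automated-toolforge-tests/www/python/venv\n"
    )
    print(python_version_in_log(sample))
```

A major version of 2 combined with a python3 venv would match the "No module named site" failure shown above.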

I honestly wonder how this worked previously.

A possible solution is to add a versioned wstype (uwsgi-python and uwsgi-python3), or perhaps make it load python3 by default.
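
A rough sketch of the versioned-wstype idea, to make the proposal concrete. This is not the toolsws code: PLUGINS_BY_TYPE and build_uwsgi_args are hypothetical names, and "uwsgi-python3" is only the type proposed here. The flags themselves are copied from the command lines in the summary.

```python
# Hypothetical mapping from webservice type to the uwsgi plugin list.
# "uwsgi-python3" is the type proposed in this task, not an existing one.
PLUGINS_BY_TYPE = {
    "uwsgi-python": "python,python3",  # what the failing command line uses
    "uwsgi-python3": "python3",        # what the working command line uses
}


def build_uwsgi_args(tool, port, wstype="uwsgi-python3"):
    """Assemble a uwsgi command line like the ones shown in the summary."""
    home = "/data/project/{}".format(tool)
    src = home + "/www/python/src"
    return [
        "/usr/bin/uwsgi",
        "--plugin", PLUGINS_BY_TYPE[wstype],
        "--http-socket", ":{}".format(port),
        "--chdir", src,
        "--callable", "app",
        "--manage-script-name",
        "--workers", "4",
        "--mount", "/{}={}/app.py".format(tool, src),
        "--die-on-term",
        "--strict",
        "--master",
        "--venv", home + "/www/python/venv",
    ]


if __name__ == "__main__":
    print(" ".join(build_uwsgi_args("automated-toolforge-tests", 34855)))
```

Keeping the plugin list in one table would let the default be switched to python3 without touching the rest of the argument list.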

Event Timeline

aborrero changed the task status from Open to In Progress. (Jan 31 2022, 1:21 PM)
aborrero triaged this task as Medium priority.
aborrero moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.

Something is going on. Trying this:

toolsbeta.automated-toolforge-tests@toolsbeta-sgebastion-05:~$ cat uwsgi.ini 
[uwsgi]
plugin = python3
chdir = /data/project/automated-toolforge-tests/www/python/src
venv = /data/project/automated-toolforge-tests/www/python/venv
mount = /automated-toolforge-tests=/data/project/automated-toolforge-tests/www/python/src/app.py
callable = app
import = flask
toolsbeta.automated-toolforge-tests@toolsbeta-sgebastion-05:~$ webservice --backend gridengine --release stretch uwsgi-plain start
toolsbeta.automated-toolforge-tests@toolsbeta-sgebastion-05:~$ tail -20 uwsgi.log 
your memory page size is 4096 bytes
detected max file descriptor number: 1024
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uwsgi socket 0 bound to TCP address :59079 fd 3
Python version: 3.5.3 (default, Nov  4 2021, 15:29:10)  [GCC 6.3.0 20170516]
PEP 405 virtualenv detected: /data/project/automated-toolforge-tests/www/python/venv
Set PythonHome to /data/project/automated-toolforge-tests/www/python/venv
*** Python threads support is disabled. You can enable it with --enable-threads ***
Python main interpreter initialized at 0x55e706d28330
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 363840 bytes (355 KB) for 4 cores
*** Operational MODE: preforking ***
ImportError: No module named 'flask'
mounting /data/project/automated-toolforge-tests/www/python/src/app.py on /automated-toolforge-tests
Traceback (most recent call last):
  File "/data/project/automated-toolforge-tests/www/python/src/app.py", line 1, in <module>
    from flask import Flask, Response
ImportError: No module named 'flask'

However, flask is correctly installed in the venv:

toolsbeta.automated-toolforge-tests@toolsbeta-sgebastion-05:~$ source www/python/venv/bin/activate
toolsbeta.automated-toolforge-tests@toolsbeta-sgebastion-05:~$ pip install flask
Requirement already satisfied: flask in ./www/python/venv/lib/python3.7/site-packages (2.0.2)
Requirement already satisfied: Jinja2>=3.0 in ./www/python/venv/lib/python3.7/site-packages (from flask) (3.0.3)
Requirement already satisfied: click>=7.1.2 in ./www/python/venv/lib/python3.7/site-packages (from flask) (8.0.3)
Requirement already satisfied: Werkzeug>=2.0 in ./www/python/venv/lib/python3.7/site-packages (from flask) (2.0.2)
Requirement already satisfied: itsdangerous>=2.0 in ./www/python/venv/lib/python3.7/site-packages (from flask) (2.0.1)
Requirement already satisfied: MarkupSafe>=2.0 in ./www/python/venv/lib/python3.7/site-packages (from Jinja2>=3.0->flask) (2.0.1)
Requirement already satisfied: importlib-metadata; python_version < "3.8" in ./www/python/venv/lib/python3.7/site-packages (from click>=7.1.2->flask) (4.10.1)
Requirement already satisfied: typing-extensions>=3.6.4; python_version < "3.8" in ./www/python/venv/lib/python3.7/site-packages (from importlib-metadata; python_version < "3.8"->click>=7.1.2->flask) (4.0.1)
Requirement already satisfied: zipp>=0.5 in ./www/python/venv/lib/python3.7/site-packages (from importlib-metadata; python_version < "3.8"->click>=7.1.2->flask) (3.7.0)
toolsbeta.automated-toolforge-tests@toolsbeta-sgebastion-05:~$ python3
Python 3.7.3 (default, Jan 22 2021, 20:04:44) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import flask
[.. no problem ..]
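
One detail worth comparing in the two outputs above: the uwsgi.log reports Python 3.5.3, while pip lists the packages under lib/python3.7. Whether that mismatch explains the ImportError is not confirmed in this task, but it is cheap to check. A minimal sketch, assuming the usual lib/pythonX.Y/site-packages layout of a venv (the helper names are invented for illustration):

```python
import os
import re
import sys


def venv_python_versions(venv_path):
    """Return the sorted 'X.Y' versions that have lib/pythonX.Y/site-packages."""
    lib_dir = os.path.join(venv_path, "lib")
    versions = []
    if os.path.isdir(lib_dir):
        for name in os.listdir(lib_dir):
            match = re.fullmatch(r"python(\d+\.\d+)", name)
            site_packages = os.path.join(lib_dir, name, "site-packages")
            if match and os.path.isdir(site_packages):
                versions.append(match.group(1))
    return sorted(versions)


def venv_matches(venv_path, interpreter_version):
    """True if the venv has a site-packages dir for the given 'X.Y' version."""
    return interpreter_version in venv_python_versions(venv_path)


if __name__ == "__main__":
    # Default path is the relative venv location used in this task.
    venv = sys.argv[1] if len(sys.argv) > 1 else "www/python/venv"
    running = "{}.{}".format(*sys.version_info[:2])
    print("venv:", venv_python_versions(venv), "interpreter:", running)
    print("match:", venv_matches(venv, running))
```

Running it with the interpreter version that uwsgi reports, rather than the one on the bastion, is the comparison that matters here.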

Change 758509 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] toolforge: automated tests: schedule webgen tool in the correct grid

https://gerrit.wikimedia.org/r/758509

Change 758509 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] toolforge: automated tests: schedule webgen tool in the correct grid

https://gerrit.wikimedia.org/r/758509

Change 758832 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/software/tools-webservice@master] toolsws: add uwsgi-python3 webservice type

https://gerrit.wikimedia.org/r/758832

Change 758851 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] toolforge: automated tests: use uwsgi-plain webservice to test the generic web grid

https://gerrit.wikimedia.org/r/758851

Change 758851 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] toolforge: automated tests: use uwsgi-plain ws to test the generic web grid

https://gerrit.wikimedia.org/r/758851

Change 758832 abandoned by Arturo Borrero Gonzalez:

[operations/software/tools-webservice@master] toolsws: add uwsgi-python3 webservice type

Reason:

we should work on improving k8s user experience rather than enabling new functions in the grid.

https://gerrit.wikimedia.org/r/758832

This patch likely solves the problem.

This is a problem that has been with us for years, and one that I have no intention of fixing today. We should not help our users stay on the grid; we should help them move to k8s.