Steps to Reproduce:
Go to https://tools.wmflabs.org/scholia
Actual Results:
No response from webservice or after a long time "502 Bad Gateway
openresty/1.15.8.1"
Expected Results:
Webservice should respond and display a webpage
Scholia Toolforge webservice is operated by me and the webservice is restarted with webservice --backend=kubernetes python3.7 restart as usual but now does not restart correctly.
The uwsgi.log reads
*** Operational MODE: preforking *** WSGI app 0 (mountpoint='') ready in 424 seconds on interpreter 0x5574d4daf090 pid: 1 (default app) mounting /data/project/scholia/www/python/src/app.py on /scholia *** Starting uWSGI 2.0.18-debian (64bit) on [Sat Feb 29 08:41:55 2020] *** compiled with version: 8.2.0 on 10 February 2019 02:42:46 os: Linux-4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) nodename: scholia-5bbd659585-6nhcn machine: x86_64 clock source: unix pcre jit disabled detected number of CPU cores: 4 current working directory: /data/project/scholia detected binary path: /usr/bin/uwsgi-core chdir() to /data/project/scholia/www/python/src your memory page size is 4096 bytes detected max file descriptor number: 1048576 lock engine: pthread robust mutexes thunder lock: disabled (you can enable it with --thunder-lock) uwsgi socket 0 bound to TCP address :8000 fd 4 Python version: 3.7.3 (default, Dec 20 2019, 18:57:59) [GCC 8.3.0] PEP 405 virtualenv detected: /data/project/scholia/www/python/venv Set PythonHome to /data/project/scholia/www/python/venv *** Python threads support is disabled. You can enable it with --enable-threads *** Python main interpreter initialized at 0x55aecfe07090 your server socket listen backlog is limited to 100 connections your mercy for graceful operations on workers is 60 seconds mapped 364600 bytes (356 KB) for 4 cores *** Operational MODE: preforking ***
Earlier this week, there was a similar problem with the webservice, resulting in an 50x error with a similar uwsgi.ini logging. WMF staff were around at the same time, working on scheduled maintenance, and the day after, I found the webservice to be responsive. Now, a few days later, the problem came back.
The webservice has recently been updated to Python3.7, and it is possible that the problem stems from this upgrade or the transition to the new K8s. The webservice can be started locally with python3.7 runserver.py with no problem.
I suppose it could be a resource problem, specifically memory.
https://tools.wmflabs.org/k8s-status/namespaces/tool-scholia/ claims that it is running.
https://grafana-labs.wikimedia.org/d/toolforge-k8s-namespace-resources/kubernetes-namespace-resources?orgId=1&var-namespace=tool-scholia&refresh=5m I am not familiar with reading Grafana, but I do not see any issue.
I note that another one of my tools, the wembedder Toolforge webservice, seems to have a similar problem.
The issue is also tracked at Scholia issues: https://github.com/fnielsen/scholia/issues/1062