
Migration on to new Kubernetes on Toolforge failed for Scholia
Closed, Resolved | Public | BUG REPORT

Description

There are several problems with the Toolforge migration of my Scholia tool.

Initially I tried webservice migrate as stated on https://wikitech.wikimedia.org/w/index.php?title=News/2020_Kubernetes_cluster_migration&oldid=1854854#Manually_migrate_a_webservice_to_the_new_cluster

This resulted in a failure where the output of the webservice program was:

DEPRECATED: 'php5.6' type is deprecated.
  See https://wikitech.wikimedia.org/wiki/Help:Toolforge/Kubernetes
  for currently supported types.
Stopping webservice on legacy Kubernetes cluster
Switched to context "toolforge".
Could not find a public_html folder or a .lighttpd.conf file in your tool home.

There is no public_html folder (there is something in ~/www/python). I have no .lighttpd.conf.


The older documentation is still available at https://wikitech.wikimedia.org/w/index.php?title=News/2020_Kubernetes_cluster_migration&oldid=1851901#Manually_migrate_a_webservice_to_the_new_cluster and following it allowed me to change to the new system. However, there is a failure.

A restart of the server with the commit now results in another failure of an unclear nature. My commands were:

476  2020-02-21 08:27:37 kubectl config use-context toolforge
477  2020-02-21 08:27:46 alias kubectl=/usr/bin/kubectl
478  2020-02-21 08:27:50 echo "alias kubectl=/usr/bin/kubectl" >> $HOME/.profile
479  2020-02-21 08:28:18 webservice --backend=kubernetes python3.7 start
480  2020-02-21 08:28:42 tail -f uwsgi.log

The startup process restarts at the uwsgi.log line "mounting /data/project/scholia/www/python/src/app.py on /scholia". No further error message seems to be available.

The repeating parts of uwsgi.log are similar to:

8533 *** Operational MODE: preforking ***
8534 WSGI app 0 (mountpoint='') ready in 31 seconds on interpreter 0x55b7d7326080 pid: 1 (default app)
8535 mounting /data/project/scholia/www/python/src/app.py on /scholia
8536 *** Starting uWSGI 2.0.18-debian (64bit) on [Fri Feb 21 09:13:46 2020] ***
8537 compiled with version: 8.2.0 on 10 February 2019 02:42:46
8538 os: Linux-4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26)
8539 nodename: scholia-5bbd659585-6b7st
8540 machine: x86_64
8541 clock source: unix
8542 pcre jit disabled
8543 detected number of CPU cores: 4
8544 current working directory: /data/project/scholia
8545 detected binary path: /usr/bin/uwsgi-core
8546 chdir() to /data/project/scholia/www/python/src
8547 your memory page size is 4096 bytes
8548 detected max file descriptor number: 1048576
8549 lock engine: pthread robust mutexes
8550 thunder lock: disabled (you can enable it with --thunder-lock)
8551 uwsgi socket 0 bound to TCP address :8000 fd 4
8552 Python version: 3.7.3 (default, Dec 20 2019, 18:57:59)  [GCC 8.3.0]
8553 PEP 405 virtualenv detected: /data/project/scholia/www/python/venv
8554 Set PythonHome to /data/project/scholia/www/python/venv
8555 *** Python threads support is disabled. You can enable it with --enable-threads ***
8556 Python main interpreter initialized at 0x55f9ab9b1080
8557 your server socket listen backlog is limited to 100 connections
8558 your mercy for graceful operations on workers is 60 seconds
8559 mapped 364600 bytes (356 KB) for 4 cores
8560 *** Operational MODE: preforking ***
8561 WSGI app 0 (mountpoint='') ready in 60 seconds on interpreter 0x55f9ab9b1080 pid: 1 (default app)
8562 mounting /data/project/scholia/www/python/src/app.py on /scholia
8563 *** Starting uWSGI 2.0.18-debian (64bit) on [Fri Feb 21 09:15:16 2020] ***
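To confirm that the excerpt above really is a restart loop, the start banners in uwsgi.log can be counted. The following is only a minimal sketch: the function name `find_restarts` is hypothetical, and the pattern assumes the banner format shown in the excerpt.

```python
import re

# Matches the banner uWSGI prints on each start, in the format seen in the excerpt.
START_RE = re.compile(r"\*\*\* Starting uWSGI .* on \[(?P<when>[^\]]+)\] \*\*\*")


def find_restarts(log_lines):
    """Return the timestamp string of every uWSGI start banner found."""
    starts = []
    for line in log_lines:
        match = START_RE.search(line)
        if match:
            starts.append(match.group("when"))
    return starts


# Two lines copied from the excerpt; in practice, pass an open file object instead.
sample = [
    "8536 *** Starting uWSGI 2.0.18-debian (64bit) on [Fri Feb 21 09:13:46 2020] ***",
    "8563 *** Starting uWSGI 2.0.18-debian (64bit) on [Fri Feb 21 09:15:16 2020] ***",
]
for when in find_restarts(sample):
    print(when)
```

Several banners a minute or two apart, with no error line in between, would suggest that the process is being killed or is exiting while the application is loaded, rather than failing inside uWSGI itself.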

I imagine it could be related to changes in resources: that the Scholia image is too large for the Kubernetes resources allocated on the Toolforge side.


(I then tried to migrate back to python2.7, but that also resulted in various errors, which seem to be on my part. I had problems reestablishing the Python 2.7 virtual environment, with an import datetime failure, etc. After some struggles, the Python 2.7 version of Scholia seems to be running now at https://tools.wmflabs.org/scholia/)

Event Timeline

Fnielsen claimed this task.

I think this was due to our tool (an import error on lxml on the new server) as well as the problem with the virtual environment change during migration. It seems to work now: https://tools.wmflabs.org/scholia/
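Since uwsgi.log showed no traceback, a quick way to surface this kind of import problem is to try the imports directly with the virtual environment's interpreter. This is a minimal sketch, not the check actually used here: `check_imports` is a hypothetical helper, and the module list is only an example based on the failures mentioned in this task.

```python
import importlib


def check_imports(module_names):
    """Try to import each module; map its name to None or an error string."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # usually ImportError, but imports can raise anything
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


# Example module list (an assumption); extend it with the tool's own dependencies.
for name, error in check_imports(["datetime", "lxml"]).items():
    print(f"{name}: {'ok' if error is None else error}")
```

Running it with the interpreter of the virtual environment used by the webservice matters, because the same import can succeed under one Python version and fail under another.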