A few days ago (January 14th), I attempted to move the lexeme-forms Toolforge tool from Ubuntu Trusty to Debian Stretch (cf. documentation): I stopped the webservice from the Trusty bastion, then rebuilt the Python venv and started the webservice from the Stretch bastion. Since the tool uses the Kubernetes backend, I’m not sure whether this actually makes a difference, but since then the tool seems to be plagued by intermittent bursts of very slow response times.
To confirm that this is not just a vague feeling, I checked the timings in the access log with the following command:
sed -E -n 's/^.*\[\w+ (\w+ [0-9]+ [0-9]+:[0-9]+:[0-9]+ [0-9]+)\] [A-Z]+ .* generated [0-9]+ bytes in ([0-9]+) msecs.*$/\2\t\1/p' uwsgi.log
The full output is available at F27930388. A brief check with awk '$1 >= 5000' shows, for example, that about as many requests took five seconds or longer between January 14th and now as in the whole period from June 2018 (when the tool was first deployed) to January 13th. So it seems clear to me that this is a new problem, not one I just didn’t happen to notice earlier.
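For anyone who wants to reproduce the before/after comparison, here is a rough Python sketch of the same check. It assumes the input has the shape produced by the sed command (milliseconds, a tab, then a timestamp like "Jan 14 21:03:10 2019"). The function name count_slow, the cutoff year 2019 (inferred from the June 2018 deployment date), and the sample lines are illustrative assumptions of mine, not data from the real log.

```python
from datetime import datetime


def count_slow(lines, cutoff, threshold_ms=5000):
    """Count requests taking at least threshold_ms, split at cutoff.

    Each line is expected to look like "<msecs>\t<Mon DD HH:MM:SS YYYY>",
    i.e. the output format of the sed command.
    Returns a tuple (before_cutoff, on_or_after_cutoff).
    """
    before = after = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        msecs, _, stamp = line.partition("\t")
        if int(msecs) < threshold_ms:
            continue  # fast enough, same filter as awk '$1 >= 5000'
        # %b relies on English month names (the default C locale)
        when = datetime.strptime(stamp, "%b %d %H:%M:%S %Y")
        if when < cutoff:
            before += 1
        else:
            after += 1
    return before, after


# Made-up sample lines for illustration only (not from the real log)
sample = [
    "120\tJun 20 10:00:00 2018",
    "7300\tNov 02 08:15:42 2018",
    "5000\tJan 14 21:03:10 2019",
    "31000\tJan 15 09:30:00 2019",
]

# Assumed cutoff: start of the migration day, January 14th 2019
print(count_slow(sample, datetime(2019, 1, 14)))
```

On the real data, the lines would come from the saved sed output instead of the sample list, for example by passing an open file object as the first argument.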
Some of these requests are ones where the server has to do some work, including MediaWiki API requests; however, there are also some simple requests, such as the index page or static CSS/JS files (served by Flask’s default static endpoint, which uses WSGI X-Sendfile / Linux sendfile(2)), taking tens of seconds.