The QuickCategories tool was apparently stopped for unknown reasons yesterday:
  …
  *** Operational MODE: preforking ***
  mounting /data/project/quickcategories/www/python/src/app.py on /quickcategories
  WSGI app 0 (mountpoint='/quickcategories') ready in 33 seconds on interpreter 0x1c91300 pid: 1 (default app)
  *** uWSGI is running in multiple interpreter mode ***
  spawned uWSGI master process (pid: 1)
  spawned uWSGI worker 1 (pid: 8, cores: 1)
  spawned uWSGI worker 2 (pid: 9, cores: 1)
  spawned uWSGI worker 3 (pid: 10, cores: 1)
  spawned uWSGI worker 4 (pid: 11, cores: 1)
  SIGINT/SIGQUIT received...killing workers...
  worker 1 buried after 1 seconds
  worker 2 buried after 1 seconds
  worker 3 buried after 1 seconds
  worker 4 buried after 1 seconds
  goodbye to uWSGI.
(Yes, there were no requests served since the last restart. That’s not a bug, the tool just isn’t overly popular yet.) The log isn’t timestamped, but the last modification to the file was on 2019-04-13 19:31 UTC. @Fnielsen’s Ordia tool is also down, and while I can’t read its uwsgi.log, it has the same modification time.
kubectl get pods listed a “pending” pod 16 hours old (i.e. about as old as the last modification to uwsgi.log, if I’m not mistaken). Its description:
  Name:         quickcategories-654583560-xqip5
  Namespace:    quickcategories
  Node:         /
  Labels:       name=quickcategories
                pod-template-hash=654583560
                tools.wmflabs.org/webservice=true
                tools.wmflabs.org/webservice-version=1
  Status:       Pending
  IP:
  Controllers:  ReplicaSet/quickcategories-654583560
  Containers:
    webservice:
      Image:    docker-registry.tools.wmflabs.org/toollabs-python-web:latest
      Port:     8000/TCP
      Command:  /usr/bin/webservice-runner --type uwsgi-python --port 8000
      Limits:
        cpu:     2
        memory:  2Gi
      Requests:
        cpu:     125m
        memory:  256Mi
      Volume Mounts:
        /data/project/ from home (rw)
        /data/scratch/ from scratch (rw)
        /etc/ldap.conf from etcldap-conf-bzn58 (rw)
        /etc/ldap.yaml from etcldap-yaml-xaarl (rw)
        /etc/novaobserver.yaml from etcnovaobserver-yaml-syao6 (rw)
        /etc/wmcs-project from wmcs-project (rw)
        /mnt/nfs/ from nfs (rw)
        /public/dumps/ from dumps (rw)
        /var/run/nslcd/socket from varrunnslcdsocket-dhv68 (rw)
      Environment Variables:
        HOME:  /data/project/quickcategories/
  Conditions:
    Type          Status
    PodScheduled  False
  Volumes:
    dumps:
      Type:  HostPath (bare host directory volume)
      Path:  /public/dumps/
    home:
      Type:  HostPath (bare host directory volume)
      Path:  /data/project/
    wmcs-project:
      Type:  HostPath (bare host directory volume)
      Path:  /etc/wmcs-project
    nfs:
      Type:  HostPath (bare host directory volume)
      Path:  /mnt/nfs/
    scratch:
      Type:  HostPath (bare host directory volume)
      Path:  /data/scratch/
    etcldap-conf-bzn58:
      Type:  HostPath (bare host directory volume)
      Path:  /etc/ldap.conf
    etcldap-yaml-xaarl:
      Type:  HostPath (bare host directory volume)
      Path:  /etc/ldap.yaml
    etcnovaobserver-yaml-syao6:
      Type:  HostPath (bare host directory volume)
      Path:  /etc/novaobserver.yaml
    varrunnslcdsocket-dhv68:
      Type:  HostPath (bare host directory volume)
      Path:  /var/run/nslcd/socket
  QoS Class:    Burstable
  Tolerations:  <none>
  No events.
According to @Chicocvenancio on IRC, there should probably be some illuminating events at the bottom of that output, but I lack the permissions to see them. Running webservice restart had no effect; deleting the pod brought up another one in the same state (pending forever).
A custom Kubernetes deployment in the same tool, quickcategories.background-runner, appears to be functional as far as I can tell, though it doesn’t have a whole lot to do if the web frontend that starts background runs isn’t running.
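For anyone wanting to check other tools for the same symptom, here is a minimal sketch that picks out pods whose PodScheduled condition is False from the JSON that kubectl get pods -o json produces. It only assumes the standard pod list layout (items, metadata.name, status.conditions). The function name unscheduled_pods and the sample data are made up for illustration: the background runner pod name and the "Unschedulable" reason are hypothetical and not taken from the actual cluster.

```python
def unscheduled_pods(pod_list):
    """Return (name, reason, message) for every pod whose
    PodScheduled condition has status "False"."""
    stuck = []
    for pod in pod_list.get("items", []):
        name = pod.get("metadata", {}).get("name", "<unnamed>")
        for cond in pod.get("status", {}).get("conditions", []):
            if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
                # reason/message are optional; they are None if the
                # scheduler did not fill them in
                stuck.append((name, cond.get("reason"), cond.get("message")))
    return stuck


# Illustrative sample only. The first pod name is the one from this report;
# everything else (second pod name, "reason" value) is hypothetical.
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "quickcategories-654583560-xqip5"},
            "status": {
                "phase": "Pending",
                "conditions": [
                    {"type": "PodScheduled", "status": "False",
                     "reason": "Unschedulable"},
                ],
            },
        },
        {
            "metadata": {"name": "background-runner-example"},
            "status": {
                "phase": "Running",
                "conditions": [
                    {"type": "PodScheduled", "status": "True"},
                ],
            },
        },
    ]
}

if __name__ == "__main__":
    for name, reason, message in unscheduled_pods(SAMPLE):
        print(name, reason, message)
```

To run it against a real namespace, SAMPLE could be replaced with json.load(sys.stdin) and the script fed from kubectl get pods -o json. Whether the condition carries a useful message depends on what the scheduler recorded, so this may not reveal more than the missing events would.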