Following the rolling restart of the Cloud-VPS infrastructure on 2018-06-06, a large number of Kubernetes-powered webservices in Toolforge remained unavailable. Spot checking revealed that many pods (the unit of work for running a webservice Docker container on Kubernetes) were in the CrashLoopBackOff state, meaning that each pod had started, died, and been restarted several times. See initial list at:
Further spot checking found that a large number of these looping pods were failing because the /etc/wmcs-project file was not mounted into the Docker container. The webservice-runner command checks this file to determine which project it is running in (tools vs tools-beta). When the file is not found, the webservice-runner script dies, which in turn kills the Docker container.
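
The failure mode described above can be illustrated with a minimal Python sketch. This is not the actual webservice-runner code: the function names `read_project` and `project_or_exit` are hypothetical, and the assumption that the file holds just the project name as plain text is ours. It only shows why a missing /etc/wmcs-project mount is fatal to a process that insists on reading the file at startup.

```python
import sys
from pathlib import Path

# Path named in the incident report; whether the real script uses a
# constant like this is an assumption.
PROJECT_FILE = Path("/etc/wmcs-project")


def read_project(path=PROJECT_FILE):
    """Return the project name stored in `path`, stripped of whitespace.

    Assumes the file contains only the project name (e.g. "tools").
    Raises FileNotFoundError when the file is absent, which is the case
    when the mount into the container is missing.
    """
    return Path(path).read_text().strip()


def project_or_exit(path=PROJECT_FILE):
    """Hypothetical startup check: exit non-zero if the project is unknown."""
    try:
        return read_project(path)
    except OSError as exc:
        # A non-zero exit of the container's main process stops the
        # container; Kubernetes then restarts it, and repeated failures
        # put the pod into CrashLoopBackOff.
        sys.exit(f"cannot determine project: {exc}")
```

Under these assumptions, a container started without the mount exits immediately on every restart, which matches the restart loop observed in the affected pods.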