$ kubectl get pods
NAME                   READY   STATUS    RESTARTS       AGE
ftl-5646966575-rr2vc   1/1     Running   30 (19m ago)   6h20m
$ kubectl get events
LAST SEEN   TYPE      REASON      OBJECT                     MESSAGE
32m         Normal    Pulling     pod/ftl-5646966575-rr2vc   Pulling image "docker-registry.tools.wmflabs.org/toolforge-perl532-sssd-web:latest"
22m         Warning   Unhealthy   pod/ftl-5646966575-rr2vc   Liveness probe failed: dial tcp 192.168.127.48:8000: i/o timeout
[16:57] <JMarkOckerbloom> My application runs on tools-sgebastion-10, and today (and part of yesterday) it's apparently been hanging a lot. Recent error log record numerous restarts and kill with signal 9. Anyone know what might be happening there (or if there have been any recent changes there)?
[17:04] < bd808> JMarkOckerbloom: what sort of process are you trying to run on the bastion that is leading to troubles?
[17:05] < bd808> I ask because there are things like Wheel of Misfortune that actively look for rogue processes to kill there
[17:12] <JMarkOckerbloom> This is the ftl service (a CGI script implemented in perl). It's been running for a number of years, but today and yesterday it seems to be hanging and getting restarted a fair bit.
[17:18] < bd808> JMarkOckerbloom: if it is running on a bastion for a number of years it has been in violation of Toolforge rules for several years. Webservices, bot jobs, etc should always run on Kubernetes (and formerly grid engine). The bastions are for humans to start distributed tasks and do light file editing.
[17:21] < bd808> It looks like ftl is actually running its webservice on Kubernetes
[17:21] < bd808> but it is restarting a lot...
[17:22] < bd808> I wonder if there is something about the new default health checks that is causing problems there?
[17:24] < bd808> the output in $HOME/error.log doesn't mean much to me, but it also doesn't look like crash logs
[17:26] < bd808> JMarkOckerbloom: this is probably worth a Phabricator task. The `kubectl get pod` output shows 10 restarts in the last 138 minutes so something is up for sure.
[17:27] < bd808> I have to run to an IRL meeting, but I can poke around a bit later in my day to see if I can spot anything in particular.
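
The "Liveness probe failed: dial tcp 192.168.127.48:8000: i/o timeout" event means a TCP connection to the pod on port 8000 did not complete in time. A minimal sketch of a comparable check is below, in case it helps narrow down whether the web server stops accepting connections while the CGI script hangs. The function name `tcp_check` and the 1-second default timeout are assumptions made for illustration, not Toolforge's actual probe configuration, and the check only approximates what the kubelet does.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds.

    Hypothetical helper: the name and the default timeout are illustrative
    assumptions, not values taken from the Toolforge health check.
    """
    try:
        # create_connection raises OSError (including timeouts and
        # "connection refused") when the connection cannot be established.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `tcp_check("192.168.127.48", 8000)` would have to happen from somewhere that can reach the pod network (for example, from inside the pod itself); the address and port are the ones shown in the event message, and the pod IP may change after a restart.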