The webservice for the project 'dplbot' on Tool Labs has been failing repeatedly for several days. If you attempt to access any of the project's pages, you get the "No Webservice" page. But if I log in to the project, "qstat" shows the webservice is running, and the error logs don't indicate any problem. "webservice status" reports "Your webservice is running." (Not true.) "webservice restart" works, and the webpages then become available, but this typically only lasts a few minutes and then they go down again.
I moved it to Kubernetes and also fixed the issue with the webservice restarter. Can you verify that it works fine under Kubernetes? (No changes are required from your perspective, since it's still running lighttpd + PHP.)
It is down at the moment. "webservice status" says it is running, but "qstat" shows no server process running.
UPDATE: It did restart itself a few moments after I wrote the above. Still nothing visible under 'qstat'; is that to be expected?
This failed over and over yesterday and today. The failures are intermittent. As a frequent user of dplbot, I have to say that a solution has not been found yet.
UPDATE: An hour later, and it is still down.
It is currently down again. Shell shows the following:
tools.dplbot@tools-bastion-03:~$ kubectl get pod
NAME                      READY     STATUS    RESTARTS   AGE
dplbot-1445756605-f0mpw   1/1       Running   0          1d
tools.dplbot@tools-bastion-03:~$ webservice status
Your webservice is running
I will wait a short time before restarting it manually.
I'm afraid I know nothing about the bot. The nominal maintainer, JaGa, hasn't been active on en.wiki for a month. RussBlau, listed as a maintainer, opened and re-opened this ticket. Dispenser requested to be a maintainer a few months back, so may be one by now.
Should I keep posting notices when it fails, or is that just redundant? (Totally a newbie on the technical side)
ATM, http://tools.wmflabs.org/dplbot/ returns 503 ("No webservice"). There is a pod running:
tools.dplbot@tools-bastion-03:~$ kubectl get pod
NAME                      READY     STATUS    RESTARTS   AGE
dplbot-1445756605-gamjq   1/1       Running   0          16d
tools.dplbot@tools-bastion-03:~$
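Since the pod shows as Running while the public URL answers 503, probing the URL itself is a more reliable health signal than "webservice status". A minimal sketch of such a probe, using only the Python standard library; the function names classify_status and probe and the label strings are made up for illustration and are not part of any Tool Labs tooling:

```python
import urllib.error
import urllib.request


def classify_status(code):
    """Map an HTTP status code to a coarse health label (labels are illustrative)."""
    if 200 <= code < 400:
        return "up"
    if code == 503:
        # This thread reports 503 for the "No webservice" page.
        return "no-webservice"
    return "error"


def probe(url, timeout=10):
    """Fetch url and return (status_code, label).

    status_code is None when the request fails before any HTTP response arrives.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            code = resp.status
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx/5xx; the status code is still available.
        code = exc.code
    except (urllib.error.URLError, OSError):
        return None, "unreachable"
    return code, classify_status(code)
```

Calling probe("http://tools.wmflabs.org/dplbot/") from a cron job or a shell loop would record when the proxy stops routing to the pod, independently of what kubectl or "webservice status" report.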
On tools-proxy-01, Redis had no key for prefix:dplbot, but tools-proxy-02 had one:
scfc@tools-proxy-01:~$ redis-cli
127.0.0.1:6379> HGETALL prefix:dplbot
(empty list or set)
127.0.0.1:6379>
scfc@tools-proxy-01:~$

127.0.0.1:6379> HGETALL prefix:dplbot
1) ".*"
2) "http://192.168.0.33:8000"
127.0.0.1:6379>
scfc@tools-proxy-02:~$
lynx http://192.168.0.33:8000 shows the dplbot page.
I've restarted the webservice yet again, and now the entry on tools-proxy-01 points to http://192.168.0.50:8000, with the entry on tools-proxy-02 still unchanged. So the Redis replication from tools-proxy-01 to tools-proxy-02 is broken, and I have filed T152356 for that.
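The manual comparison of the two proxies can be sketched as a small diff over the routing hashes. This is only an illustration: diff_routes is a hypothetical helper, and the dictionaries are typed in by hand from the values reported in this thread. With the redis-py client, something like redis.Redis(host=..., decode_responses=True).hgetall("prefix:dplbot") should return a dict of this shape, but that wiring is an assumption and is left out so the example runs without any dependency.

```python
def diff_routes(primary, replica):
    """Return {pattern: (primary_target, replica_target)} for entries that differ.

    An entry that is missing on one side is reported as None.
    """
    mismatches = {}
    for pattern in sorted(set(primary) | set(replica)):
        a = primary.get(pattern)
        b = replica.get(pattern)
        if a != b:
            mismatches[pattern] = (a, b)
    return mismatches


# State reported in this thread after the last restart (entered by hand).
proxy_01 = {".*": "http://192.168.0.50:8000"}
proxy_02 = {".*": "http://192.168.0.33:8000"}

print(diff_routes(proxy_01, proxy_02))
# prints {'.*': ('http://192.168.0.50:8000', 'http://192.168.0.33:8000')}
```

An empty result would mean both proxies agree on where dplbot lives; any non-empty result is a sign that replication between them is stale.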
However, I'm closing this task for the time being so that we have a baseline recording when the webservice worked and with which parameters. If it fails again, please reopen this task so that it can be investigated.