
Web services continually restarting
Closed, Invalid (Public)

Description

Since August 17, web services have repeatedly gone down and come back up. Bigbrother does restart them. The few times I was logged into Labs when a service went down, I ran 'qstat -f': load averages on the web-grid nodes were at or above 20.
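Checking for overloaded nodes by eye is tedious during an outage. The sketch below filters 'qstat -f' output for queue instances whose load_avg is at or above the threshold of 20 mentioned above. It assumes the usual SGE column layout (queuename, qtype, resv/used/tot., load_avg, arch, states); the queue and host names in the sample are made up, and 'overloaded_queues' is a hypothetical helper, not part of any grid tooling.

```python
"""Flag heavily loaded queue instances in `qstat -f` output (sketch)."""

LOAD_THRESHOLD = 20.0  # the load level reported in this task


def overloaded_queues(qstat_text, threshold=LOAD_THRESHOLD):
    """Return (queue instance, load_avg) pairs with load_avg >= threshold.

    Assumes the usual SGE column layout:
    queuename  qtype  resv/used/tot.  load_avg  arch  [states]
    """
    flagged = []
    for line in qstat_text.splitlines():
        fields = line.split()
        # Queue-instance rows start with "queue@host"; skip headers/separators.
        if len(fields) < 5 or "@" not in fields[0]:
            continue
        try:
            load = float(fields[3])
        except ValueError:  # e.g. "-NA-" when the host does not report a load
            continue
        if load >= threshold:
            flagged.append((fields[0], load))
    return flagged


# Illustrative output; queue and host names are made up.
SAMPLE = """\
queuename                  qtype resv/used/tot. load_avg arch       states
---------------------------------------------------------------------------
webgrid@webgrid-host-01    BIP   0/12/256       21.37    lx26-amd64
---------------------------------------------------------------------------
webgrid@webgrid-host-02    BIP   0/4/256        3.02     lx26-amd64
---------------------------------------------------------------------------
webgrid@webgrid-host-03    BIP   0/0/256        -NA-     lx26-amd64 au
"""

if __name__ == "__main__":
    for queue, load in overloaded_queues(SAMPLE):
        print(f"{queue}: load_avg {load}")
```

Running it periodically (for example from cron) and keeping the output would give a load history to line up against restart times, rather than relying on catching the problem while logged in.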


Version: unspecified
Severity: normal

Details

Reference
bz69934

Event Timeline

bzimport raised the priority of this task from to High. Nov 22 2014, 3:37 AM
bzimport added a project: Toolforge.
bzimport set Reference to bz69934.

metatron wrote:

I can confirm this: multiple restarts for no clear reason. In both examples, a running service was terminated and then restarted automatically.

Bigbrother mails:
2014-08-21 05:31:16 info: Restarting job 'lighttpd-xtools'
2014-08-21 05:33:05 warn: job 'lighttpd-xtools' failed to start
2014-08-21 05:33:05 info: Restarting job 'lighttpd-xtools'

2014-08-21 20:20:31 info: Restarting job 'lighttpd-xtools'
2014-08-21 20:22:30 warn: job 'lighttpd-xtools' failed to start
2014-08-21 20:22:31 info: Restarting job 'lighttpd-xtools'

qacct reports:
jobname lighttpd-xtools
jobnumber 3328857
taskid undefined
account sge
priority 0
qsub_time Thu Aug 21 05:33:06 2014
start_time Thu Aug 21 05:33:18 2014
end_time Thu Aug 21 20:20:27 2014
granted_pe NONE
slots 1
failed 0
exit_status 0

jobname lighttpd-xtools
jobnumber 3345286
taskid undefined
account sge
priority 0
qsub_time Thu Aug 21 20:22:32 2014
start_time Thu Aug 21 20:22:33 2014
end_time Fri Aug 22 11:37:33 2014
granted_pe NONE
slots 1
failed 0
exit_status 0
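One way to read the two reports together is to measure the gap between each job's accounting end_time and the next "Restarting job" line in the Bigbrother mails. The sketch below does that using only the timestamps quoted above. It assumes qacct records are separated by blank lines or "=====" rules and that each field line is "key value"; the helper names are hypothetical.

```python
from datetime import datetime

QACCT_FMT = "%a %b %d %H:%M:%S %Y"    # e.g. "Thu Aug 21 20:20:27 2014"
BIGBROTHER_FMT = "%Y-%m-%d %H:%M:%S"  # e.g. "2014-08-21 20:20:31"


def parse_qacct(text):
    """Split qacct-style output into one dict per job record.

    Records are separated by blank lines or by qacct's "=====" rules;
    every other line is "key value...".
    """
    records, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("="):
            if current:
                records.append(current)
                current = {}
            continue
        key, _, value = line.partition(" ")
        current[key] = value.strip()
    if current:
        records.append(current)
    return records


def restart_gaps(records, restart_stamps):
    """Seconds from each job's end_time to the first restart at or after it."""
    restarts = sorted(datetime.strptime(s, BIGBROTHER_FMT) for s in restart_stamps)
    gaps = {}
    for rec in records:
        end = datetime.strptime(rec["end_time"], QACCT_FMT)
        later = [t for t in restarts if t >= end]
        if later:
            gaps[rec["jobnumber"]] = (later[0] - end).total_seconds()
    return gaps


# Values copied from the qacct reports and Bigbrother mails in this task.
QACCT = """\
jobname lighttpd-xtools
jobnumber 3328857
end_time Thu Aug 21 20:20:27 2014
exit_status 0

jobname lighttpd-xtools
jobnumber 3345286
end_time Fri Aug 22 11:37:33 2014
exit_status 0
"""
RESTARTS = ["2014-08-21 05:31:16", "2014-08-21 05:33:05",
            "2014-08-21 20:20:31", "2014-08-21 20:22:31"]

if __name__ == "__main__":
    print(restart_gaps(parse_qacct(QACCT), RESTARTS))  # → {'3328857': 4.0}
```

For job 3328857 the restart follows the recorded end_time by 4 seconds, while qacct still reports failed 0 and exit_status 0. Job 3345286 has no matching restart only because the quoted mails stop before its end_time.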

A quick look through the logs shows that this happens only to a short list (~12) of web services, and in bursts.
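Both observations (which jobs are affected, and whether restarts cluster in bursts) can be checked mechanically from the Bigbrother mails. The sketch below counts "Restarting job" lines per job and groups them into bursts, starting a new burst whenever more than 5 minutes pass between restarts. The 5-minute gap is an arbitrary assumption, the regular expression assumes lines look exactly like the ones quoted above, and the sample uses only those quoted xtools lines.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Assumes mail lines look exactly like the ones quoted in this task.
RESTART_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) info: Restarting job '([^']+)'")


def restart_events(lines):
    """Return time-sorted (timestamp, job name) pairs for 'Restarting job' lines."""
    events = []
    for line in lines:
        m = RESTART_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((ts, m.group(2)))
    return sorted(events)


def group_bursts(events, gap=timedelta(minutes=5)):
    """Start a new burst whenever the gap since the previous event exceeds `gap`."""
    bursts = []
    for ts, job in events:
        if bursts and ts - bursts[-1][-1][0] <= gap:
            bursts[-1].append((ts, job))
        else:
            bursts.append([(ts, job)])
    return bursts


MAIL = """\
2014-08-21 05:31:16 info: Restarting job 'lighttpd-xtools'
2014-08-21 05:33:05 warn: job 'lighttpd-xtools' failed to start
2014-08-21 05:33:05 info: Restarting job 'lighttpd-xtools'
2014-08-21 20:20:31 info: Restarting job 'lighttpd-xtools'
2014-08-21 20:22:30 warn: job 'lighttpd-xtools' failed to start
2014-08-21 20:22:31 info: Restarting job 'lighttpd-xtools'
""".splitlines()

if __name__ == "__main__":
    events = restart_events(MAIL)
    print(Counter(job for _, job in events))  # restarts per job
    for burst in group_bursts(events):
        print(burst[0][0], "->", burst[-1][0], f"({len(burst)} restarts)")
```

Fed the mails for all tools, the per-job counter would show which web services are affected, and bursts containing several different job names would support the idea that a shared, grid-wide condition triggers the restarts.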

My current working hypothesis is that this is caused by leaking fcgi processes combined with memory pressure (that is, the problem is always present but only leads to web services being restarted when resource use is especially high).

Could the maintainers of the affected tools please check their logs to see whether those restarts coincide with periods of unusual activity?
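For maintainers answering this question, a minimal sketch is to count access-log requests in a fixed window before each restart and compare the result with quieter periods. It assumes a common/combined-format access log with timestamps like [21/Aug/2014:20:18:02 +0000], ignores the UTC offset (so log and Bigbrother times are assumed to share a time zone), and uses a 10-minute window chosen arbitrarily. The log lines are hypothetical; the restart time is one of those in the Bigbrother mails.

```python
import re
from datetime import datetime, timedelta

# Bracketed timestamp of a common/combined-format access-log line, e.g.
# [21/Aug/2014:20:18:02 +0000]. The log format is an assumption.
TS_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\]")
LOG_FMT = "%d/%b/%Y:%H:%M:%S"


def requests_before(log_lines, restart, window=timedelta(minutes=10)):
    """Count requests logged in [restart - window, restart).

    The UTC offset is ignored, so log and Bigbrother times are assumed
    to be in the same time zone.
    """
    start = restart - window
    count = 0
    for line in log_lines:
        m = TS_RE.search(line)
        if not m:
            continue  # skip lines without a recognizable timestamp
        ts = datetime.strptime(m.group(1), LOG_FMT)
        if start <= ts < restart:
            count += 1
    return count


# Hypothetical log lines; the restart time is from the Bigbrother mails above.
SAMPLE_LOG = [
    '10.0.0.1 - - [21/Aug/2014:20:05:00 +0000] "GET / HTTP/1.1" 200 512',
    '10.0.0.2 - - [21/Aug/2014:20:12:10 +0000] "GET / HTTP/1.1" 200 512',
    '10.0.0.3 - - [21/Aug/2014:20:19:59 +0000] "GET / HTTP/1.1" 500 0',
    'malformed line without a timestamp',
]
RESTART = datetime(2014, 8, 21, 20, 20, 31)

if __name__ == "__main__":
    print(requests_before(SAMPLE_LOG, RESTART))  # → 2
```

A count well above the tool's usual rate for the same window length would point to unusual activity before that restart; a normal count would fit the hypothesis that the pressure comes from elsewhere on the grid.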

yuvipanda subscribed.

No update in about 4 months, so closing as invalid for now?