>>! In T182070#4002783, @chasemp wrote:
>>>! In T182070#4002514, @hashar wrote:
>> It seems the faulty webgrid jobs have piled up. If one could kill the stuck `/usr/bin/php-cgi` processes owned by `tools.jembot`, that would be nice :]
>
> confirmed, I see tons of leakage via `clush -w @all 'sudo pidstat -U tools.jembot' | grep jem`
>
> culled things with
>
>> clush -w @all 'sudo /usr/bin/pkill --signal 9 -u tools.jembot'

A recent snapshot of activity shows 1146 restarts of the webservice in the last 7 calendar days:
{F14439402}
The error.log data seems to indicate that the main lighttpd process is being killed by root (UID = 0) about every 10 minutes for exceeding its memory limit:
```
2018-03-05 21:30:23: (log.c.166) server started
2018-03-05 21:40:21: (server.c.1558) server stopped by UID = 0 PID = 25145
2018-03-05 21:40:24: (log.c.166) server started
2018-03-05 21:50:20: (server.c.1558) server stopped by UID = 0 PID = 30322
2018-03-05 21:50:23: (log.c.166) server started
2018-03-05 22:00:21: (server.c.1558) server stopped by UID = 0 PID = 7331
2018-03-05 22:00:23: (log.c.166) server started
2018-03-05 22:10:20: (server.c.1558) server stopped by UID = 0 PID = 19091
2018-03-05 22:10:22: (log.c.166) server started
2018-03-05 22:20:20: (server.c.1558) server stopped by UID = 0 PID = 7240
2018-03-05 22:20:22: (log.c.166) server started
2018-03-05 22:30:21: (server.c.1558) server stopped by UID = 0 PID = 10593
2018-03-05 22:30:23: (log.c.166) server started
```
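The cadence of the stops can be checked with a small script. This is only a sketch and not part of any existing tooling: `stop_intervals` is a hypothetical helper, and it assumes every log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp as in the excerpt above.

```python
from datetime import datetime

# Marker text taken from the lighttpd error.log lines quoted above.
STOP_MARKER = "server stopped by"


def stop_intervals(lines):
    """Return the seconds elapsed between consecutive 'server stopped' entries."""
    stops = []
    for line in lines:
        if STOP_MARKER not in line:
            continue
        # Assumption: the first 19 characters are the timestamp.
        stops.append(datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S"))
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stops, stops[1:])]


# Lines copied from the excerpt above.
sample = [
    "2018-03-05 21:30:23: (log.c.166) server started",
    "2018-03-05 21:40:21: (server.c.1558) server stopped by UID = 0 PID = 25145",
    "2018-03-05 21:40:24: (log.c.166) server started",
    "2018-03-05 21:50:20: (server.c.1558) server stopped by UID = 0 PID = 30322",
    "2018-03-05 21:50:23: (log.c.166) server started",
    "2018-03-05 22:00:21: (server.c.1558) server stopped by UID = 0 PID = 7331",
    "2018-03-05 22:00:23: (log.c.166) server started",
    "2018-03-05 22:10:20: (server.c.1558) server stopped by UID = 0 PID = 19091",
    "2018-03-05 22:10:22: (log.c.166) server started",
    "2018-03-05 22:20:20: (server.c.1558) server stopped by UID = 0 PID = 7240",
    "2018-03-05 22:20:22: (log.c.166) server started",
    "2018-03-05 22:30:21: (server.c.1558) server stopped by UID = 0 PID = 10593",
]

if __name__ == "__main__":
    print(stop_intervals(sample))
```

For the lines above, every gap comes out within a second or two of 600 seconds, which points at a periodic job doing the killing rather than at load-dependent crashes.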
See also:
* {T132879}
* {T182070}
* {T179378}
* {T109362}