Every now and then, CropTool locks up and has to be restarted manually. If I'm late restarting it, reports show up on [[ https://commons.wikimedia.org/wiki/Commons_talk:CropTool | Commons ]] or [[ https://github.com/danmichaelo/croptool/issues?utf8=%E2%9C%93&q=label%3Aserver-lock-issue | GitHub]]. (I haven't yet found a way to restart the server automatically, since all cronjobs are submitted to the grid and I wasn't able to restart the webservice from the grid.)
For some time I believed it was all due to T104799, but then I started doubting that. Then T182070#4305541 brought the issue to my attention again.
Today, before restarting the webservice, I asked in #wikimedia-cloud for help inspecting the processes. Here are the findings of @zhuyifei1999 and @bd808:
Lots of open connections:
```
# lsof -p 2592 | grep TCP | wc -l
187
```
Almost no CPU usage (all processes are sleeping, `S` in the STAT column):
```
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
tools.c+  2592  0.1  0.0  54208  5444 ?        Ss   Jun28   1:39 /usr/sbin/lighttpd -f /var/run/lighttpd/croptool -D
tools.c+  2598  0.0  0.2 340556 20764 ?        Ss   Jun28   0:00  \_ /usr/bin/php-cgi
tools.c+  2600  0.2  0.6 647472 51796 ?        Sl   Jun28   2:04  |   \_ /usr/bin/php-cgi
tools.c+  2601  0.2  0.7 656452 60020 ?        Sl   Jun28   2:19  |   \_ /usr/bin/php-cgi
tools.c+  2599  0.0  0.2 340556 20760 ?        Ss   Jun28   0:00  \_ /usr/bin/php-cgi
tools.c+  2602  0.4  0.7 655884 62368 ?        Sl   Jun28   4:32      \_ /usr/bin/php-cgi
tools.c+  2603  0.5  0.8 662800 67344 ?        Sl   Jun28   5:35      \_ /usr/bin/php-cgi
```
Stack trace indicating that PHP is blocked in malloc: https://www.irccloud.com/pastebin/Kf3rlR6T/
[17:20:02] <+bd808> the last stack trace I see as a paste from you looks like -- php ran out of memory while trying to create a backtrace for an exception and then tried to start handling that OOM error when it hit the deadlock.
[17:20:41] <+bd808> my guess is that xdebug's tracing is holding a non-reentrant lock
[17:26:27] <zhuyifei1999_> bd808: makes sense. libc itself is holding the lock
[17:28:06] <zhuyifei1999_> so malloc ran out of memory, grid sends php a sigint, php's signal handler gets called and tries to malloc again, non-reentrant
[17:28:44] <zhuyifei1999_> so it just deadlocks on itself
[17:28:54] <+bd808> zhuyifei1999_: yeah, I think we could search the web a bit and find that this is a known problem in php 5.x error handling