
integration-slave-docker-1001 deadlocked since Friday, October 6th, ~19:00 UTC
Closed, Resolved, Public

Description

integration-slave-docker-1001 has not responded since October 6th, at roughly 19:00 UTC.

Screenshot: docker1001_deadlock.png

From the console:

Debian GNU/Linux 8 integration-slave-docker-1001 ttyS0

integration-slave-docker-1001 login:
[639815.194483] INFO: task systemd:1 blocked for more than 120 seconds.
[639815.200819]       Not tainted 4.9.0-0.bpo.3-amd64 #1 Debian 4.9.30-2+deb9u2~bpo8+1
[639815.204213] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[639815.205755] INFO: task kswapd0:36 blocked for more than 120 seconds.
[639815.206510]       Not tainted 4.9.0-0.bpo.3-amd64 #1 Debian 4.9.30-2+deb9u2~bpo8+1
[639815.207419] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[639815.208562] INFO: task loop0:765 blocked for more than 120 seconds.
[639815.209272]       Not tainted 4.9.0-0.bpo.3-amd64 #1 Debian 4.9.30-2+deb9u2~bpo8+1
[639815.210694] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[639815.212042] INFO: task kworker/u4:0:17569 blocked for more than 120 seconds.
[639815.212828]       Not tainted 4.9.0-0.bpo.3-amd64 #1 Debian 4.9.30-2+deb9u2~bpo8+1
[639815.215573] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[639815.216390] INFO: task kworker/u4:3:19433 blocked for more than 120 seconds.
[639815.217028]       Not tainted 4.9.0-0.bpo.3-amd64 #1 Debian 4.9.30-2+deb9u2~bpo8+1
[639815.218002] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[639815.219111] INFO: task xfsaild/dm-2:21620 blocked for more than 120 seconds.
[639815.219853]       Not tainted 4.9.0-0.bpo.3-amd64 #1 Debian 4.9.30-2+deb9u2~bpo8+1
[639815.220594] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[639815.221804] INFO: task kworker/0:2:21621 blocked for more than 120 seconds.
[639815.222454]       Not tainted 4.9.0-0.bpo.3-amd64 #1 Debian 4.9.30-2+deb9u2~bpo8+1
[639815.223256] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[639815.224283] INFO: task ruby2.1:22448 blocked for more than 120 seconds.
[639815.224958]       Not tainted 4.9.0-0.bpo.3-amd64 #1 Debian 4.9.30-2+deb9u2~bpo8+1
[639815.227713] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[639936.027597] INFO: task systemd:1 blocked for more than 120 seconds.
[639936.034998]       Not tainted 4.9.0-0.bpo.3-amd64 #1 Debian 4.9.30-2+deb9u2~bpo8+1
[639936.036428] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[639936.038213] INFO: task kswapd0:36 blocked for more than 120 seconds.
[639936.038978]       Not tainted 4.9.0-0.bpo.3-amd64 #1 Debian 4.9.30-2+deb9u2~bpo8+1
[639936.040396] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
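The console output above is the kernel's hung-task detector reporting tasks stuck for more than 120 seconds, with each report repeated on every check. The following is a minimal sketch, not something used in this task, for summarizing which tasks were reported and how often from a saved copy of such a log. It assumes the exact "INFO: task NAME:PID blocked for more than N seconds." wording shown above. The function name `blocked_tasks` and the `SAMPLE` excerpt are illustrative only.

```python
import re
from collections import Counter

# Matches hung-task detector lines like the ones in the console output, e.g.
# "[639815.194483] INFO: task systemd:1 blocked for more than 120 seconds."
# The task name is matched greedily so names containing ':' or '/'
# (for example "kworker/u4:0") keep everything up to the final ":PID".
HUNG_TASK_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def blocked_tasks(console_text):
    """Count hung-task reports per (task name, pid) in a console log."""
    counts = Counter()
    for line in console_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            counts[(match.group("name"), int(match.group("pid")))] += 1
    return counts


# Short excerpt of the console output above (hypothetical test input).
SAMPLE = """\
[639815.194483] INFO: task systemd:1 blocked for more than 120 seconds.
[639815.200819]       Not tainted 4.9.0-0.bpo.3-amd64 #1 Debian 4.9.30-2+deb9u2~bpo8+1
[639815.212042] INFO: task kworker/u4:0:17569 blocked for more than 120 seconds.
[639936.027597] INFO: task systemd:1 blocked for more than 120 seconds.
"""

if __name__ == "__main__":
    # Print the most frequently reported tasks first.
    for (name, pid), n in blocked_tasks(SAMPLE).most_common():
        print(f"{name} (pid {pid}): reported {n} time(s)")
```

Tasks that keep reappearing across successive checks (here systemd:1 and kswapd0:36) are the ones that never became unblocked between detector runs.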

Event Timeline

Mentioned in SAL (#wikimedia-releng) [2017-10-09T08:53:23Z] <hashar> hard restart integration-slave-docker-1001 via horizon. It is deadlocked somehow. - T177749

Mentioned in SAL (#wikimedia-operations) [2017-10-09T09:43:46Z] <hashar> restarting Jenkins. Deadlock in SSHSlave plugin that causes memory to leak quite rapidly - T177749

hashar claimed this task.

Hard-rebooted the instance and force-ran Puppet. The Jenkins channel used to communicate with the agent is deadlocked in SSHSlave.afterDisconnect(), so I have restarted Jenkins.