Jenkins: Builds (for beta cluster and browser tests) are stuck forever if IRC notification failed
Closed, Resolved · Public


Related upstream:


  • Connection timeout for IRC
  • Regular build timeout setting does not apply to post-build actions like artefacts and notifications.
  • Cancelling the build doesn't work. It remains stuck.
  • Disconnecting the slave and taking the slave offline has no effect.
  • There seems to be no way to recover besides killing the whole Jenkins server and restarting.

The wmf-insecte Jenkins IRC bot no longer shows up in channels.

We have logs at

Jenkins full thread dump is at P584 and from there:

Two jobs are blocked:

"Executor #2 for integration-slave-trusty-1016 : executing #234" prio=5 BLOCKED
"Executor #1 for integration-slave-trusty-1012 : executing #494" prio=5 BLOCKED


A configuration submit is blocked as well:

"Handling POST /ci/configSubmit from X.X.X.X : RequestHandlerThread[#1683]" daemon prio=5 WAITING
	sun.misc.Unsafe.park(Native Method)

Some other related threads:

"JenkinsIsBusyListener-thread" daemon prio=5 BLOCKED
"IM-Reconnector-Thread" daemon prio=5 BLOCKED

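For reference, the same kind of information as in the thread dump (which threads are BLOCKED, on which lock, and who owns it) can be gathered programmatically from inside a JVM. The following is a minimal sketch using the standard `java.lang.management` API; the class name `BlockedThreads` and the method `describeBlockedThreads` are made up for illustration and are not part of Jenkins or the IRC plugin.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jenkins or the ircbot plugin.
public class BlockedThreads {

    /** Returns one line per BLOCKED thread, naming the contended lock and its owner. */
    static List<String> describeBlockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        // false, false: do not collect the lists of locked monitors/synchronizers;
        // the lock a thread is waiting on and its owner are still reported.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                report.add("\"" + info.getThreadName() + "\" BLOCKED on "
                        + info.getLockName() + " owned by \""
                        + info.getLockOwnerName() + "\"");
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Prints nothing when no thread is currently blocked on a monitor.
        for (String line : describeBlockedThreads()) {
            System.out.println(line);
        }
    }
}
```

The lock owner name is the useful part here: it points at the single thread that has to make progress before the blocked ones can continue.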
Event Timeline

Krinkle created this task. Apr 15 2015, 7:37 PM
Krinkle raised the priority of this task from to Needs Triage.
Krinkle updated the task description.
Krinkle moved this task to Backlog on the Jenkins board.
Krinkle added a subscriber: Krinkle.
Restricted Application added a subscriber: Aklapper. Apr 15 2015, 7:37 PM
hashar added a subscriber: hashar. Apr 15 2015, 7:44 PM
tcp6     160      0     CLOSE_WAIT  1519/java

That is the Jenkins IRC connection to on port 7000 and there is no other. So the IRC plugin is no longer connected, and indeed wmf-insecte is not showing in #wikimedia-releng.

I terminated the CLOSE_WAIT connection by injecting an ACK packet pretending to come from the freenode server:

packit -i eth0  -m inject -s -S 7000 -d -D 39429 -F A

That got rid of the leftover connection, but did not trigger anything on the Java side, so Jenkins is still stuck. I can't see any other way besides kill -9.

I stopped Jenkins, then ran kill -9 on it and started it again.

The jobs and config saves were blocked attempting to notify IRC because of a lock in the plugin. The lock was apparently held by a "jenkinsisbusylistener" thread, which had something like an irc.connection.shutdown method in its stack trace.

So I guess the plugin does not time out when disconnecting from the remote server, and if something goes wrong during the termination, it ends up stuck with the lock held :(
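To illustrate the general idea of a disconnect that cannot block its caller forever, here is a minimal Java sketch. It is not the plugin's code: the class `BoundedClose`, the method `closeWithTimeout` and the timeout values are all made up for illustration. The close action runs on a helper thread and the caller waits only a bounded time for it, so a caller holding a lock would release it even if the remote end never answers.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not taken from the ircbot plugin or PircBotX.
public class BoundedClose {

    /**
     * Runs closeAction on a helper thread and waits at most timeoutMillis.
     * Returns true only if the action completed normally within the timeout.
     */
    static boolean closeWithTimeout(Runnable closeAction, long timeoutMillis)
            throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-close");
            t.setDaemon(true); // a hung close must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(closeAction);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the helper; the caller moves on either way
            return false;
        } catch (ExecutionException e) {
            return false; // the close action itself threw an exception
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        boolean quick = closeWithTimeout(() -> { }, 1000);
        boolean hung = closeWithTimeout(() -> {
            // Stands in for a disconnect that waits on an unresponsive server.
            try { Thread.sleep(60_000); } catch (InterruptedException ignored) { }
        }, 200);
        System.out.println("quick close finished: " + quick);
        System.out.println("hung close finished: " + hung);
    }
}
```

Note that interrupting the helper does not necessarily unblock a thread stuck in blocking socket I/O; the point of the sketch is only that the calling thread, and any lock it holds, is freed after the timeout.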

After restarting, the JenkinsIsBusyListener-thread looks like:

sun.misc.Unsafe.park(Native Method)
Krinkle triaged this task as Medium priority. Apr 17 2015, 12:18 PM
Krinkle set Security to None.
hashar updated the task description. Edited Apr 30 2015, 12:08 PM

Jenkins deadlocked again. I took a thread dump, available at P584, and updated this task's description with the stack traces of the blocked threads.

In jenkins.log

Apr 30, 2015 11:21:37 AM$ConnectorRunnable run
INFO: Trying to reconnect

When I restarted Jenkins we got:

Apr 30, 2015 12:11:18 PM$ConnectorRunnable run
INFO: Trying to reconnect
Apr 30, 2015 12:11:19 PM hudson.plugins.ircbot.v2.IRCConnection connect
INFO: Connecting to as wmf-insecte using charset UTF-8
Apr 30, 2015 12:11:29 PM hudson.plugins.ircbot.v2.IRCConnection connect
INFO: connected to IRC

So the plugin is stuck trying to acquire a new connection because the old one was not properly terminated :(

hashar added a comment. Edited Apr 30 2015, 1:43 PM

I have downgraded the IRC plugin from 2.26 to 2.25. The upgrade might have caused the issue. Changelog from

Version 2.26 (2015-02-19)

  • don't make concurrent builds wait for the previous build (with instant-messaging-plugin 1.33) issue #26892
  • make delay between messages configurable via system property "hudson.plugins.ircbot.messageRate"
  • try to connect to NickServ protected up to 2 minutes in case NickServ is reacting very slowly

Version 2.25 (Apr 2, 2014)

git log

A commit made to IRCConnection.close() changed:

- this.pircConnection.disconnect();
+ this.pircConnection.shutdown(true);

The method shows up in the "Handling POST /ci/configSubmit" thread. The change was made to fix a leak, JENKINS-25349.

hashar updated the task description. Jun 9 2015, 8:12 PM
hashar moved this task from Backlog to Reported Upstream on the Upstream board.
hashar moved this task from Backlog to Reported upstream on the Jenkins board.
Krinkle removed a subscriber: Krinkle. May 19 2016, 5:19 PM
hashar added a comment. Jun 7 2016, 9:01 AM

From discussion on I am going to upgrade our ircbot plugin from 2.25 to 2.27. Upstream has bumped PircBotX to 2.0.1, which might fix it.

Mentioned in SAL [2016-06-07T09:02:56Z] <hashar> Upgrading Jenkins IRC plugin 2.25..2.27 and instant messaging plugin 1.34..1.35 . The former should fix a deadlock on shutdowning Jenkins | T96183

hashar moved this task from Reported Upstream to Patch merged upstream on the Upstream board.
hashar closed this task as Resolved. May 2 2017, 8:32 PM
hashar claimed this task.

Might have been solved via IRC plugin 2.27.