
Jenkins Gearman plugin has deadlock on executor threads (was: Beta Cluster stopped receiving code updates (beta-update-databases-eqiad hung))
Open, Normal, Public

Description

Workaround

To remove the deadlock it is recommended to disconnect Jenkins from the Gearman server and reconnect it. This is done on the https://integration.wikimedia.org/ci/manage page:

Uncheck the box, scroll to the bottom, and save. That removes the deadlock instantly. After a few seconds, check the box again and save.

If it still fails, restart Jenkins entirely :(

Upstream bug is https://issues.jenkins-ci.org/browse/JENKINS-25867


From James' email to the QA list:

Beta Labs isn't synchronising; AFAICS it hasn't done so since ~11 hours
ago (15:10 UTC on 2014-09-08). I noticed this when prepping a patch for
tomorrow.

Going to https://integration.wikimedia.org/ci/view/Beta/ I found that
"beta-update-databases-eqiad" had been executing for 12 hours, and
initially assumed that we had a runaway update.php issue again. However,
on examination it looks like "deployment-bastion.eqiad", or the Jenkins
executor on it, isn't responding in some way:

pending—Waiting for next available executor on deployment-bastion.eqiad

I terminated the beta-update-databases-eqiad run to see if that would help,
but it just switched over to beta-scap-eqiad being the pending task.

Having chatted with MaxSem, I briefly disabled the deployment-bastion.eqiad
node in the Jenkins interface and then re-enabled it, to no effect.

Any ideas?

November 2014 thread dump:

https://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMTkvMDUvMy8tLWplbmtpbnMtdGhyZWFkcy1kdW1wLnR4dC0tMTAtMjctMw==

July 2019 one:

Another thread dump: P8736
https://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMTkvMDcvMTEvLS10aHJlYWRkdW1wLnR4dC0tOS0zMC0zMg==

Details

Reference
bz70597

Event Timeline


Most executors are held in a lock as mentioned in comment #14. I noticed one thread with a different stack trace:

"Gearman worker deployment-bastion.eqiad_exec-1" daemon prio=5 WAITING

java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:502)
hudson.remoting.AsyncFutureImpl.get(AsyncFutureImpl.java:73)
hudson.plugins.gearman.StartJobWorker.safeExecuteFunction(StartJobWorker.java:196)
hudson.plugins.gearman.StartJobWorker.executeFunction(StartJobWorker.java:114)
org.gearman.worker.AbstractGearmanFunction.call(AbstractGearmanFunction.java:125)
org.gearman.worker.AbstractGearmanFunction.call(AbstractGearmanFunction.java:22)
hudson.plugins.gearman.MyGearmanWorkerImpl.submitFunction(MyGearmanWorkerImpl.java:590)
hudson.plugins.gearman.MyGearmanWorkerImpl.work(MyGearmanWorkerImpl.java:374)
hudson.plugins.gearman.AbstractWorkerThread.run(AbstractWorkerThread.java:166)
java.lang.Thread.run(Thread.java:701)
Bug 73659 has been marked as a duplicate of this bug.
hashar set Security to None.

So this is still happening and I really need to file it upstream.

The executors are locked by:

Gearman worker integration-slave1006_exec-3
java.lang.Object.wait(Native Method)
hudson.plugins.gearman.NodeAvailabilityMonitor.lock(NodeAvailabilityMonitor.java:83)
hudson.plugins.gearman.MyGearmanWorkerImpl.sendGrabJob(MyGearmanWorkerImpl.java:388)
hudson.plugins.gearman.MyGearmanWorkerImpl.work(MyGearmanWorkerImpl.java:310)
hudson.plugins.gearman.AbstractWorkerThread.run(AbstractWorkerThread.java:166)
java.lang.Thread.run(Thread.java:745)

The Jenkins logger for hudson.plugins.gearman.logger shows a spam of:

Nov 26, 2014 10:24:21 PM FINE hudson.plugins.gearman.NodeAvailabilityMonitor canTake
AvailabilityMonitor canTake request for >SOME VALUE<

Where >SOME VALUE< is either null or one of the executor threads.
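For illustration, here is a minimal self-contained Java sketch of the pattern the two stack traces suggest (class and method names are invented; this is not the plugin's code): one worker acquires the node's availability lock and then blocks indefinitely on a build future, while every other worker parks in lock() and can never grab a job.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of the deadlock pattern, not the plugin's code.
public class AvailabilityDeadlockSketch {
    static class AvailabilityMonitor {
        private Thread holder;

        synchronized void lock() throws InterruptedException {
            while (holder != null) wait();   // exec-2..exec-5 park here forever
            holder = Thread.currentThread();
        }

        synchronized void unlock() {
            holder = null;
            notifyAll();
        }
    }

    public static void main(String[] args) throws Exception {
        AvailabilityMonitor monitor = new AvailabilityMonitor();
        CompletableFuture<Void> buildResult = new CompletableFuture<>(); // never completed

        Thread exec1 = new Thread(() -> {
            try {
                monitor.lock();        // exec-1 wins the lock...
                buildResult.get();     // ...then waits forever, like AsyncFutureImpl.get()
                monitor.unlock();      // never reached: the lock is never released
            } catch (Exception ignored) { }
        });
        exec1.setDaemon(true);
        exec1.start();
        Thread.sleep(200);             // let exec-1 take the lock first

        Thread exec2 = new Thread(() -> {
            try {
                monitor.lock();        // stuck, like NodeAvailabilityMonitor.lock()
            } catch (InterruptedException ignored) { }
        });
        exec2.setDaemon(true);
        exec2.start();
        exec2.join(500);               // times out; exec-2 is parked in wait()

        System.out.println("exec-2 state: " + exec2.getState());
    }
}
```

Running this prints the second worker in the WAITING state, matching the thread dump above.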

Filed upstream at https://storyboard.openstack.org/#!/story/2000030 . Will have to poke Khai Do (the upstream author) about it.

hashar renamed this task from Beta Cluster stopped receiving code updates (beta-update-databases-eqiad hung) to Jenkins Gearman plugin has deadlock on executor threads (was: Beta Cluster stopped receiving code updates (beta-update-databases-eqiad hung).Nov 26 2014, 10:48 PM
hashar claimed this task.
hashar raised the priority of this task from Normal to High.
hashar renamed this task from Jenkins Gearman plugin has deadlock on executor threads (was: Beta Cluster stopped receiving code updates (beta-update-databases-eqiad hung) to [upstream] Jenkins Gearman plugin has deadlock on executor threads (was: Beta Cluster stopped receiving code updates (beta-update-databases-eqiad hung).Dec 1 2014, 11:57 AM
hashar added a project: Upstream.

You can assume that this upstream bug will not be fixed within our lifetime, and we should not invest in learning all of Jenkins' internals to fix it ourselves (it's a significant problem in how Jenkins works).

To keep the Beta cluster updates from getting stuck indefinitely on a daily basis, you can consider this blocked on T96199: Rewrite beta-update-databases to not use unstable Configuration Matrix.

I have upgraded the gearman plugin earlier from 0.1.1 to f2024bd. From git log --reverse --no-merges 0.1.1..master, the three interesting changes are:

commit 6de3cdd29bb8c4336a468985af6f8e0e4fd88e66
Author: James E. Blair <jeblair@hp.com>
Date:   Thu Jan 8 08:37:50 2015 -0800

    Protect against partially initialized executer workers
    
    The registerJobs method of an executorworker can be invoked by an
    external caller before the worker is completely initialized by
    its run method.  We protected against that by checking one instance
    variable, but there's still a race condition involving another.
    Add a check for that variable as well.
    
    Change-Id: I8e2cfffb54aa8a4cf8b1e61e9a9184b091054462

That one might solve the issue we are encountering with executors not being available anymore.

commit 7abfdbd2d00010a1121cefebf479bcf104e7ef18
Author: James E. Blair <jeblair@hp.com>
Date:   Tue May 5 10:38:25 2015 -0700

    Stop sending status updates
    
    Don't send status updates every 10 seconds.  Only send them at the
    start of a job (to fill in information like worker and expected
    duration, etc).  We don't actually do anything with subsequent
    updates, and if Zuul wants to know how long a job has been running
    it's perfectly capable of working that out on its own.
    
    Change-Id: I4df5f82b3375239df35e3bc4b03e1263026f0a68
commit 65a08e0e959b0853538eabeec030d594a01c4385
Author: Clark Boylan <clark.boylan@gmail.com>
Date:   Mon May 4 18:09:31 2015 -0700

    Fix race between adding job and registering
    
    Gearman plugin had a race between adding jobs to the functionList and
    registering jobs. When registering jobs the functionMap is cleared, when
    adding a job the plugin checks if the job is in the function Map before
    running it. If we happen to trigger registration of jobs when we get a
    response from gearman with a job assignment then the functionMap can be
    empty making us send a work fail instead of running the job.
    
    To make things worse this jenkins worker would not send a subsequent
    GET JOB and would live lock never doing any useful work.
    
    Correct this by making the processing for gearman events synchronous in
    the work loop. This ensures that we never try to clear the function map
    and check against it at the same time via different threads. To make
    this happen the handleSessionEvent() method puts all events on a thread
    safe queue for synchronous processing. This has allowed us to simplify
    the work() loop and basically do the following:
    
      while running:
        init()
        register()
        process one event
        run function if processed
        drive IO
    
    This is much easier to reason about as we essentially only have
    bookkeeping and the code for one thing at a time.
    
    Change-Id: Id537710f6c8276a528ad78afd72c5a7c8e8a16ac
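The synchronous work loop described in that commit message can be sketched in plain Java (hypothetical class and event names, not the plugin's actual code): handleSessionEvent() only enqueues, and a single thread dequeues and processes one event at a time, so clearing the function map and checking against it can never race across threads.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the single-threaded work loop from the commit above.
public class WorkLoopSketch {
    private final BlockingQueue<String> events = new LinkedBlockingQueue<>();
    private volatile boolean running = true;

    // May be called from any thread (e.g. the IO/session layer): just enqueue.
    public void handleSessionEvent(String event) {
        events.add(event);
    }

    // Only this loop's thread touches the worker's bookkeeping state.
    public void work() throws InterruptedException {
        while (running) {
            String event = events.poll(100, TimeUnit.MILLISECONDS);
            if (event == null) continue;               // nothing pending: drive IO
            switch (event) {
                case "REGISTER":   /* rebuild the function map */ break;
                case "JOB_ASSIGN": /* run the matching function */ break;
                case "STOP":       running = false; break;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        WorkLoopSketch worker = new WorkLoopSketch();
        worker.handleSessionEvent("REGISTER");
        worker.handleSessionEvent("JOB_ASSIGN");
        worker.handleSessionEvent("STOP");
        worker.work();                                 // drains the three events, then exits
        System.out.println("loop exited cleanly");
    }
}
```

The design choice is the one the commit describes: serializing registration and job dispatch through one queue makes the worker "only have bookkeeping and the code for one thing at a time".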

Might be blocking jobs as well :/

@Antoine, I gave up on Storyboard. We are working on it over in the JIRA issue tracker: https://issues.jenkins-ci.org/browse/JENKINS-25867

one additional change you should pick up is https://review.openstack.org/#/c/192429/


Ah thanks for switching to JIRA, this way we get mail notifications :-D I followed up there.

Rebuilding the plugin with https://review.openstack.org/#/c/192429/2

git fetch https://review.openstack.org/openstack-infra/gearman-plugin refs/changes/29/192429/2 && git checkout FETCH_HEAD
mvn -Dproject-version="`git describe`-change_192429_2" -DskipTests=true clean package

Thus upgrading the plugin from 0.1.1-8-gf2024bd to 0.1.1-9-g08e9c42-change_192429_2.

hashar closed this task as Resolved.Jul 22 2015, 1:05 PM

I haven't noticed the error since July 1st, nor do the Jenkins logs show any null lock. So it seems the fix in the gearman plugin resolved it.

hashar reopened this task as Open.Jul 28 2015, 10:34 AM

It happened again :(

Jul 28, 2015 10:30:35 AM FINE hudson.plugins.gearman.NodeAvailabilityMonitor canTake
AvailabilityMonitor canTake request for null
Jul 28, 2015 10:30:35 AM FINE hudson.plugins.gearman.NodeAvailabilityMonitor canTake
AvailabilityMonitor canTake request for null
Jul 28, 2015 10:30:35 AM FINE hudson.plugins.gearman.NodeAvailabilityMonitor canTake
AvailabilityMonitor canTake request for null

With beta-scap-eqiad and beta-update-databases-eqiad being stuck waiting for an available executor on deployment-bastion.

Marking the node offline and online doesn't remove the lock :-/

The executor threads have:

"Gearman worker deployment-bastion.eqiad_exec-1" prio=5 WAITING
	java.lang.Object.wait(Native Method)
	java.lang.Object.wait(Object.java:503)
	hudson.remoting.AsyncFutureImpl.get(AsyncFutureImpl.java:73)
	hudson.plugins.gearman.StartJobWorker.safeExecuteFunction(StartJobWorker.java:196)
	hudson.plugins.gearman.StartJobWorker.executeFunction(StartJobWorker.java:114)
	org.gearman.worker.AbstractGearmanFunction.call(AbstractGearmanFunction.java:125)
	org.gearman.worker.AbstractGearmanFunction.call(AbstractGearmanFunction.java:22)
	hudson.plugins.gearman.MyGearmanWorkerImpl.submitFunction(MyGearmanWorkerImpl.java:593)
	hudson.plugins.gearman.MyGearmanWorkerImpl.work(MyGearmanWorkerImpl.java:328)
	hudson.plugins.gearman.AbstractWorkerThread.run(AbstractWorkerThread.java:166)
	java.lang.Thread.run(Thread.java:745)

"Gearman worker deployment-bastion.eqiad_exec-2" prio=5 TIMED_WAITING
	java.lang.Object.wait(Native Method)
	hudson.plugins.gearman.NodeAvailabilityMonitor.lock(NodeAvailabilityMonitor.java:83)
	hudson.plugins.gearman.MyGearmanWorkerImpl.sendGrabJob(MyGearmanWorkerImpl.java:380)
	hudson.plugins.gearman.MyGearmanWorkerImpl.processSessionEvent(MyGearmanWorkerImpl.java:421)
	hudson.plugins.gearman.MyGearmanWorkerImpl.work(MyGearmanWorkerImpl.java:320)
	hudson.plugins.gearman.AbstractWorkerThread.run(AbstractWorkerThread.java:166)
	java.lang.Thread.run(Thread.java:745)

"Gearman worker deployment-bastion.eqiad_exec-3" prio=5 TIMED_WAITING
	java.lang.Object.wait(Native Method)
	hudson.plugins.gearman.NodeAvailabilityMonitor.lock(NodeAvailabilityMonitor.java:83)
	hudson.plugins.gearman.MyGearmanWorkerImpl.sendGrabJob(MyGearmanWorkerImpl.java:380)
	hudson.plugins.gearman.MyGearmanWorkerImpl.processSessionEvent(MyGearmanWorkerImpl.java:421)
	hudson.plugins.gearman.MyGearmanWorkerImpl.work(MyGearmanWorkerImpl.java:320)
	hudson.plugins.gearman.AbstractWorkerThread.run(AbstractWorkerThread.java:166)
	java.lang.Thread.run(Thread.java:745)

"Gearman worker deployment-bastion.eqiad_exec-4" prio=5 TIMED_WAITING
	java.lang.Object.wait(Native Method)
	hudson.plugins.gearman.NodeAvailabilityMonitor.lock(NodeAvailabilityMonitor.java:83)
	hudson.plugins.gearman.MyGearmanWorkerImpl.sendGrabJob(MyGearmanWorkerImpl.java:380)
	hudson.plugins.gearman.MyGearmanWorkerImpl.processSessionEvent(MyGearmanWorkerImpl.java:421)
	hudson.plugins.gearman.MyGearmanWorkerImpl.work(MyGearmanWorkerImpl.java:320)
	hudson.plugins.gearman.AbstractWorkerThread.run(AbstractWorkerThread.java:166)
	java.lang.Thread.run(Thread.java:745)

"Gearman worker deployment-bastion.eqiad_exec-5" prio=5 TIMED_WAITING
	java.lang.Object.wait(Native Method)
	hudson.plugins.gearman.NodeAvailabilityMonitor.lock(NodeAvailabilityMonitor.java:83)
	hudson.plugins.gearman.MyGearmanWorkerImpl.sendGrabJob(MyGearmanWorkerImpl.java:380)
	hudson.plugins.gearman.MyGearmanWorkerImpl.processSessionEvent(MyGearmanWorkerImpl.java:421)
	hudson.plugins.gearman.MyGearmanWorkerImpl.work(MyGearmanWorkerImpl.java:320)
	hudson.plugins.gearman.AbstractWorkerThread.run(AbstractWorkerThread.java:166)
	java.lang.Thread.run(Thread.java:745)

The node is named deployment-bastion.eqiad, and it carries a label with the same name. Jobs are tied to that label.

The workaround I found was to remove the label from the node. Once done, the jobs show in the queue with 'no node having label deployment-bastion.eqiad'. I then applied the label to the host again and the jobs managed to run.

So maybe it is an issue in Jenkins itself :-}

Change 227440 had a related patch set uploaded (by Hashar):
beta: expand {datacenter} to 'eqiad'

https://gerrit.wikimedia.org/r/227440

Change 227441 had a related patch set uploaded (by Hashar):
beta: disambiguate Jenkins label from node name

https://gerrit.wikimedia.org/r/227441

Change 227440 merged by jenkins-bot:
beta: expand {datacenter} to 'eqiad'

https://gerrit.wikimedia.org/r/227440

Change 227441 merged by jenkins-bot:
beta: disambiguate Jenkins label from node name

https://gerrit.wikimedia.org/r/227441

I renamed the Jenkins label to disambiguate the node name and the label (now BetaClusterBastion).

hashar added a comment (edited). Dec 16 2015, 11:41 AM

That still happens from time to time with Jenkins 1.625.3 and the Gearman plugin 0.1.3.3.01da2d4 (which is 0.1.3 plus https://review.openstack.org/#/c/252768/ ).

To remove the deadlock one can either:

hashar removed hashar as the assignee of this task.Dec 16 2015, 11:43 AM
hashar updated the task description.
hashar updated the task description.
hashar moved this task from Backlog to Reported Upstream on the Upstream board.

Upstream, https://review.openstack.org/#/c/252768/ has been abandoned in favor of https://review.openstack.org/#/c/271543/ . It uses a different internal API which should no longer return null. Specifically, it replaces:

- Computer.currentComputer()
+ Jenkins.getActiveInstance().getComputer("")

But that is solely to properly get the master node, on which we run no jobs, so it is unlikely to fix anything for us.

Updated the Gearman plugin to 0.1.3.3.a5164d6

bd808 added a comment.Feb 4 2016, 12:54 AM

Locked up again. I don't quite understand the instructions in the summary that say you can disable the gearman plugin without restarting Jenkins. Doesn't enabling/disabling a plugin require a restart?

hashar updated the task description. Feb 4 2016, 12:30 PM

Sorry @bd808, the phrasing wasn't perfect. There is no need to disable the Gearman plugin; you just have to disconnect it from the Gearman server, which is done on the Jenkins manage page:

Doing so causes the Gearman client on Jenkins to disconnect from the Zuul Gearman server, stop overriding the Jenkins executors, and reset its state, which gets rid of the deadlock.

I have updated the task summary with above screenshot (that shows my leet Paint skills).

bd808 added a comment.Feb 4 2016, 5:22 PM

I have updated the task summary with above screenshot (that shows my leet Paint skills).

Thanks @hashar. I'd forgotten about that setting and was instead thinking that the instruction was to disable/enable the plugin at https://integration.wikimedia.org/ci/pluginManager/.

Krinkle removed a subscriber: Krinkle.Feb 23 2016, 4:49 PM
Danny_B renamed this task from [upstream] Jenkins Gearman plugin has deadlock on executor threads (was: Beta Cluster stopped receiving code updates (beta-update-databases-eqiad hung) to Jenkins Gearman plugin has deadlock on executor threads (was: Beta Cluster stopped receiving code updates (beta-update-databases-eqiad hung).May 31 2016, 3:14 PM
Danny_B removed a subscriber: wikibugs-l-list.
Danny_B removed a subscriber: Language-Team.

Blind search-and-replace was incorrect here. This is not a Language-Team bug, we were only CCed.

hashar closed this task as Resolved.Jul 27 2016, 9:12 AM
hashar claimed this task.
hashar lowered the priority of this task from High to Normal.

This still happens, albeit very rarely nowadays, to the point that it is almost a non-issue. I have only noticed it once over the last few months, and the root cause was unrelated (thread starvation in the Jenkins SSH plugin that caused it to no longer properly connect slaves).

Assuming it is fixed.

greg added a comment.Nov 22 2017, 6:46 PM

This still happens, albeit very rarely nowadays, to the point that it is almost a non-issue. I have only noticed it once over the last few months, and the root cause was unrelated (thread starvation in the Jenkins SSH plugin that caused it to no longer properly connect slaves).

So, this is still happening but I'm unsure of the root cause. It happened again today and a few weeks ago: http://tools.wmflabs.org/sal/log/AV9ZS8j8F4fsM4DBdUuo (2017-10-26). Once a month is annoying.

Assuming it is fixed.

Well, something isn't :P

hashar reopened this task as Open.Nov 24 2017, 10:10 PM

That still happens indeed (T181313). Maybe we can move the beta cluster jobs to a dedicated/standalone Jenkins instance.

hashar removed hashar as the assignee of this task.Jan 31 2018, 8:14 PM

Potentially we would create a dedicated Jenkins instance to drive the beta cluster, which is T183164. Unassigning since I am focusing on other duties.

Change 419674 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Use a Zuul mutex for the coverage patch jobs

https://gerrit.wikimedia.org/r/419674

Change 419674 merged by jenkins-bot:
[integration/config@master] Use a Zuul mutex for the coverage patch jobs

https://gerrit.wikimedia.org/r/419674

Reedy added a comment.Apr 20 2018, 3:14 PM

This happens once in a while. It's some sort of deadlock in Jenkins itself. Here's how I generally try to resolve it:

Sometimes you have to do this whole dance twice before Jenkins realizes that there are a bunch of executors it can use.
This deadlock seems to happen more often than not following or during a database update that takes a while to complete.

^ just had to do this for deployment-tin

Mentioned in SAL (#wikimedia-releng) [2018-07-04T20:15:19Z] <Reedy> beta unbroke beta code autodeploy T72597

hashar updated the task description. May 3 2019, 10:28 AM

Mentioned in SAL (#wikimedia-releng) [2019-07-11T09:06:25Z] <hashar> beta cluster jobs are dead locked. Taking a thread dump in case it helps figure out what is going on. T72597

In the Jenkins log bucket hudson.model.Queue:

First, the builds in the queue:

FINE hudson.model.Queue

Queue maintenance started on hudson.model.Queue@2ea587fb with Queue.Snapshot{
waitingList=[
  hudson.model.Queue$WaitingItem:hudson.model.FreeStyleProject@1956d4b9[mwgate-node10-docker]:116546,
  hudson.model.Queue$WaitingItem:hudson.model.FreeStyleProject@35e87319[mwext-php72-phan-docker]:116547,
  hudson.model.Queue$WaitingItem:hudson.model.FreeStyleProject@fc512c3[mwext-php70-phan-seccheck-docker]:116548
];
blockedProjects=[
  hudson.model.Queue$BlockedItem:hudson.model.FreeStyleProject@34bd6b8b[beta-mediawiki-config-update-eqiad]:113951,
  hudson.model.Queue$BlockedItem:hudson.model.FreeStyleProject@4988adf9[beta-code-update-eqiad]:113983];
buildables=[
  hudson.model.Queue$BuildableItem:hudson.model.FreeStyleProject@38d6ee0d[beta-scap-eqiad]:113952,
  hudson.model.Queue$BuildableItem:hudson.model.FreeStyleProject@c175473[beta-update-databases-eqiad]:114190,
  hudson.model.Queue$BuildableItem:hudson.model.FreeStyleProject@7728417c[beta-publish-deb]:116220];
pendings=[]
}
FINER hudson.model.Queue

Failed to map hudson.model.Queue$BuildableItem:hudson.model.FreeStyleProject@38d6ee0d[beta-scap-eqiad]:113952 to executors.
candidates=[]
parked=[
JobOffer[integration-r-lang-01 #0], JobOffer[integration-slave-docker-1043 #2], JobOffer[integration-slave-docker-1050 #3], JobOffer[saucelabs-01 #0], JobOffer[saucelabs-02 #0], JobOffer[deployment-deploy01 #3], JobOffer[integration-slave-jessie-1001 #0], JobOffer[integration-castor03 #0], JobOffer[compiler1002.puppet-diffs.eqiad.wmflabs #0], JobOffer[integration-slave-docker-1050 #0], JobOffer[integration-slave-jessie-1002 #0], JobOffer[compiler1001.puppet-diffs.eqiad.wmflabs #0], JobOffer[integration-slave-jessie-1004 #0], JobOffer[integration-slave-docker-1043 #0], JobOffer[integration-slave-docker-1040 #1], JobOffer[integration-slave-docker-1040 #2], JobOffer[integration-slave-docker-1043 #1], JobOffer[webperformance #0], JobOffer[deployment-deploy01 #2], JobOffer[integration-slave-docker-1058 #0], JobOffer[integration-slave-docker-1059 #3], JobOffer[integration-trigger-01 #4], JobOffer[integration-slave-docker-1048 #0], JobOffer[integration-trigger-01 #7], JobOffer[integration-trigger-01 #9], JobOffer[deployment-deploy01 #1], JobOffer[integration-slave-docker-1054 #2], JobOffer[integration-slave-docker-1050 #1], JobOffer[integration-slave-docker-1059 #1], JobOffer[integration-slave-docker-1059 #0], JobOffer[integration-trigger-01 #6], JobOffer[integration-trigger-01 #5], JobOffer[integration-slave-docker-1058 #1], JobOffer[integration-trigger-01 #1], JobOffer[integration-slave-docker-1051 #2], JobOffer[integration-slave-docker-1054 #1], JobOffer[integration-trigger-01 #3], JobOffer[contint1001 #2], JobOffer[contint1001 #0], JobOffer[deployment-deploy01 #0], JobOffer[integration-slave-docker-1048 #1], JobOffer[integration-slave-docker-1051 #0], JobOffer[integration-trigger-01 #0], JobOffer[integration-trigger-01 #2]
]

It has JobOffer instances for deployment-deploy01 #0 to #3, which would be the four executors on that node.

Same message for all three jobs: beta-scap-eqiad, beta-update-databases-eqiad and beta-publish-deb.

hashar added a comment (edited). Thu, Jul 11, 9:47 AM
core/src/main/java/hudson/model/Queue.java
List<JobOffer> candidates = new ArrayList<>(parked.size());
List<CauseOfBlockage> reasons = new ArrayList<>(parked.size());
for (JobOffer j : parked.values()) {
    CauseOfBlockage reason = j.getCauseOfBlockage(p);
    if (reason == null) {
        LOGGER.log(Level.FINEST,
                "{0} is a potential candidate for task {1}",
                new Object[]{j, taskDisplayName});
        candidates.add(j);
    } else {
        LOGGER.log(Level.FINEST, "{0} rejected {1}: {2}", new Object[] {j, taskDisplayName, reason});
        reasons.add(reason);
    }
}

MappingWorksheet ws = new MappingWorksheet(p, candidates);
Mapping m = loadBalancer.map(p.task, ws);
if (m == null) {
    // if we couldn't find the executor that fits,
    // just leave it in the buildables list and
    // check if we can execute other projects
    LOGGER.log(Level.FINER, "Failed to map {0} to executors. candidates={1} parked={2}",
            new Object[]{p, candidates, parked.values()});
    p.transientCausesOfBlockage = reasons.isEmpty() ? null : reasons;
    continue;
}

Mentioned in SAL (#wikimedia-releng) [2019-07-11T09:48:47Z] <hashar> jenkins: add more log details to hudson.model.Queue (FINER > FINEST) https://integration.wikimedia.org/ci/log/Jenkins%20Queue/configure # T72597

The Jenkins logger ( https://integration.wikimedia.org/ci/log/Jenkins%20Queue/ ) was missing the FINEST log level. The reasons are the same as shown in the web UI, for example:

JobOffer[deployment-deploy01 #0] rejected beta-update-databases-eqiad: Waiting for next available executor on ‘deployment-deploy01’

The Gearman plugin does implement a canTake method:

src/main/java/hudson/plugins/gearman/NodeAvailabilityMonitor.java
public boolean canTake(Queue.BuildableItem item)
{
    // Jenkins calls this from within the scheduler maintenance
    // function (while owning the queue monitor).  If we are
    // locked, only allow the build we are expecting to run.
    logger.debug("AvailabilityMonitor canTake request for " +
                 workerHoldingLock);

    NodeParametersAction param = item.getAction(NodeParametersAction.class);
    if (param != null) {
        logger.debug("AvailabilityMonitor canTake request for UUID " +
                     param.getUuid() + " expecting " + expectedUUID);

        if (expectedUUID == param.getUuid()) {
            return true;
        }
    }
    return (workerHoldingLock == null);
}

And in the log:

Jul 11, 2019 10:01:51 AM FINE hudson.plugins.gearman.NodeAvailabilityMonitor canTake
AvailabilityMonitor canTake request for deployment-deploy01_exec-0
Jul 11, 2019 10:01:51 AM FINE hudson.plugins.gearman.NodeAvailabilityMonitor canTake
AvailabilityMonitor canTake request for deployment-deploy01_exec-0
Jul 11, 2019 10:01:51 AM FINE hudson.plugins.gearman.NodeAvailabilityMonitor canTake
AvailabilityMonitor canTake request for deployment-deploy01_exec-0
Jul 11, 2019 10:01:51 AM FINE hudson.plugins.gearman.NodeAvailabilityMonitor canTake
AvailabilityMonitor canTake request for deployment-deploy01_exec-0
Jul 11, 2019 10:01:51 AM FINE hudson.plugins.gearman.NodeAvailabilityMonitor canTake
AvailabilityMonitor canTake request for deployment-deploy01_exec-0
Jul 11, 2019 10:01:51 AM FINE hudson.plugins.gearman.NodeAvailabilityMonitor canTake
AvailabilityMonitor canTake request for deployment-deploy01_exec-0
Jul 11, 2019 10:01:51 AM FINE hudson.plugins.gearman.NodeAvailabilityMonitor canTake
AvailabilityMonitor canTake request for deployment-deploy01_exec-0
Jul 11, 2019 10:01:51 AM FINE hudson.plugins.gearman.NodeAvailabilityMonitor canTake
AvailabilityMonitor canTake request for deployment-deploy01_exec-0
Jul 11, 2019 10:01:51 AM FINE hudson.plugins.gearman.NodeAvailabilityMonitor canTake
AvailabilityMonitor canTake request for deployment-deploy01_exec-0
Jul 11, 2019 10:01:51 AM FINE hudson.plugins.gearman.NodeAvailabilityMonitor canTake
AvailabilityMonitor canTake request for deployment-deploy01_exec-0
Jul 11, 2019 10:01:51 AM FINE hudson.plugins.gearman.NodeAvailabilityMonitor canTake
AvailabilityMonitor canTake request for deployment-deploy01_exec-0
Jul 11, 2019 10:01:51 AM FINE hudson.plugins.gearman.NodeAvailabilityMonitor canTake
AvailabilityMonitor canTake request for deployment-deploy01_exec-0

So it seems blocked because the queued items somehow carry no parameters, even though there should at least be a UUID/id :-\
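To make the rejection concrete, here is a small standalone sketch of canTake's decision table (simplified from the plugin excerpt above; the class and parameter names here are invented): with workerHoldingLock set to an executor thread and queued items carrying no NodeParametersAction (hence no UUID), every request falls through to workerHoldingLock == null and is rejected, which matches the log spam above. Note also that the excerpt compares the UUIDs with ==, which is reference equality unless both sides are the very same object.

```java
// Simplified decision table of the canTake logic quoted above;
// a standalone sketch, not the plugin class itself.
public class CanTakeSketch {
    static boolean canTake(String workerHoldingLock, String itemUuid, String expectedUuid) {
        if (itemUuid != null) {
            // Mirrors the '==' in the excerpt: reference comparison, so even
            // an equal UUID held in a distinct String instance would fail.
            if (expectedUuid == itemUuid) {
                return true;
            }
        }
        return workerHoldingLock == null;
    }

    public static void main(String[] args) {
        // Healthy node: nobody holds the lock, so any item may be taken.
        System.out.println(canTake(null, null, null));
        // Stuck state from the log: an executor thread holds the lock and
        // the queued item has no UUID, so the item is always rejected.
        System.out.println(canTake("deployment-deploy01_exec-0", null, null));
    }
}
```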