From time to time, some subsets of jobs are no longer being executed. Zuul does enqueue them properly, as can be seen on https://integration.wikimedia.org/zuul/ when the issue occurs.
Meanwhile the Jenkins queue is idle and the target hosts are not running any tests.
An example of a stuck job is:
$ echo status|nc -q 2 localhost 4730|grep integration-jjb-config-test
build:integration-jjb-config-test 2 0 14
build:integration-jjb-config-test:contintLabsSlave 0 0 14
$
The numbers are, in order: Total (jobs in the queue), Running, and Workers (available for that function). The status page shows two jobs being stuck, matching the Total of 2 with 0 Running.
Another occurrence:
$ echo status|nc -q 2 localhost 4730|grep apps-android-wikipedia-tox-flake8
build:apps-android-wikipedia-tox-flake8 17 0 14
build:apps-android-wikipedia-tox-flake8:contintLabsSlave 0 0 14
$
And there are indeed 17 such jobs stuck.
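To spot this condition without grepping for one job at a time, the status output can be parsed and filtered. The following is only a rough sketch: the names `parse_status`, `find_stuck` and `fetch_status` are made up for illustration, and the "stuck" rule (jobs queued, none running, workers registered) is an assumption that will also match a job that has only just been enqueued, so it should be checked more than once before drawing conclusions.

```python
import socket
from typing import List, NamedTuple


class FunctionStatus(NamedTuple):
    name: str
    total: int    # jobs in the queue for this function
    running: int  # jobs currently being executed
    workers: int  # workers registered for this function


# Sample taken from the first occurrence above.
SAMPLE = (
    "build:integration-jjb-config-test 2 0 14\n"
    "build:integration-jjb-config-test:contintLabsSlave 0 0 14\n"
    ".\n"
)


def parse_status(text: str) -> List[FunctionStatus]:
    """Parse the reply of the Gearman 'status' admin command."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip the terminating "." line and anything malformed.
        if len(parts) != 4 or not all(p.isdigit() for p in parts[1:]):
            continue
        name, total, running, workers = parts
        rows.append(FunctionStatus(name, int(total), int(running), int(workers)))
    return rows


def find_stuck(rows: List[FunctionStatus]) -> List[FunctionStatus]:
    """Assumed heuristic: queued jobs, nothing running, workers registered."""
    return [r for r in rows if r.total > 0 and r.running == 0 and r.workers > 0]


def fetch_status(host: str = "localhost", port: int = 4730,
                 timeout: float = 2.0) -> str:
    """Rough equivalent of: echo status | nc -q 2 localhost 4730"""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"status\n")
        data = b""
        # The reply is terminated by a line holding a single ".".
        while data != b".\n" and not data.endswith(b"\n.\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for row in find_stuck(parse_status(SAMPLE)):
        print(row.name, row.total, row.running, row.workers)
```

On the Gearman server itself, `parse_status(fetch_status())` can replace the sample. In both occurrences above only the function without the label suffix is reported, while the `:contintLabsSlave` variant has nothing queued.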
Suspicion: both jobs are tied to the node label contintLabsSlave. Zuul apparently asked to run the labelless function, which got properly enqueued by the Gearman server. Since the job has a label, the labelless function is never processed by the Jenkins Gearman plugin.
Version: wmf-deployment
Severity: normal
See Also:
https://launchpad.net/bugs/1381565