Today we observed a major disruption in how Jenkins distributes jobs, probably caused by a bad interaction between the Throttle plugin and the Gearman plugin. For some reason, queued jobs seem to hold executors on a given node, which starves that node of available executors.
Once slaves have only one executor each, this should essentially be solved, since we will then be able to remove the Throttle plugin.
One potential troublemaker is mediawiki-core-doxygen-publish (https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen-publish/), which is rather long (10+ minutes) and is triggered both when patches are merged into mediawiki/core and when tags are pushed. When we do a security release of MediaWiki core, it is not unusual to have several patches merged per release branch, each of them triggering the job. So for REL1_26 and three changes A, B and C, we end up triggering:
| mediawiki-core-doxygen-publish | A |
| mediawiki-core-doxygen-publish | B |
| mediawiki-core-doxygen-publish | C |
This takes at least half an hour, and the only lasting result is the documentation for C, which overwrites the documentation generated for the earlier changes A and B.
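To illustrate why the per-change builds are wasted, here is a minimal sketch, assuming each trigger is a (branch, change) pair processed in order and that a doc build for a branch fully replaces the previous output for that branch. The function `coalesce_triggers` and the 10 minute figure (the lower bound of "10+ minutes" above) are illustrative assumptions, not part of Zuul or Jenkins.

```python
def coalesce_triggers(triggers):
    """Keep only the most recent change per branch.

    Assumes a later doc build for a branch fully replaces the earlier
    output, so only the last trigger per branch is worth building.
    Relies on dicts preserving insertion order (Python 3.7+).
    """
    latest = {}
    for branch, change in triggers:
        # Pop first so the branch is re-inserted at the end, ordering
        # branches by their most recent trigger.
        latest.pop(branch, None)
        latest[branch] = change
    return list(latest.items())


if __name__ == "__main__":
    MINUTES_PER_BUILD = 10  # hypothetical: lower bound of "10+ minutes"
    queued = [("REL1_26", "A"), ("REL1_26", "B"), ("REL1_26", "C")]
    builds = coalesce_triggers(queued)
    print(builds)  # [('REL1_26', 'C')]
    print(f"{len(queued) * MINUTES_PER_BUILD} min vs "
          f"{len(builds) * MINUTES_PER_BUILD} min")  # 30 min vs 10 min
```

Coalescing by branch keeps the final documentation identical while cutting the build time to a third in this example.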
Instead, we should periodically poll git and rebuild the documentation only for branches that have been updated and for tags that have been newly added. We also want to prevent the Jenkins git plugin from building legacy branches and tags, and to skip the wmf branches and tags entirely.
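The polling logic could look roughly like the sketch below. It is a standalone illustration, not a Jenkins git plugin configuration: it compares two snapshots of `git ls-remote --heads --tags` output and selects updated branches and new tags. The ref naming (wmf branches/tags starting with `wmf/`, releases named `REL1_26` or `1.26.1`) and the legacy cutoff `MIN_VERSION = (1, 23)` are assumptions chosen for illustration.

```python
import re

# Hypothetical cutoff: release branches/tags older than 1.23 count as legacy.
MIN_VERSION = (1, 23)


def parse_ls_remote(output):
    """Parse `git ls-remote --heads --tags` output into {ref: sha}."""
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split("\t", 1)
        if ref.endswith("^{}"):
            continue  # peeled annotated-tag entry; the tag ref itself suffices
        refs[ref] = sha
    return refs


def _version(name):
    """Return (major, minor) for names like REL1_26 or 1.26.1, else None."""
    m = (re.fullmatch(r"REL(\d+)_(\d+)", name)
         or re.fullmatch(r"(\d+)\.(\d+)(?:\.\d+)*", name))
    return (int(m.group(1)), int(m.group(2))) if m else None


def wanted(ref):
    """Skip wmf refs (assumed to start with 'wmf/') and legacy releases."""
    for prefix in ("refs/heads/", "refs/tags/"):
        if ref.startswith(prefix):
            name = ref[len(prefix):]
            break
    else:
        return False
    if name.startswith("wmf/"):
        return False
    version = _version(name)
    return version is None or version >= MIN_VERSION


def refs_to_rebuild(previous, current):
    """Return the sorted refs whose documentation needs regenerating."""
    rebuild = []
    for ref, sha in current.items():
        if not wanted(ref):
            continue
        if ref.startswith("refs/heads/") and previous.get(ref) != sha:
            rebuild.append(ref)  # new or updated branch
        elif ref.startswith("refs/tags/") and ref not in previous:
            rebuild.append(ref)  # newly added tag
    return sorted(rebuild)


if __name__ == "__main__":
    before = parse_ls_remote("a1\trefs/heads/REL1_26\nb1\trefs/heads/REL1_22\n")
    after = parse_ls_remote(
        "a2\trefs/heads/REL1_26\n"
        "b2\trefs/heads/REL1_22\n"
        "c1\trefs/heads/wmf/1.27.0-wmf.9\n"
        "d1\trefs/tags/1.26.1\n"
        "d2\trefs/tags/1.26.1^{}\n"
    )
    print(refs_to_rebuild(before, after))
    # ['refs/heads/REL1_26', 'refs/tags/1.26.1']
```

However many changes land on REL1_26 between two polls, the branch is rebuilt once. The legacy REL1_22 update and the wmf branch are ignored.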