Jenkins was slow to restart (bug 47120) and got upgraded. Since then, Zuul has been very slow to trigger builds.
After investigating with upstream (James E. Blair from OpenStack), it turns out that Zuul waits for the results of JSON API calls which are meant to verify whether a job exists before triggering it.
We found a few API calls that took 2 to 4 minutes each, during which Zuul's internal scheduler was blocked. In the end, Zuul was spending most of its time waiting.
This is a major issue.
A workaround is to skip the API call. James advised faking the ExtendedJenkins class in Zuul so that it pretends jobs always exist.
https://gerrit.wikimedia.org/r/#/c/62095/
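
For illustration only, here is a minimal sketch of the idea behind the workaround. It is not the content of the change linked above. SlowJenkinsClient, AlwaysExistsJenkinsClient, trigger() and the job_exists() method name are hypothetical stand-ins, not Zuul's actual code.

```python
import time


class SlowJenkinsClient:
    """Hypothetical stand-in for the Jenkins client used by Zuul."""

    def __init__(self, delay_seconds):
        self.delay_seconds = delay_seconds
        self.api_calls = 0

    def job_exists(self, name):
        # Simulates the blocking JSON API round trip to Jenkins.
        self.api_calls += 1
        time.sleep(self.delay_seconds)
        return True


class AlwaysExistsJenkinsClient(SlowJenkinsClient):
    """Workaround: pretend every job exists, without calling Jenkins."""

    def job_exists(self, name):
        return True


def trigger(client, job_name):
    # Hypothetical scheduler step: verify the job, then trigger it.
    if not client.job_exists(job_name):
        raise LookupError("unknown job: %s" % job_name)
    return "triggered %s" % job_name
```

With the overriding class, trigger() returns immediately and no API call is made. The trade-off is that a job which is really missing from Jenkins is no longer detected before the trigger is attempted.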
Version: wmf-deployment
Severity: major