Yesterday we noticed a deadlock in Zuul. When a change is in the gate-and-submit pipeline but gets force-merged by someone in Gerrit, Zuul sometimes ends up sleeping for up to 5 minutes. Meanwhile, all changes behind it in the queues are blocked.
What is even more annoying is that Zuul considers that the change has not entered the repository, and thus treats it as a failure. It then retriggers the whole queue. So with six patches +2ed and force-merged, it takes 6 * 5 = 30 minutes for Zuul to resume normal operation.
A stacktrace:
Thread: 140156753389312
  File "/usr/lib/python2.7/threading.py", line 524, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "zuul/scheduler.py", line 784, in run
    while pipeline.manager.processQueue():
  File "zuul/scheduler.py", line 1358, in processQueue
    item, nnfi, ready_ahead)
  File "zuul/scheduler.py", line 1330, in _processOneItem
    self.reportItem(item)
  File "zuul/scheduler.py", line 1421, in reportItem
    item.change.branch)
  File "zuul/trigger/gerrit.py", line 224, in isMerged
    if self.waitForRefSha(change.project, ref, change._ref_sha):
  File "zuul/trigger/gerrit.py", line 203, in waitForRefSha
    time.sleep(self.replication_retry_interval)
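To illustrate why the whole scheduler stalls, here is a minimal sketch of a blocking poll loop, inferred only from the frame names in the trace (waitForRefSha sleeping for replication_retry_interval inside the scheduler's run loop). This is not Zuul's code: the names wait_for_ref_sha and FakeClock are hypothetical, the exact comparison Zuul makes against change._ref_sha is not visible in the trace, the 300 s timeout is an assumption chosen to match the observed ~5 minute sleep, and the 30 s interval is arbitrary. A simulated clock keeps the example instant, and it also reproduces the 6 * 5 = 30 minutes figure.

```python
import time
from typing import Callable, Optional


def wait_for_ref_sha(
    get_ref_sha: Callable[[], Optional[str]],
    expected_sha: str,
    timeout: float = 300.0,        # assumed: matches the ~5 minute stall
    retry_interval: float = 30.0,  # arbitrary value for illustration
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the ref points at expected_sha or the timeout expires.

    Hypothetical sketch, not Zuul's code. Every sleep() blocks the calling
    thread, so when the caller is the scheduler loop nothing else is processed.
    """
    deadline = clock() + timeout
    while True:
        if get_ref_sha() == expected_sha:
            return True
        if clock() >= deadline:
            return False
        sleep(retry_interval)


class FakeClock:
    """Simulated clock so the example runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


# Six force-merged changes, each hitting the full timeout because the
# SHA seen locally never matches (e.g. a stale local mirror).
total_seconds = 0.0
for _ in range(6):
    fake = FakeClock()
    wait_for_ref_sha(lambda: "stale-sha", "merged-sha", clock=fake.time, sleep=fake.sleep)
    total_seconds += fake.now
total_minutes = total_seconds / 60
print(total_minutes)  # 30.0
```

Because the wait happens synchronously inside the queue processing, the stall is serialized per change rather than overlapping, which is why the delays add up linearly.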
Example changes: 198885,1 and 198950,1
The question is: why can't it find the commit SHA-1, given that the change has actually been merged? Maybe it is missing a git remote update.
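To make the "missing git remote update" hypothesis concrete, here is a toy model, assuming (unconfirmed) that Zuul compares SHAs against a local view of the repository that only changes when it is refreshed. CachedRemote, is_merged and the refresh flag are hypothetical names; update() merely stands in for running git remote update or git fetch in a real clone. If Zuul actually queries the Gerrit server directly, this hypothesis would not apply.

```python
from typing import Dict, Optional


class CachedRemote:
    """Hypothetical model of a local clone's remote-tracking refs.

    The local view only changes when update() is called, playing the role
    of `git remote update` (or `git fetch`) in a real clone.
    """

    def __init__(self, upstream: Dict[str, str]) -> None:
        self.upstream = upstream          # refs as they are on the Gerrit server
        self.local = dict(upstream)       # snapshot taken at the last fetch

    def update(self) -> None:
        # Refresh the local snapshot from the server-side refs.
        self.local = dict(self.upstream)

    def ref_sha(self, ref: str) -> Optional[str]:
        return self.local.get(ref)


def is_merged(remote: CachedRemote, ref: str, expected_sha: str, refresh: bool) -> bool:
    # Optionally refresh the local view before comparing SHAs.
    if refresh:
        remote.update()
    return remote.ref_sha(ref) == expected_sha


upstream = {"refs/heads/master": "old-sha"}
remote = CachedRemote(upstream)
# Someone force-merges the change in Gerrit: the server-side ref moves.
upstream["refs/heads/master"] = "merged-sha"
print(is_merged(remote, "refs/heads/master", "merged-sha", refresh=False))  # False
print(is_merged(remote, "refs/heads/master", "merged-sha", refresh=True))   # True
```

Under this model, polling without refreshing could never succeed, so every check would run until the timeout, which would be consistent with the observed 5 minute sleeps.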
This also prevents the other queues from being processed :(