Zuul coverage pipeline is no longer processing mwext-phpunit-coverage-patch jobs
Closed, Resolved
Public

Description

This morning the Zuul coverage pipeline is filled with changes, all waiting for the mwext-phpunit-coverage-patch job to run.

I enabled a mutex on that job so that only one build of it runs at any given time: https://gerrit.wikimedia.org/r/#/c/419674/

On the CirrusSearch change https://gerrit.wikimedia.org/r/#/c/419804/, patch set 2 (PS2) triggered the coverage job and was immediately followed by PS3. Zuul canceled the job for PS2 as intended, but the mutex did not get released. Hence the mutex is still held and no build of that job can run anymore.

The reason is that our Zuul version does not release the mutex when a build is canceled. Luckily Tobias Henkel has backported a fix that makes Zuul release the mutex on job cancellation:

Zuul v2 https://review.openstack.org/#/c/384980/
Zuul v3 https://review.openstack.org/#/c/432211/

Upstream: https://storyboard.openstack.org/#!/story/2000657
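
The deadlock described above can be illustrated with a toy model. This is a minimal sketch and not Zuul's actual code: the `MutexHandler` and `Scheduler` classes, their methods, and the `release_on_cancel` flag are hypothetical names made up for illustration. It only shows why a cancel path that skips the release leaves the mutex held forever.

```python
class MutexHandler:
    """Toy named-mutex registry (hypothetical, not Zuul's implementation)."""

    def __init__(self):
        self.holders = {}  # mutex name -> id of the build holding it

    def acquire(self, name, build_id):
        # Refuse if another build already holds the mutex.
        if name in self.holders:
            return False
        self.holders[name] = build_id
        return True

    def release(self, name, build_id):
        # Only the current holder may release.
        if self.holders.get(name) == build_id:
            del self.holders[name]


class Scheduler:
    """Toy scheduler; release_on_cancel is an assumed switch that
    toggles between the buggy and the fixed behavior."""

    def __init__(self, release_on_cancel):
        self.mutexes = MutexHandler()
        self.release_on_cancel = release_on_cancel

    def launch(self, job, build_id):
        return self.mutexes.acquire(job, build_id)

    def complete(self, job, build_id):
        self.mutexes.release(job, build_id)

    def cancel(self, job, build_id):
        # Buggy behavior: the build is dropped but the mutex stays held.
        if self.release_on_cancel:
            self.mutexes.release(job, build_id)


def simulate(release_on_cancel):
    """Replay the PS2/PS3 sequence; return True if the PS3 build can start."""
    job = "mwext-phpunit-coverage-patch"
    scheduler = Scheduler(release_on_cancel)
    scheduler.launch(job, "PS2")   # PS2 takes the mutex
    scheduler.cancel(job, "PS2")   # PS3 arrives, PS2 is canceled
    return scheduler.launch(job, "PS3")


if __name__ == "__main__":
    print("without release on cancel:", simulate(False))
    print("with release on cancel:", simulate(True))
```

In this model, `simulate(False)` reproduces the stuck pipeline (the PS3 build can never acquire the mutex), while `simulate(True)` corresponds to the behavior expected after the fix.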

hashar created this task. Mar 16 2018, 10:07 AM
Restricted Application added a subscriber: Aklapper. Mar 16 2018, 10:07 AM
hashar updated the task description. Mar 16 2018, 10:38 AM
hashar added a project: Upstream.
hashar updated the task description.
hashar moved this task from Backlog to Patch merged upstream on the Upstream board.

Change 420002 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/zuul@debian/jessie-wikimedia] 2.5.1-wmf4: drop log/fix mutex release on job cancel

https://gerrit.wikimedia.org/r/420002

Mentioned in SAL (#wikimedia-operations) [2018-03-16T10:53:50Z] <hashar> Upgrading zuul to zuul_2.5.1-wmf4 to resolve a mutex deadlock T189859

Mentioned in SAL (#wikimedia-releng) [2018-03-16T10:53:56Z] <hashar> Upgrading zuul to zuul_2.5.1-wmf4 to resolve a mutex deadlock T189859

hashar closed this task as Resolved. Mar 16 2018, 11:15 AM
hashar claimed this task.

Change 420002 merged by jenkins-bot:
[integration/zuul@debian/jessie-wikimedia] 2.5.1-wmf4: drop log/fix mutex release on job cancel

https://gerrit.wikimedia.org/r/420002