Abort a Zuul pipeline when one job completed with failures (change zuul scheduler's failure check from areAllJobsComplete to didAnyJobFail)
Closed, Declined (Public)

Description

We definitely need to do better here, but the problem imho is not that the job needs to be marked as a failure earlier; we are already doing that. The problem is that we run multiple jobs and wait for all of them to complete.

This task is to explore what we can do to make Zuul give feedback earlier to developers once one job has failed. There are a number of ways this could be addressed:

  • (Basic) Once one job fails, abort the others and submit the comment to Gerrit.
  • (Ideal) Configure in Zuul which jobs are fast/mandatory and which are allowed to be aborted if others have already found failures. For example, if a PHPUnit or QUnit test is failing, it might be due to a syntax error. When that is the case, it is likely that the PHPCS (mediawiki-…-composer-test) and ESLint (mwgate-node) jobs have information that better explains this error. In general, it is likely that these will finish first anyway because they are much quicker. But it would be ideal to enforce this so that we are never in a situation where a developer is unable to get PHPCS or ESLint feedback.
  • (Amazing?)
    • Somehow edit the comment after initial submission so that we can keep adding details as we go.
    • Use some kind of "Commit Status API" (like GitHub). Gerrit upstream is working on this and this would integrate the results of individual jobs much deeper in the Gerrit interface without any need for "bot comments", and without having to expose the details of the "pipeline" to Gerrit users.
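As a rough illustration of the "Basic" and "Ideal" options above, here is a minimal Python sketch of an abort policy. It is not Zuul code: the `Job` class, the `mandatory` flag, the `jobs_to_abort` helper and the short job names are all hypothetical, chosen only to show how fast lint-style jobs could be protected from being aborted.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    RUNNING = "running"
    SUCCESS = "success"
    FAILURE = "failure"


@dataclass
class Job:
    name: str
    status: Status
    mandatory: bool = False  # hypothetical flag, not an existing Zuul option


def jobs_to_abort(jobs):
    """Return the names of running jobs that may be aborted.

    Nothing is aborted until at least one job has failed. After that,
    every running job is aborted unless it is marked mandatory.
    """
    if not any(job.status is Status.FAILURE for job in jobs):
        return []
    return [
        job.name
        for job in jobs
        if job.status is Status.RUNNING and not job.mandatory
    ]


# Hypothetical pipeline state: the unit tests failed while others still run.
jobs = [
    Job("phpunit", Status.FAILURE),
    Job("phpcs", Status.RUNNING, mandatory=True),
    Job("eslint", Status.RUNNING, mandatory=True),
    Job("selenium", Status.RUNNING),
]
print(jobs_to_abort(jobs))  # ['selenium']
```

With every job left at `mandatory=False`, this reduces to the "Basic" option: the first failure aborts everything still running.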

Event Timeline

From T225730: Reduce runtime of MW shared gate Jenkins jobs to 5 min, T225730#5490156 and T225730#6001349:

I can't tell what kind of madness might occur if we changed the conditional from areAllJobsComplete to didAnyJobFail :-\

Picking this back up again (from T225730#5490156), is it worth a short experiment (24 hours?)? Potentially this would free up a lot of resources in addition to providing quicker feedback.
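To make the conditional change discussed above concrete, the following Python sketch contrasts the two checks on a simplified model. The snake_case functions are stand-ins named after the two checks mentioned in the thread, not Zuul's actual implementation, and the `results` dictionary, its `None`/`"FAILURE"` values and the `fail_fast` switch are assumptions made for illustration.

```python
def are_all_jobs_complete(results):
    """True once every job has a result (None means still running)."""
    return all(result is not None for result in results.values())


def did_any_job_fail(results):
    """True as soon as at least one job has reported a failure."""
    return any(result == "FAILURE" for result in results.values())


def should_report(results, fail_fast):
    """Decide whether the pipeline result can be reported now.

    With fail_fast=False, reporting waits for all jobs to complete.
    With fail_fast=True, the first failure is enough to report.
    """
    if fail_fast and did_any_job_fail(results):
        return True
    return are_all_jobs_complete(results)


# Hypothetical state: one job has failed, two are still running.
results = {"lint": "FAILURE", "unit": None, "browser": None}
print(should_report(results, fail_fast=False))  # False
print(should_report(results, fail_fast=True))   # True
```

The sketch only covers when to report. It says nothing about how the remaining jobs are cancelled or how a gate pipeline with dependent changes would react, which is presumably where the "madness" could come from.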

Jdforrester-WMF renamed this task from Abort a Zuul pipeline when one job completed with failures. to Abort a Zuul pipeline when one job completed with failures (change zuul scheduler's failure check from areAllJobsComplete to didAnyJobFail). Apr 2 2020, 7:54 PM
hashar added a project: Upstream.