Provide early feedback when a patch has job failures
Open, Needs Triage, Public

Description

Following up from the declined T225871: Selenium and PHPUnit: Stop execution on failure and tangential to T248531: Abort a Zuul pipeline when one job completed with failures (change zuul scheduler's failure check from areAllJobsComplete to didAnyJobFail):

  • have a web app on Toolforge, e.g. ci-early-failure-notice.toolforge.org
  • patch Quibble so that when a process ends with a non-zero exit code, it POSTs some data about the job (patch details) to the web app
  • the web app adds a comment and a Verified -1 to the patch
    • the comment should include a link to the Jenkins build URL for the failing job
  • [bonus points] Quibble could capture a formatted version of the failing command's output and include it in the comment that the web app leaves in Gerrit. That is related to T209149: Have linters/tests results show up as comments in files on gerrit. We'd need a trimmed version of this for PHPUnit, which currently dumps a lot of text.
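
A minimal sketch of the flow above, not an actual Quibble patch. It assumes the job environment carries Zuul v2-style `ZUUL_PROJECT`, `ZUUL_CHANGE` and `ZUUL_PATCHSET` variables plus Jenkins' `JOB_NAME` and `BUILD_URL`. The function names, the payload shape and the 20-line output cap are hypothetical. Only the `message` and `labels` fields and the review path are meant to follow Gerrit's set-review REST endpoint.

```python
import os

MAX_OUTPUT_LINES = 20  # hypothetical cap so PHPUnit output does not flood the comment


def build_failure_payload(command, exit_code, output, env=None):
    """Quibble side: describe a failed command for the web app (hypothetical shape)."""
    env = os.environ if env is None else env
    # Keep only the last lines of the output, where the failure summary usually is.
    tail = output.splitlines()[-MAX_OUTPUT_LINES:]
    return {
        "project": env.get("ZUUL_PROJECT"),
        "change": env.get("ZUUL_CHANGE"),
        "patchset": env.get("ZUUL_PATCHSET"),
        "job": env.get("JOB_NAME"),
        "build_url": env.get("BUILD_URL"),
        "command": command,
        "exit_code": exit_code,
        "output_tail": "\n".join(tail),
    }


def to_review_input(payload):
    """Web app side: turn the payload into a Gerrit ReviewInput body."""
    message = (
        f"Early failure notice: {payload['job']} failed "
        f"(`{payload['command']}` exited with {payload['exit_code']}).\n"
        f"Build: {payload['build_url']}"
    )
    if payload["output_tail"]:
        message += "\n\n" + payload["output_tail"]
    return {"message": message, "labels": {"Verified": -1}}


def review_path(payload):
    """Path of Gerrit's set-review endpoint for this change and patchset."""
    return f"/a/changes/{payload['change']}/revisions/{payload['patchset']}/review"
```

The web app would POST the result of `to_review_input()` as JSON to `review_path()` on the Gerrit host, using credentials for a bot account that is allowed to vote Verified. Authentication between Quibble and the web app is left out of the sketch.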

Why?

  • developers receive pings early via email/IRC that a patch has failures. Yes, you can watch Zuul, but most people don't know to do that, and it doesn't notify you that a job has failed.
  • Early feedback on a failed patch allows the developer to 1) fix the patch before switching their context to other things or 2) in a backport situation, know that they need to restart the jobs

Event Timeline

kostajh added a subscriber: hashar.

@hashar what do you think? I'd be happy to hack on Quibble/web app for this, if you think it's useful.

Ideally Zuul would report immediately when it knows that a change is not going to pass. As you found out, that is T248531, which should probably be declined since I don't want to make any changes to the legacy forked Zuul code we are using.

you can watch Zuul, but most people don't know to do that

That is indeed a problem. A lot of people do keep the Zuul status page open and watch it as CI progresses.

For quicker feedback there is T214068, which is about exposing the Zuul status of a change in the Gerrit interface. OpenDev did something like that and showed progress, but I never got around to integrating it in our interface. I did some work on that front over the last few days: first reformatting the messages reported by Zuul, then attempting to integrate the ongoing progress in the UI ( https://gerrit.wikimedia.org/r/859127 ):

zuul_status_in_gerrit.png (498×681 px, 84 KB)

I have two or three changes on that front, but none I am pleased with. Gerrit does have an API / UI to expose CI results: https://gerrit.wikimedia.org/r/Documentation/pg-plugin-checks-api.html and I have started porting code to it. The intent is to expose:

  • ongoing processing (by querying the Zuul status page)
  • a report (crafted by crawling messages reported by Zuul and other bots)
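
As a rough illustration of "querying the Zuul status page", the sketch below walks a status document and lists the voting jobs that have already failed for one change. The nested `pipelines` / `change_queues` / `heads` / `jobs` layout and the `result` and `voting` keys are assumptions based on the Zuul v2 `status.json` format and should be checked against the deployed instance. `failed_jobs` is a hypothetical helper, not existing code.

```python
def failed_jobs(status, change_id):
    """Return (job name, url) pairs for voting jobs of one change that failed.

    `status` is the parsed status JSON; `change_id` is "<change>,<patchset>",
    for example "12345,2".
    """
    failures = []
    for pipeline in status.get("pipelines", []):
        for queue in pipeline.get("change_queues", []):
            for head in queue.get("heads", []):
                for change in head:
                    if change.get("id") != change_id:
                        continue
                    for job in change.get("jobs", []):
                        # Assumed: result is None while the job is queued or running.
                        finished_badly = job.get("result") not in (None, "SUCCESS")
                        if finished_badly and job.get("voting", True):
                            failures.append((job["name"], job.get("url")))
    return failures
```

A Gerrit plugin or a bot could poll the status page, run a check like this for the change being viewed, and surface the first failure without waiting for the whole pipeline to finish.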

Zuul 2.5 report in Gerrit Checks UI (865×1 px, 145 KB)

When a voting job fails, I guess we can make Gerrit notify the user via the web UI. I don't know which JavaScript API is needed to do that, but there must be one, since Gerrit is able to notify when a new patchset has been uploaded.

Having thought about this for a bit, I think there's room for both:

  • improved UX when viewing a change in Gerrit (the changes you're working on)
  • immediate -1 vote when a Quibble job fails, via a tool that receives the build information and makes a comment in Gerrit. The benefit is triggering IRC/email notifications for quicker feedback. The downside, I guess, is that notifications for multiple job failures could be considered too spammy...
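
One possible way to limit the spam concern is to have the tool vote and comment only for the first failing job of a given patchset. The sketch below is a hypothetical in-memory guard. The class name is made up, and a real tool would need persistent storage to survive restarts.

```python
class FailureNotifier:
    """Remember which patchsets already received an early -1 (in memory only)."""

    def __init__(self):
        self._notified = set()

    def should_notify(self, change, patchset):
        """Return True only the first time a (change, patchset) pair is seen."""
        key = (str(change), str(patchset))
        if key in self._notified:
            return False
        self._notified.add(key)
        return True
```

With this guard, a second failing job on the same patchset is dropped, while a new patchset of the same change can trigger a fresh notification.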