Page MenuHomePhabricator

Provide early feedback when a patch has job failures
Closed, ResolvedPublic

Description

Following up from the declined T225871: Selenium and PHPUnit: Stop execution on failure and tangential to T248531: Abort a Zuul pipeline when one job completed with failures (change zuul scheduler's failure check from areAllJobsComplete to didAnyJobFail):

Why?

  • developers receive pings early via email/IRC that a patch has failures. Yes, you can watch Zuul, but most people don't know to do that, and it doesn't notify you that a job has failed.
  • Early feedback on a failed patch allows the developer to 1) fix the patch before switching their context to other things or 2) in a backport situation, know that they need to restart the jobs

Details

Show related patches Customize query in gerrit

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

@hashar what do you think? I'd be happy to hack on Quibble/web app for this, if you think it's useful.

Ideally Zuul would report immediately when it knows that a change is not going to pass. As you found out that is T248531 which probably should be declined given I don't want to make any change to the legacy forked Zuul code were are using.

you can watch Zuul, but most people don't know to do that

That is indeed a problem. Lot do have the Zuul status page opened and would watch that as CI is progressing.

For a quicker feedback there is T214068 which is to expose in the Gerrit interface the status of the change in Zuul. OpenDev did something like that and showed progress but I never went to integrate it in our interface. I did some work on that front over the last few days, firstly to reformat the Zuul reported messages, then an attempt to integrate the ongoing progress in the UI ( https://gerrit.wikimedia.org/r/859127 ):

zuul_status_in_gerrit.png (498×681 px, 84 KB)

I have two or three changes on that front but none I am pleased with. Eventually Gerrit has an API / UI to expose CI results: https://gerrit.wikimedia.org/r/Documentation/pg-plugin-checks-api.html and I have started porting code to it. The intent is to expose:

  • ongoing processing (by querying the Zuul status page)
  • craft a report (by crawling messages reported by Zuul and other bots)

Zuul 2.5 report in Gerrit Checks UI (865×1 px, 145 KB)

When a job fails and is voting, I guess we can make Gerrit to notify the user via the Web UI. I don't know the JavaScript API needed to do that though but there must be one since Gerrit is able to notify when a new patchset has been uploaded.

Ideally Zuul would report immediately when it knows that a change is not going to pass. As you found out that is T248531 which probably should be declined given I don't want to make any change to the legacy forked Zuul code were are using.

you can watch Zuul, but most people don't know to do that

That is indeed a problem. Lot do have the Zuul status page opened and would watch that as CI is progressing.

For a quicker feedback there is T214068 which is to expose in the Gerrit interface the status of the change in Zuul. OpenDev did something like that and showed progress but I never went to integrate it in our interface. I did some work on that front over the last few days, firstly to reformat the Zuul reported messages, then an attempt to integrate the ongoing progress in the UI ( https://gerrit.wikimedia.org/r/859127 ):

zuul_status_in_gerrit.png (498×681 px, 84 KB)

I have two or three changes on that front but none I am pleased with. Eventually Gerrit has an API / UI to expose CI results: https://gerrit.wikimedia.org/r/Documentation/pg-plugin-checks-api.html and I have started porting code to it. The intent is to expose:

  • ongoing processing (by querying the Zuul status page)
  • craft a report (by crawling messages reported by Zuul and other bots)

Zuul 2.5 report in Gerrit Checks UI (865×1 px, 145 KB)

When a job fails and is voting, I guess we can make Gerrit to notify the user via the Web UI. I don't know the JavaScript API needed to do that though but there must be one since Gerrit is able to notify when a new patchset has been uploaded.

Having thought about this for a bit, I think there's room for both:

  • improved UX when viewing a change in Gerrit (the changes you're working on)
  • immediate -1 vote when a Quibble job fails, via a tool that receives the build information and makes a comment in Gerrit. The benefit is triggering IRC/email notifications for quicker feedback. I guess the downside is also that multiple job failures could be considered too spammy...

Change 866376 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/quibble@master] [WIP] Send notice of build failures to external reporter

https://gerrit.wikimedia.org/r/866376

Change 866376 merged by jenkins-bot:

[integration/quibble@master] Allow sending build failure data to external endpoint

https://gerrit.wikimedia.org/r/866376

Change 890809 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/quibble@master] release: Quibble 1.5.0

https://gerrit.wikimedia.org/r/890809

Change 890809 merged by jenkins-bot:

[integration/quibble@master] release: Quibble 1.5.0

https://gerrit.wikimedia.org/r/890809

Change 890817 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] dockerfiles: update to Quibble 1.5.0

https://gerrit.wikimedia.org/r/890817

Change 890817 merged by jenkins-bot:

[integration/config@master] dockerfiles: update to Quibble 1.5.0

https://gerrit.wikimedia.org/r/890817

Next step is to create QUIBBLE_API_KEY for use with quibble jobs (similar to SONAR_API_KEY) and I can add it to the config used by the application on https://earlywarningbot.toolforge.org. After that, projects tested by Quibble can use a quibble.yaml file to allow for comments and/or verification -1 votes as soon as a single job fails in a build.

We can store the key in Jenkins credentials store then inject it as QUIBBLE_API_KEY in the Quibble jobs using the Credentials Binding plugin. We have done something similar for COMPOSER_GITHUB_OAUTHTOKEN. In jjb that looks like:

- job:
    wrappers:
      - credentials-binding:
          - text:
              credential-id: composer-github-oauthtoken
              variable: COMPOSER_GITHUB_OAUTHTOKEN

I have added the API key to the Jenkins credential store with the id quibble-earlywarningbot-api-key (private link https://integration.wikimedia.org/ci/manage/credentials/store/system/domain/_/credential/quibble-earlywarningbot-api-key/ ).

Change 891276 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/quibble@master] reporting: Add documentation for quibble.yaml

https://gerrit.wikimedia.org/r/891276

Change 891278 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] jjb: add QUIBBLE_API_KEY to Quibble jobs

https://gerrit.wikimedia.org/r/891278

Mentioned in SAL (#wikimedia-releng) [2023-02-22T14:19:27Z] <hashar> Updating all jobs to add QUIBBLE_API_KEY to the Quibble jobs | https://gerrit.wikimedia.org/r/c/integration/config/+/891278/ | T323750

Change 891278 merged by jenkins-bot:

[integration/config@master] jjb: add QUIBBLE_API_KEY to Quibble jobs

https://gerrit.wikimedia.org/r/891278

Change 891514 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/config@master] jjb: Enable reporting-url for quibble-{packages-source}-{database}-{php}-docker

https://gerrit.wikimedia.org/r/891514

Change 891514 merged by jenkins-bot:

[integration/config@master] jjb: Enable reporting-url for quibble-{packages-source}-{database}-{php}-docker

https://gerrit.wikimedia.org/r/891514

Change 891548 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] jjb: switch jobs to Quibble 1.5.0

https://gerrit.wikimedia.org/r/891548

Change 891549 had a related patch set uploaded (by Hashar; author: Kosta Harlan):

[integration/config@master] jjb: Enable reporting-url for quibble-{packages-source}-{database}-{php}-docker

https://gerrit.wikimedia.org/r/891549

Change 891548 merged by jenkins-bot:

[integration/config@master] jjb: switch jobs to Quibble 1.5.0

https://gerrit.wikimedia.org/r/891548

Change 891549 merged by jenkins-bot:

[integration/config@master] jjb: Enable reporting-url for quibble-{packages-source}-{database}-{php}-docker

https://gerrit.wikimedia.org/r/891549

Change 891559 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/config@master] jjb: Enable --reporting-url for more Quibble jobs

https://gerrit.wikimedia.org/r/891559

Change 891559 merged by jenkins-bot:

[integration/config@master] jjb: Enable --reporting-url for more Quibble jobs

https://gerrit.wikimedia.org/r/891559

Change 891653 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/quibble@master] reporting: Remove slash from build URL

https://gerrit.wikimedia.org/r/891653

Change 891653 merged by jenkins-bot:

[integration/quibble@master] reporting: Remove slash from build URL

https://gerrit.wikimedia.org/r/891653

Change 891276 merged by jenkins-bot:

[integration/quibble@master] reporting: Add documentation for quibble.yaml

https://gerrit.wikimedia.org/r/891276

Change 867531 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@master] build: Add quibble.yaml

https://gerrit.wikimedia.org/r/867531

Change 867531 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] build: Add quibble.yaml

https://gerrit.wikimedia.org/r/867531

Change 894216 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/quibble@master] reporting: Command name can be a string

https://gerrit.wikimedia.org/r/894216

Change 894535 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/quibble@master] release: Quibble 1.5.2

https://gerrit.wikimedia.org/r/894535

Change 894216 merged by jenkins-bot:

[integration/quibble@master] reporting: Command name can be a string

https://gerrit.wikimedia.org/r/894216

Change 894535 merged by jenkins-bot:

[integration/quibble@master] release: Quibble 1.5.2

https://gerrit.wikimedia.org/r/894535

Change 904759 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/Echo@master] build: Add quibble.yaml and enable early warning bot feedback

https://gerrit.wikimedia.org/r/904759

Change 904759 merged by jenkins-bot:

[mediawiki/extensions/Echo@master] build: Add quibble.yaml and enable early warning bot feedback

https://gerrit.wikimedia.org/r/904759

It seems like the early warning bot is not responding to the recheck command: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/911912

Yeah, this is noted in https://wikitech.wikimedia.org/wiki/Tool:Early_warning_bot#Known_issues and the bot also leaves a comment to that effect, e.g. in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/908359/19#message-68476447ee66633c2f42a8b3405ee53031c68a9e

Failed command: "composer run --timeout=0 phpunit:entrypoint -- --testsuite extensions --group Database --exclude-group Broken,ParserFuzz,Stub,Standalone"
Details: https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php74-noselenium-docker/96300/consoleFull

Note: If the build failed due to a non-deterministic test, please manually remove the "Verified: -1" vote after submitting a "recheck" comment.

Maybe setting the "Verified" label is not that useful, and we should remove it as an option. A workaround is to set should_vote: 0 (or remove that line entirely) in the extension's quibble.yaml.

I wonder if there's some way to preprocess red/green output control characters?

  GET quickstarttips
    ✔ the copyedit response has the correct shape and parameters substituted (1130ms)
    ✔ loads different messages varying by skin (104ms)
    ✔ should get tips for minerva / wikitext / update without an HTTP error (75ms)
    1) should get tips for minerva / wikitext / expand without an HTTP error
    ✔ should get tips for minerva / wikitext / references without an HTTP error (762ms)
  37 passing (6s)
  1 failing

takes a moment to understand.

I wonder if there's some way to preprocess red/green output control characters?

  GET quickstarttips
    ✔ the copyedit response has the correct shape and parameters substituted (1130ms)
    ✔ loads different messages varying by skin (104ms)
    ✔ should get tips for minerva / wikitext / update without an HTTP error (75ms)
    1) should get tips for minerva / wikitext / expand without an HTTP error
    ✔ should get tips for minerva / wikitext / references without an HTTP error (762ms)
  37 passing (6s)
  1 failing

takes a moment to understand.

@hashar do you have any ideas? I'm not sure what to do about it.

I wonder if there's some way to preprocess red/green output control characters?

  GET quickstarttips
    ✔ the copyedit response has the correct shape and parameters substituted (1130ms)
    ✔ loads different messages varying by skin (104ms)
    ✔ should get tips for minerva / wikitext / update without an HTTP error (75ms)
    1) should get tips for minerva / wikitext / expand without an HTTP error
    ✔ should get tips for minerva / wikitext / references without an HTTP error (762ms)
  37 passing (6s)
  1 failing

takes a moment to understand.

@hashar do you have any ideas? I'm not sure what to do about it.

I'll have a look at https://github.com/TheLarkInn/ansi-regex

Writing a regex shouldn't be hard IMO (we really only care about those two colors), the harder question is how to convey the information to the user? (Unless Gerrit provides some way for bots to write HTML comments.)

Writing a regex shouldn't be hard IMO (we really only care about those two colors), the harder question is how to convey the information to the user? (Unless Gerrit provides some way for bots to write HTML comments.)

We could replace the green/red with ✅ and ❌, which is what the sonarqube bot does.

I don't think we should attempt to parse output intended for humans, it causes troubles (like ansi codes) and is prone to change upon time as the various test utilities slightly change their output. Instead the test commands should output some JUnit / Json blob which should be stable over time and by easy to parse.

Both Karma for QUnit tests and webdriver.io for Selenium tests must have reporters for that. We used to collect PHPUnit tests in JUnit format but we got rid of that cause the formatted tests results were not directly reachable by developers since CI points to the console rather than page of the Jenkins build. They also took a lot of disk space on the Jenkins primary, then I guess we could add some processing after the build has completed to only retain non success tests which will shrink the resulting JUnit/Json file.

Once we get our tests runners to output some Json, we can get Gerrit to fetch the blobs from the Jenkins builds and exposes them via its https://gerrit.wikimedia.org/r/Documentation/pg-plugin-checks-api.html Checks API. The API is further described as TypeScript interfaces at https://gerrit.googlesource.com/gerrit/+/stable-3.5/polygerrit-ui/app/api/checks.ts as CheckResult.CodePointer but it seems to be better suited for linters.

So the idea is to have CI / Jenkins to expose the tests in json, get the Javascript to retrieve it and format it according to the checks.ts interface and Gerrit will happily format them under the Check tab. The thing is a CheckResult merely has a string result and does not have a finer grained way to expose individual tests failures :/

A past exchange with upstream:

https://discord.com/channels/775374026587373568/1019941974842806302/1078245868353892362

Antoine: hi, I have a question about the check API, what is CodePointers for? It is an array which should hint at file/lines having an issue, but the comment at https://gerrit.googlesource.com/gerrit/+/master/polygerrit-ui/app/api/checks.ts#413 states only one pointer is supported. Thus if a check reports errors on multiple files / lines, we would only get one shown? I am not sure how it is related to the "Findings" tab
the code does invoke map on the array, so maybe the comment is misleading

https://discord.com/channels/775374026587373568/1019941974842806302/1078267881982275624

brohlfs:
Only one codePointer is supported atm.
If you provide a codePointer, then the check result is shown inline in the diff: https://imgur.com/a/HeLespU
If a check run reports mulitple errors, then the recommendation is to use multiple check results, and attach one code pointer to each result.
The "Findings" tab is for robot comments. We have tentative plans for retiring robot comments in favor of check results with code pointers.
Let me know if you have more questions. Happy to help!

https://imgur.com/a/HeLespU :

0uFO3CTh.jpeg (643×939 px, 78 KB)

I have found an example on Chromium https://chromium-review.googlesource.com/c/chromium/src/+/4513944?tab=checks though I don't know they render it with some nice HTML since the message is supposedly a raw text interpreted as Gerrit markdown subset :(

gerrit_chromium_test_failure.png (684×1 px, 156 KB)

Change 931995 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/CheckUser@master] build: Add quibble.yaml and enable early warning bot feedback

https://gerrit.wikimedia.org/r/931995

Change 931995 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] build: Add quibble.yaml and enable early warning bot feedback

https://gerrit.wikimedia.org/r/931995

Change 963309 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/ReportIncident@master] build: Add quibble.yaml and enable early warning bot feedback

https://gerrit.wikimedia.org/r/963309

Change 963309 merged by jenkins-bot:

[mediawiki/extensions/ReportIncident@master] build: Add quibble.yaml and enable early warning bot feedback

https://gerrit.wikimedia.org/r/963309

Change 965391 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/MediaModeration@master] build: Add quibble.yaml and enable early warning bot feedback

https://gerrit.wikimedia.org/r/965391

Change 965391 merged by jenkins-bot:

[mediawiki/extensions/MediaModeration@master] build: Add quibble.yaml and enable early warning bot feedback

https://gerrit.wikimedia.org/r/965391

Change 971120 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/core@master] build: Add quibble.yaml and enable Early Warning Bot feedback

https://gerrit.wikimedia.org/r/971120