Capture output from failed command and transmit to earlywarningbot
Closed, Resolved, Public

Description

Reporting early that a job has failed is nice, but it would be even better to also provide a link to just the error output of the failed command, and nothing else.

For example, compare the full console output at https://integration.wikimedia.org/ci/job/wmf-quibble-vendor-mysql-php74-docker/15523/consoleFull with a page where you would see only:

19:54:33 There was 1 failure:
19:54:33 
19:54:33 1) GrowthExperiments\Tests\GrowthExperimentsMultiConfigTest::testGetWithFlagsFromWiki
19:54:33 Failed asserting that 0 is identical to 1.
19:54:33 
19:54:33 /workspace/src/extensions/GrowthExperiments/tests/phpunit/unit/Config/GrowthExperimentsMultiConfigTest.php:103
19:54:33 /workspace/src/tests/phpunit/MediaWikiUnitTestCase.php:115
19:54:33 phpvfscomposer:///workspace/src/vendor/phpunit/phpunit/phpunit:97
19:54:33 
19:54:33 FAILURES!
19:54:33 Tests: 19298, Assertions: 172188, Failures: 1, Skipped: 14.

earlywarningbot currently posts a comment in this format:

Failed command: "composer phpunit:unit -- --exclude-group Broken,ParserFuzz,Stub"
Phase: "Run Post-dependency install, pre-database dependent steps in parallel (concurrency=3)"
Details: https://integration.wikimedia.org/ci/job/wmf-quibble-vendor-mysql-php74-docker/15523/consoleFull
Note: If the build failed due to a non-deterministic test, please manually remove the "Verified: -1" vote after submitting a "recheck" comment.

What I'd like to change it to:

Failed command: "composer phpunit:unit -- --exclude-group Broken,ParserFuzz,Stub"
Output from failed command: https://earlywarningbot.toolforge.org/build/{somehash}
Full details: https://integration.wikimedia.org/ci/job/wmf-quibble-vendor-mysql-php74-docker/15523/consoleFull
Note: If the build failed due to a non-deterministic test, please manually remove the "Verified: -1" vote after submitting a "recheck" comment.

Developers could then save a bit more time by opening the earlywarningbot URL, which shows only the stdout/stderr of the failed command.

In order to do that, we need to use subprocess.run(), which since Python 3.7 can be invoked with check=True and capture_output=True. We'd need to replace our mix of subprocess.check_call/Popen calls and drop support for Python 3.5 and 3.6. (Or maybe there is some other way I am not realizing.)
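A minimal sketch of the capture, assuming the command is given as an argument list. It passes stdout=subprocess.PIPE and stderr=subprocess.PIPE, which subprocess.run() has accepted since Python 3.5; capture_output=True (3.7+) is shorthand for exactly that. With check=True, a non-zero exit raises CalledProcessError, and its stdout and stderr attributes then hold the captured text. run_and_capture is a hypothetical helper name, not an existing Quibble function.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd; on failure, raise CalledProcessError with output attached.

    Hypothetical helper. stdout=PIPE/stderr=PIPE works on Python 3.5+;
    on 3.7+ the same thing can be written as capture_output=True.
    """
    return subprocess.run(
        cmd,
        check=True,  # non-zero exit -> CalledProcessError
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode to str (text=True on 3.7+)
    )


if __name__ == "__main__":
    # A stand-in for a failing test command: prints to stdout, then
    # sys.exit() with a string writes it to stderr and exits with status 1.
    failing = [sys.executable, "-c",
               "import sys; print('There was 1 failure'); sys.exit('FAILURES!')"]
    try:
        run_and_capture(failing)
    except subprocess.CalledProcessError as err:
        print("exit code:", err.returncode)
        print("stdout:", err.stdout.strip())
        print("stderr:", err.stderr.strip())
```

One trade-off: output captured this way is no longer streamed to the Jenkins console while the command runs, so a real implementation would likely need to echo or tee it as well.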

If we can get the output into the CalledProcessError that is used when calling transmit_error(), we could add a new field such as stderr to the report. earlywarningbot could then store it in a database on Toolforge and let users view it.
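A sketch of turning such a CalledProcessError into a report payload for earlywarningbot. build_error_payload, MAX_OUTPUT_CHARS and the field names are hypothetical, and the real transmit_error() arguments are not shown. The output is trimmed to its tail because summaries such as PHPUnit's "FAILURES!" block appear at the end, and a multi-megabyte log would be costly to store.

```python
import json
import subprocess

# Hypothetical cap so a single stored report stays small.
MAX_OUTPUT_CHARS = 64 * 1024


def tail(text, limit=MAX_OUTPUT_CHARS):
    """Return the last `limit` characters of text, decoding bytes if needed."""
    if text is None:
        return ""
    if isinstance(text, bytes):
        text = text.decode("utf-8", errors="replace")
    return text[-limit:]


def build_error_payload(err, phase):
    """Build a hypothetical earlywarningbot payload from a CalledProcessError.

    Field names are illustrative only.
    """
    cmd = err.cmd if isinstance(err.cmd, str) else " ".join(err.cmd)
    return {
        "command": cmd,
        "phase": phase,
        "returncode": err.returncode,
        "stdout": tail(err.stdout),  # .stdout is an alias for .output
        "stderr": tail(err.stderr),
    }


if __name__ == "__main__":
    err = subprocess.CalledProcessError(
        1,
        ["composer", "phpunit:unit"],
        output="There was 1 failure:\n...\nFAILURES!\n",
        stderr="",
    )
    print(json.dumps(build_error_payload(err, "PHPUnit unit tests"), indent=2))
```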

Event Timeline

kostajh updated the task description.

Thinking about it some more, there's another way that doesn't involve any changes to Quibble. The bot can fetch the consoleText for the build (e.g. https://integration.wikimedia.org/ci/job/wmf-quibble-vendor-mysql-php74-docker/15523/consoleText), grep for the failed stage, and use the start/finish markers in the text (e.g. <<< Finish: Run Post-dependency install, pre-database dependent steps in parallel (concurrency=3)) to grab all the text in between.

That has the downside of capturing the output of all the parallelized commands, though. Ideally we'd report just the specific step, e.g. PHPUnit unit tests, as the phase parameter, so that we could grep for >>> Start: PHPUnit unit tests instead of Run Post-dependency install, pre-database dependent steps in parallel (concurrency=3).
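A sketch of the grep approach. It assumes Quibble log lines end with ">>> Start: <phase>" and "<<< Finish: <phase>" (possibly after a timestamp or logger prefix), which matches the marker quoted above but has not been checked against every job. fetch_console_text and extract_phase are hypothetical helpers; the only real endpoint used is the Jenkins consoleText URL of a build.

```python
import urllib.request


def fetch_console_text(build_url):
    """Download the plain-text console log of a Jenkins build.

    build_url is e.g. .../job/<job name>/<build number>/ (slash optional).
    """
    url = build_url.rstrip("/") + "/consoleText"
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_phase(console_text, phase):
    """Return the lines logged between the Start and Finish markers of phase.

    Assumes each marker is at the end of its line. Returns None if the
    Start marker is absent; if the Finish marker is absent, everything up
    to the end of the log is returned.
    """
    start = ">>> Start: " + phase
    finish = "<<< Finish: " + phase
    captured = None  # None until the Start marker has been seen
    for line in console_text.splitlines():
        stripped = line.rstrip()
        if captured is None:
            if stripped.endswith(start):
                captured = []
        elif stripped.endswith(finish):
            break
        else:
            captured.append(line)
    return None if captured is None else "\n".join(captured)
```

Passing the parallel phase name to extract_phase returns the interleaved output of every command in that phase, which is the downside described above; a per-step phase name such as "PHPUnit unit tests" would narrow it to one command.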

I would like to investigate having Gerrit fetch the output from the Jenkins build and insert the result into the Checks tab. In the Checks API I think that is the CheckResult.message field:

/**
 * Exhaustive optional message describing the check result.
 * Will be initially collapsed. Might potentially be very long, e.g. a log of
 * MB size. The UI is not limiting this. Data providing plugins are
 * responsible for not killing the browser. :-)
 *
 * For now this is just a plain unformatted string. The only formatting
 * applied is the one that Gerrit also applies to human comments. TBD: Both
 * human comments and check result messages should get richer formatting
 * options.
 */
message?: string;

That would require Quibble to generate a JSON file containing the details of the failure, which we could then insert in the UI. That might be more robust than using grep.
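A sketch of the JSON file Quibble could write on failure. The file name failure.json, the log_dir location and the keys are all assumptions; only "message" is meant to map onto CheckResult.message, and it is kept as a plain string since that field is plain unformatted text. Publishing the file (e.g. as a Jenkins build artifact) and the Gerrit side that reads it are not shown.

```python
import json
import os


def write_failure_report(log_dir, command, phase, output,
                         filename="failure.json"):
    """Write a hypothetical failure report for a Gerrit Checks integration.

    File name and keys are assumptions. 'message' stays a plain,
    unformatted string, matching what CheckResult.message expects.
    """
    report = {
        "summary": "Failed command: " + command,
        "phase": phase,
        "message": output,
    }
    os.makedirs(log_dir, exist_ok=True)
    path = os.path.join(log_dir, filename)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2)
    return path
```

Keeping the report in a file, rather than scraping the console, means the reader does not depend on log marker formats or on how parallel output is interleaved.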

Change 893994 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/quibble@master] parallel: Include output in CalledProcessError

https://gerrit.wikimedia.org/r/893994

Yeah, agreed. I am backing away from the grep idea.

Side note: there are a number of places where Quibble raises an Exception (e.g. if the installer fails); these are not transmitted to the --reporting-url. That is probably OK?

Well, that can still be explored :-]

Change 893994 merged by jenkins-bot:

[integration/quibble@master] parallel: Include output in CalledProcessError

https://gerrit.wikimedia.org/r/893994

Change 894535 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/quibble@master] release: Quibble 1.5.2

https://gerrit.wikimedia.org/r/894535

Change 894535 merged by jenkins-bot:

[integration/quibble@master] release: Quibble 1.5.2

https://gerrit.wikimedia.org/r/894535

Change 895854 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/quibble@master] commands: Replace subprocess.check_call with subprocess.run

https://gerrit.wikimedia.org/r/895854

Change 896061 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/quibble@master] release: Quibble 1.5.3

https://gerrit.wikimedia.org/r/896061

Change 895854 merged by jenkins-bot:

[integration/quibble@master] commands: wrap subprocess in order to capture output

https://gerrit.wikimedia.org/r/895854

Change 896061 merged by jenkins-bot:

[integration/quibble@master] release: Quibble 1.5.3

https://gerrit.wikimedia.org/r/896061

Change 898805 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] dockerfiles: update to Quibble 1.5.3

https://gerrit.wikimedia.org/r/898805

Change 898807 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] jjb: update Quibble jobs to 1.5.3

https://gerrit.wikimedia.org/r/898807

Change 898805 merged by jenkins-bot:

[integration/config@master] dockerfiles: update to Quibble 1.5.3

https://gerrit.wikimedia.org/r/898805

Change 898807 merged by jenkins-bot:

[integration/config@master] jjb: update Quibble jobs to 1.5.3

https://gerrit.wikimedia.org/r/898807