The advantages are:
- reduced CI resources as we don't need to wait for a 20-30 minute build if we know it's going to fail anyway
- a little less impact of flaky tests as "recheck" can be started sooner
The advantages are:
|Open||None||T225730 Reduce runtime of MW shared gate Jenkins jobs to 5 min|
|Open||None||T225871 Selenium and PHPUnit: Stop execution on failure|
Here's an example of Selenium failing early (note I think we'd need to update wdio.conf.js for extensions too?)
Main test build failed. wmf-quibble-core-vendor-mysql-hhvm-docker https://integration.wikimedia.org/ci/job/wmf-quibble-core-vendor-mysql-hhvm-docker/17066/console : FAILURE in 6m 33s mwgate-node10-docker https://integration.wikimedia.org/ci/job/mwgate-node10-docker/4436/console : SUCCESS in 53s mediawiki-quibble-composer-mysql-php70-docker https://integration.wikimedia.org/ci/job/mediawiki-quibble-composer-mysql-php70-docker/20833/console : FAILURE in 3m 24s mediawiki-quibble-vendor-mysql-php70-docker https://integration.wikimedia.org/ci/job/mediawiki-quibble-vendor-mysql-php70-docker/21309/console : FAILURE in 3m 20s mediawiki-quibble-composertest-php70-docker https://integration.wikimedia.org/ci/job/mediawiki-quibble-composertest-php70-docker/20782/console : SUCCESS in 1m 54s mediawiki-core-jsduck-docker https://integration.wikimedia.org/ci/job/mediawiki-core-jsduck-docker/11036/console : SUCCESS in 18s mediawiki-core-php70-phan-docker https://integration.wikimedia.org/ci/job/mediawiki-core-php70-phan-docker/31749/console : SUCCESS in 1m 31s
I would actually object to this: imagine your change has caused multiple test failures that you weren't able to predict in your dev environment (because you didn't have all extensions installed or your environment is otherwise different from our CI). You'll have to amend your PR with one fix at a time and push it just to see what explodes next.
For this use case, what would you think about a "check all" command which forces a full run of all suites, gated extensions, browser tests, etc., not stopping on failure? We have to optimize for one or the other case, it seems.
Another variation would be that we drop the running job's priority after the first failure, but still let it continue to run. That might be difficult technically and not save us many resources, however.
I would actually object to this: imagine your change has caused multiple test failures that you weren't able to predict in your dev environment (because you didn't have all extensions installed or your environment is otherwise different from our CI).
I don’t doubt that this could be a problem and that it would be really annoying, but I’m wondering if you could share a patch or two with this kind of situation so we could look at a specific example?
And yeah I like the idea of a “check all” command to bypass the stop on failure.
Relatedly, IMO the stop on failure work would be much more worthwhile if a failure in one job in the test pipeline could immediately halt all the other jobs too.
From the point of view that has been commited to the patchs, this will limit the ability to run all tests in my development environment. Is it only possible to change the WMF CI configuration to do this?
Yeah, this would make fixing multiple issues very annoying and time-consuming, and would make the life of new developers especially hard (so far we didn't force them set up the tests locally just to have a non-aggravating experience). I don't think it's worth it.
Ideally, the tests would report failure as soon as they find it but continue to run to the end. I doubt that's realistic with Jenkins though. (This post describes a possible way to do it via the cacheResult option, but it would require a much more recent PHPUnit version, and even so it seems very hacky.)
One thing that might be worthwhile is starting with "local" test (core tests in case of core, the extensions' own tests in case of an extension) and only running tests belonging to other repos if the local tests succeed. Since it's almost always the local tests that fail, that would speed things up a lot without forcing too many re-runs on the developer.
I think with regards to how Quibble executes the various phpunit and wdio tasks, I think we're in a fairly good balance between "all" and "until first failure". Each of these steps should only take a few minutes at most, so it's not holding up results very much to let that sub task finish.
We already have Quibble abort the job as soon as one finished sub task has failures. Eg. it won't start wdio tests if phpunit--unit found errors. This means the job is already terminating earlier e.g. within 2-3 minutes instead of 5-10 minutes in case of such failure. I don't think it's that important to get results even sooner than this given the downside (mentioned by others) of then having a very partial result to work with and potentially having to bounce back and forth many times with CI to get to the bottom of it. The current compromise means you'll have to bounce at most 3 times (lint, unit, integration - more or less), which seems fairly good.
There is however a different reason for why we currently have to wait 20 minutes to get a known-failed result back to the user in Gerrit. It's Zuul. Zuul is currently configured to wait for all jobs in the pipeline before reporting back. It has no early abort. This means on patch for mediawik where we run 3 quibble jobs (vendor+plain, vendor+extensions, composer), if one has failed on the lining stage and the job was terminated, it still waits for the others to finish. Not only that, if CI is short on resources and the other haven't even started yet, and may end up waiting 20+ minutes for them to start, finish, and then report back.
Thoughts? Is the current compromise acceptable? Do we want Zuul to abort peer jobs if one has failed?
Instead of patching Zuul for the abort-peer-jobs-if-one-failed functionality (see @hashar's note in T225730#5490156), maybe we could do something with build triggers -- define a linter job for core & extensions that does phpcs and eslint, then the success of that job triggers the full ~20 minute set of jobs.