The advantages are:
- reduced CI resource usage, as we don't need to wait for a 20-30 minute build if we already know it's going to fail
- flaky tests have a little less impact, as "recheck" can be started sooner
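To make the resource argument concrete, here is a minimal sketch of stop-on-failure, assuming each CI stage can be modelled as a callable that returns success or failure. The `Suite` class, the suite names and the minute values are hypothetical and illustrative only; this is not how Zuul or Jenkins actually schedule jobs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Suite:
    # Hypothetical stand-in for one CI stage (e.g. PHPUnit, Selenium)
    name: str
    minutes: int             # illustrative duration, not measured
    run: Callable[[], bool]  # returns True on success


def run_fail_fast(suites: List[Suite]) -> Tuple[List[str], int]:
    """Run suites in order and stop at the first failure.

    Returns the names of the suites that actually ran and the total
    minutes spent, so the saving versus a full run is visible."""
    ran: List[str] = []
    spent = 0
    for suite in suites:
        ran.append(suite.name)
        spent += suite.minutes
        if not suite.run():
            break  # no point spending CI time on the remaining suites
    return ran, spent


if __name__ == "__main__":
    suites = [
        Suite("phpunit", 6, lambda: False),
        Suite("selenium", 20, lambda: True),
    ]
    print(run_fail_fast(suites))  # (['phpunit'], 6)
```

The trade-off discussed below is visible here too: once `phpunit` fails, nobody learns whether `selenium` would have failed as well.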
Here's an example of Selenium failing early (note: I think we'd need to update wdio.conf.js for extensions too?)
Main test build failed.
- wmf-quibble-core-vendor-mysql-hhvm-docker https://integration.wikimedia.org/ci/job/wmf-quibble-core-vendor-mysql-hhvm-docker/17066/console : FAILURE in 6m 33s
- mwgate-node10-docker https://integration.wikimedia.org/ci/job/mwgate-node10-docker/4436/console : SUCCESS in 53s
- mediawiki-quibble-composer-mysql-php70-docker https://integration.wikimedia.org/ci/job/mediawiki-quibble-composer-mysql-php70-docker/20833/console : FAILURE in 3m 24s
- mediawiki-quibble-vendor-mysql-php70-docker https://integration.wikimedia.org/ci/job/mediawiki-quibble-vendor-mysql-php70-docker/21309/console : FAILURE in 3m 20s
- mediawiki-quibble-composertest-php70-docker https://integration.wikimedia.org/ci/job/mediawiki-quibble-composertest-php70-docker/20782/console : SUCCESS in 1m 54s
- mediawiki-core-jsduck-docker https://integration.wikimedia.org/ci/job/mediawiki-core-jsduck-docker/11036/console : SUCCESS in 18s
- mediawiki-core-php70-phan-docker https://integration.wikimedia.org/ci/job/mediawiki-core-php70-phan-docker/31749/console : SUCCESS in 1m 31s
I would actually object to this: imagine your change has caused multiple test failures that you weren't able to predict in your dev environment (because you didn't have all extensions installed or your environment is otherwise different from our CI). You'll have to amend your PR one fix at a time, pushing each revision just to see what explodes next.
For this use case, what would you think about a "check all" command which forces a full run of all suites, gated extensions, browser tests, etc., not stopping on failure? We have to optimize for one or the other case, it seems.
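A rough sketch of how a "check all" command could sit next to the existing "recheck", assuming CI reads the command from a review comment. The `run_mode` and `run_suites` helpers and the mode names are hypothetical; only "recheck" is an existing command, and "check all" is the proposal above.

```python
from typing import List, Optional


def run_mode(comment: str) -> Optional[str]:
    """Map a review comment to a run mode (hypothetical commands)."""
    for line in comment.splitlines():
        command = line.strip().lower()
        if command == "check all":
            return "full"       # run everything, never stop early
        if command == "recheck":
            return "fail-fast"  # default: stop at the first failing suite
    return None                 # comment does not trigger CI


def run_suites(results: List[bool], mode: str) -> List[bool]:
    """Return the outcomes of the suites that actually ran under `mode`.

    `results` stands in for the outcome each suite would have."""
    ran: List[bool] = []
    for ok in results:
        ran.append(ok)
        if not ok and mode == "fail-fast":
            break
    return ran
```

For example, `run_suites([True, False, True], run_mode("check all"))` runs all three suites, while the same call with `"recheck"` stops after the second one.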
Another variation would be that we drop the running job's priority after the first failure, but still let it continue to run. That might be difficult technically and not save us many resources, however.
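The priority-drop variation can be sketched as a priority queue where jobs belonging to a change that already has a failure are demoted rather than cancelled. The `JobQueue` class and its two priority levels are hypothetical and do not reflect Zuul's real scheduler; the sketch only shows the ordering idea.

```python
import heapq
import itertools
from typing import List, Set, Tuple

NORMAL, DEMOTED = 0, 1  # lower value = scheduled first


class JobQueue:
    """Hypothetical scheduler: jobs for a change that already has a failure
    still run, but only after jobs for changes without failures."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str, str]] = []
        self._order = itertools.count()  # FIFO tie-breaker within a priority
        self._failed_changes: Set[str] = set()

    def submit(self, change: str, job: str) -> None:
        prio = DEMOTED if change in self._failed_changes else NORMAL
        heapq.heappush(self._heap, (prio, next(self._order), change, job))

    def report_failure(self, change: str) -> None:
        """Mark a change as failing and demote its already-queued jobs."""
        self._failed_changes.add(change)
        self._heap = [
            (DEMOTED if c == change else p, n, c, j)
            for p, n, c, j in self._heap
        ]
        heapq.heapify(self._heap)

    def next_job(self) -> Tuple[str, str]:
        _, _, change, job = heapq.heappop(self._heap)
        return change, job
```

Note that this only reorders work: every demoted job still runs eventually, which is why the saving may be small.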
I would actually object to this: imagine your change has caused multiple test failures that you weren't able to predict in your dev environment (because you didn't have all extensions installed or your environment is otherwise different from our CI).
I don’t doubt that this could be a problem and that it would be really annoying, but could you share a patch or two that ran into this kind of situation, so we can look at a specific example?
And yeah, I like the idea of a “check all” command to bypass stop-on-failure.
Relatedly, IMO the stop-on-failure work would be much more worthwhile if a failure in one job in the test pipeline could immediately halt all the other jobs too.
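A minimal sketch of that idea, assuming the jobs of one pipeline run concurrently and can poll a shared cancellation flag between steps. Jobs are simulated with a step count and an optional failing step; `run_job` and `run_pipeline` are hypothetical helpers, not part of any CI tool.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, Optional, Tuple


def run_job(steps: int, fails_at: Optional[int], stop: threading.Event) -> str:
    """Simulate a job; check the shared flag before every step."""
    for step in range(steps):
        if stop.is_set():
            return "ABORTED"  # another job already failed
        if step == fails_at:
            stop.set()        # tell every other job to halt
            return "FAILURE"
        time.sleep(0.01)      # pretend to do some work
    return "SUCCESS"


def run_pipeline(jobs: Dict[str, Tuple[int, Optional[int]]]) -> Dict[str, str]:
    """Run all jobs in parallel; the first failure aborts the rest."""
    stop = threading.Event()
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = {
            pool.submit(run_job, steps, fails_at, stop): name
            for name, (steps, fails_at) in jobs.items()
        }
        return {futures[f]: f.result() for f in as_completed(futures)}
```

With `run_pipeline({"lint": (1, 0), "selenium": (50, None)})`, the quick `lint` failure should abort the long `selenium` job after its first step instead of letting it run to completion.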
Judging from the patches that have been committed, this will limit the ability to run all tests in my development environment. Would it be possible to change only the WMF CI configuration to do this?
Yeah, this would make fixing multiple issues very annoying and time-consuming, and would make life especially hard for new developers (so far we haven't forced them to set up the tests locally just to have a non-aggravating experience). I don't think it's worth it.
Ideally, the tests would report a failure as soon as they hit one but continue to run to the end. I doubt that's realistic with Jenkins, though. (This post describes a possible way to do it via the cacheResult option, but it would require a much more recent PHPUnit version, and even so it seems very hacky.)
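The "report early, keep running" behaviour can be sketched as a runner that fires a notification on the first failure and then carries on. The `notify` callback is a hypothetical hook (e.g. posting a review comment); neither PHPUnit nor Jenkins is assumed to offer it in this form.

```python
from typing import Callable, Iterable, List, Tuple


def run_and_report(tests: Iterable[Tuple[str, Callable[[], bool]]],
                   notify: Callable[[str], None]) -> List[str]:
    """Run every test to the end, but call `notify` the moment the first
    failure happens instead of waiting for the final summary."""
    failures: List[str] = []
    for name, test in tests:
        if not test():
            if not failures:
                notify(f"first failure: {name}")  # early signal to the developer
            failures.append(name)
    return failures
```

For example, with tests `a` (pass), `b` (fail) and `c` (fail), `notify` is called once with `"first failure: b"`, and the function still returns both `["b", "c"]`, so the developer can fix everything in one go.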
One thing that might be worthwhile is starting with "local" tests (core tests in the case of core, the extension's own tests in the case of an extension) and only running tests belonging to other repos if the local tests succeed. Since it's almost always the local tests that fail, that would speed things up a lot without forcing too many re-runs on the developer.
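A minimal sketch of this two-stage approach, assuming suites can be split into the repo's own ("local") suites and suites belonging to other repos. The suite names and the `run_staged` helper are hypothetical. All local suites run to completion, so every local failure is reported at once; only the cross-repo stage is skipped.

```python
from typing import Callable, Dict

Suites = Dict[str, Callable[[], bool]]


def run_staged(local: Suites, dependent: Suites) -> Dict[str, bool]:
    """Run all of the repo's own suites; only if every one passes,
    run the suites that belong to other repositories."""
    results = {name: run() for name, run in local.items()}
    if all(results.values()):
        results.update({name: run() for name, run in dependent.items()})
    return results
```

For instance, `run_staged({"core-phpunit": lambda: False}, {"gated-extensions": lambda: True})` returns `{"core-phpunit": False}` without ever starting the gated-extension suite.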