
Accommodate flaky tests flapping
Closed, Resolved, Public


The problem:
Some tests (read: this shouldn't be used for all tests to get around other issues) are inherently flaky. Their flakiness causes noise and makes developers stop caring about them. This doesn't help anyone.

The strawman proposal:
Provide a mechanism to mark a test as flaky, so that Jenkins only complains about a failure of that test if it fails, e.g., 3 times in a row.
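A minimal sketch of the strawman rule, assuming each test's results are available as an oldest-first list of pass/fail booleans. The names `should_complain`, `consecutive_failures`, `FLAKY_THRESHOLD` and the `flaky` set are hypothetical illustrations, not an existing Jenkins feature; the threshold of 3 comes from the "3 times in a row" example above.

```python
from typing import List, Set

# Hypothetical default, taken from the "3 times in a row" example.
FLAKY_THRESHOLD = 3


def consecutive_failures(history: List[bool]) -> int:
    """Count the trailing run of failures (history is oldest-first, True = passed)."""
    count = 0
    for passed in reversed(history):
        if passed:
            break
        count += 1
    return count


def should_complain(test: str, history: List[bool], flaky: Set[str],
                    threshold: int = FLAKY_THRESHOLD) -> bool:
    """Alert on any failure of a normal test, but on a test marked flaky
    only once it has failed `threshold` times in a row."""
    if not history or history[-1]:
        return False  # no runs yet, or the latest run passed
    if test not in flaky:
        return True
    return consecutive_failures(history) >= threshold


flaky = {"test_login"}  # hypothetical set of tests marked flaky
print(should_complain("test_login", [True, False, False], flaky))   # False
print(should_complain("test_login", [False, False, False], flaky))  # True
print(should_complain("test_search", [True, False], flaky))         # True
```

Tests that are not marked flaky keep the current behaviour (any failure is reported), which keeps the marker from becoming a blanket way to silence all tests.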

See also: T67773: Auto retry failed browser tests to reduce false negatives

Event Timeline

greg created this task. Mar 27 2015, 6:45 PM
greg raised the priority of this task from to Normal.
greg updated the task description.
greg added subscribers: greg, Jdlrobson.
Restricted Application added a subscriber: Aklapper. Mar 27 2015, 6:45 PM

We might be able to handle this with the Jenkins plugin that parses the test results. Since browser test jobs have a linear history, the plugin could be made not to fail the build when a failing test was already failing in the previous build; it would mark the build as UNSTABLE instead of FAILURE. From there, we could disable notifications for unstable jobs.

That would go as:

Build #1
  test 1 success
  test 2 success
Build #2 (test 1 fails)
  test 1 failure
  test 2 success
  -> build is a FAILURE (test 1 newly failing)
Build #3 (test 1 still fails)
  test 1 failure
  test 2 success
  -> build is UNSTABLE (test 1 was previously failing)
Build #4 (both tests fail)
  test 1 failure
  test 2 failure
  -> build is a FAILURE (test 2 newly failing)

That might improve the feedback.
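A rough sketch of the decision rule described above, assuming a linear job history so that "the previous build" is well defined, and assuming each build's results are available as a map from test name to pass/fail. This is not the Jenkins plugin's actual code; `classify_build`, `should_notify` and the `Result` enum are hypothetical names (the three values mirror Jenkins' SUCCESS/UNSTABLE/FAILURE build results).

```python
from enum import Enum
from typing import Dict, List, Optional


class Result(Enum):
    SUCCESS = "SUCCESS"
    UNSTABLE = "UNSTABLE"
    FAILURE = "FAILURE"


def classify_build(current: Dict[str, bool],
                   previous: Optional[Dict[str, bool]]) -> Result:
    """Both arguments map test name -> passed? for a build."""
    failing = {name for name, passed in current.items() if not passed}
    if not failing:
        return Result.SUCCESS
    previously_failing = set()
    if previous is not None:
        previously_failing = {n for n, passed in previous.items() if not passed}
    if failing - previously_failing:
        return Result.FAILURE   # at least one test is newly failing
    return Result.UNSTABLE      # every failure was already failing last build


def should_notify(result: Result) -> bool:
    # Assumes notifications are disabled for UNSTABLE builds.
    return result is Result.FAILURE


# The four builds from the example above, oldest first.
builds = [
    {"test 1": True, "test 2": True},
    {"test 1": False, "test 2": True},
    {"test 1": False, "test 2": True},
    {"test 1": False, "test 2": False},
]
prev_build: Optional[Dict[str, bool]] = None
results: List[Result] = []
for build in builds:
    results.append(classify_build(build, prev_build))
    prev_build = build
print([r.value for r in results])
# ['SUCCESS', 'FAILURE', 'UNSTABLE', 'FAILURE']
```

Note that under this rule the first failure of a test still fails the build; only repeated failures are downgraded to UNSTABLE, which is the opposite trade-off from the "fail only after N in a row" strawman.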

zeljkofilipin set Security to None. Apr 29 2015, 2:09 PM
zeljkofilipin added a subscriber: dduvall.

I vote -1. Tests should either be made stable or deleted. Can somebody provide an example of an inherently flaky test?

To be clear, my absolute biggest pain point right now is Beta Labs going down mid-test. I waste countless hours going through test failures for MobileFrontend, distinguishing between real bugs and failures caused by our infrastructure.

To make things worse, other people have done exactly the same triage in the past.

So an easy way to mark a whole build as "flaky" or "ignore" would prevent duplicated effort and, hopefully, highlight how bad this problem is.

greg added a comment. Apr 29 2015, 2:35 PM

> I vote -1. Tests should be either made stable or deleted. Can somebody provide an example of an inherently flaky test?

Until we get a real production environment to test against, anything that uses Beta Cluster as an endpoint :)

Before we can ignore this request outright, we need to look at our test data to see if and where we have flaky tests.
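One common heuristic for finding flaky tests in historical results is to measure how often a test's outcome flips between pass and fail across consecutive runs. The sketch below assumes per-test histories are available as oldest-first lists of booleans; `flip_rate`, `flaky_candidates` and the 0.3 cutoff are hypothetical choices for illustration, not an existing tool in this CI setup.

```python
from typing import Dict, List


def flip_rate(history: List[bool]) -> float:
    """Fraction of consecutive run pairs whose outcome differs (True = passed)."""
    if len(history) < 2:
        return 0.0
    flips = sum(1 for a, b in zip(history, history[1:]) if a != b)
    return flips / (len(history) - 1)


def flaky_candidates(results: Dict[str, List[bool]],
                     threshold: float = 0.3) -> List[str]:
    """Names of tests whose flip rate reaches `threshold` (an arbitrary cutoff)."""
    return sorted(name for name, history in results.items()
                  if flip_rate(history) >= threshold)


history = {
    "test_a": [True, True, True, True, True, True],      # stable
    "test_b": [True, False, True, True, False, True],    # flaps
    "test_c": [True, True, False, False, False, False],  # broken, not flaky
}
print(flaky_candidates(history))  # ['test_b']
```

Counting flips rather than failures separates a test that is simply broken (like `test_c`, which fails every run after a regression) from one that flaps, which is the distinction the "made stable or deleted" discussion hinges on.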

hashar lowered the priority of this task from Normal to Low.Jun 2 2015, 3:03 PM
hashar moved this task from Untriaged to Backlog on the Continuous-Integration-Infrastructure board.

@greg No activity in years; should this be closed?

zeljkofilipin closed this task as Resolved.Jul 26 2017, 11:27 AM
zeljkofilipin claimed this task.

365933 (Add unstable status to browser tests jobs) is as far as this is likely to go for the foreseeable future. Please reopen if there is something that needs to be done here. In that case, please be explicit about what needs to be done.