Only receive mails for failing selenium daily tests when test fail in a row
Open, Needs Triage, Public

Description

The suite of Selenium tests in Two-Column-Edit-Conflict-Merge is quite big, and we're getting reports almost every day because at least one test fails regularly. In ~99% of the cases the tests tend to fail randomly due to the flakiness of the beta cluster, so the Age column in the list of failing test cases is almost never higher than one. E.g.

Failed Tests
Test Name	Duration	Age
chrome.90_0_4430_85.linux.TwoColConflict.shows_a_dismissible_hint_on_the_core_edit_conflict_interface	1 min 0 sec	1
All Tests
Package	Duration	Fail	Skip	Total
chrome.90_0_4430_85.linux	17 min	1	0	46

It would be nice to have an option to only get an email if the age of at least one failing test is greater than 1.
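As a rough illustration of the requested rule, here is a minimal Python sketch. It is not an existing Jenkins or mail-plugin option; `FailedTest`, `should_send_mail`, and `min_age` are hypothetical names, and the only assumption is that the report exposes each failing test together with its Age value.

```python
from dataclasses import dataclass


@dataclass
class FailedTest:
    """One row of the "Failed Tests" table (hypothetical structure)."""
    name: str
    age: int  # value of the Age column for this test


def should_send_mail(failed_tests, min_age=2):
    """Return True if any failing test has an age of at least min_age."""
    return any(test.age >= min_age for test in failed_tests)


# The example report from the description: one failure with age 1.
report = [
    FailedTest(
        "chrome.90_0_4430_85.linux.TwoColConflict."
        "shows_a_dismissible_hint_on_the_core_edit_conflict_interface",
        age=1,
    ),
]

print(should_send_mail(report))  # False: no mail for a first-time failure
```

With the default `min_age=2`, the report above would not trigger an email, while a test that also failed in the previous run would.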

Note: Most of the discussion below documents how we arrived at this conclusion.

Event Timeline

I don't think that this would be a good idea. Since we're not part of the gate-and-submit job anymore, it's even more important to have regular tests running to see if something's broken.

It's a bit bothersome that (apparently due to the beta cluster being the beta cluster) tests randomly fail, but the number in the Age column (the last one in each row) gives an idea of whether a failure is random or not. If this number is higher than one, it means that the same test case failed at least two times in a row. In the last weeks we never had that situation.
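To make the meaning of the Age column concrete, here is a small Python sketch of how such a value can be derived from a test's run history. It follows the description above (the number of consecutive failing runs); `failure_age` and the boolean history format are assumptions for illustration, not how the CI computes it internally.

```python
def failure_age(history):
    """Count the consecutive failures at the end of a run history.

    history is a list of booleans, oldest run first; True means the test passed.
    """
    age = 0
    for passed in reversed(history):
        if passed:
            break
        age += 1
    return age


# Flaky pattern: fails now, but passed in the run before.
print(failure_age([True, False, True, False]))  # 1

# Failed two times in a row: more likely a real problem.
print(failure_age([True, True, False, False]))  # 2
```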

tl;dr: let's keep the daily suite running :-)

  1. This doesn't look random any more. As said, I get these emails almost daily now.
  2. What about the 17 minutes?
  1. This doesn't look random any more. As said, I get these emails almost daily now.

Yes, getting this email means that at least one of the 47 test cases failed. In the report you get a list of the cases that failed. Only a number greater than one in the Age column would be a strong hint that the test did not fail randomly. Otherwise the failure can be assumed to be random, especially since the beta cluster can be flaky sometimes.

Not that it is a good thing to have randomly failing test cases on a regular basis, but there's already a ticket for that: T276082: Check randomly failing selenium daily tests

  2. What about the 17 minutes?

I'm not sure why this is an issue. Sure, once a day we're tying up resources to run that test suite, but I still think that it's important to have a regular check that things are working. And we also have this other ticket where we look at fixing slow browser tests: T282935: Fix slow browser tests (7 minutes?) and re-add them to gate

One thing to add here: The tests are also way slower because they run on the beta cluster, and things take more time there.

Here's a table of historical test failures, https://integration.wikimedia.org/ci/job/selenium-daily-beta-TwoColConflict/611/testReport/history/

As already stated, the results are randomly very flaky.

For comparison, here are the other daily tests: https://integration.wikimedia.org/ci/view/selenium-daily/

Only the TwoColConflict and WikibaseLexeme tests take more than a few minutes, so there's certainly room for improvement. I don't have much of an opinion about leaving the job enabled or not; less than 20 minutes once per day isn't a huge overhead relative to the gated tests, for example. But getting a failure email every other day doesn't bring much value, either. It's just a reminder that we own a complex and possibly flaky extension and that we need to stabilize the tests.

I've been working on support for parallel browser testing; we could experiment with that once the other low-hanging fruit is exhausted. Another quick fix would be to skip all but a handful of essential tests, maybe covering what we consider the riskiest features.

WMDE-Fisch renamed this task from Disable broken selenium-daily-beta-TwoColConflict to Only receive mails for failing selenium daily tests when test fail in a row. May 19 2021, 9:21 AM
WMDE-Fisch updated the task description.
WMDE-Fisch added a subscriber: zeljkofilipin.