Only alert when selenium-daily-{sitename}-MediaWiki tests fail on consecutive runs
Closed, Resolved | Public | Feature

Description

Thanks @bd808, I don't spend any time on debugging unless the job fails at least two days in a row.

These Jenkins jobs are currently configured to email two mailing lists and three individuals when the "unstable" (successful build with failed tests) or "failure - any" (failed build) triggers fire. It looks like the better triggers to use if we only care about multiple successive failures would be:

  • "Failure - Still": An email will be sent if the build status is "Failure" for two or more builds in a row.
  • "Unstable (Test Failures) - Still": An email will be sent if the build status is "Unstable" for two or more builds in a row.
  • "Failure -> Unstable (Test Failures)": An email will be sent any time the build goes from failing (compilation or build step failures), to unstable (unit test failures).
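To make the difference between these triggers concrete, here is a minimal sketch that models them as a function of the previous and current build status. It is a simplified illustration based only on the trigger descriptions quoted above, not the Jenkins email plugin's actual implementation; the function names and the status strings ("SUCCESS", "FAILURE", "UNSTABLE") are assumptions made for this example.

```python
from typing import List, Optional


def fired_triggers(previous: Optional[str], current: str) -> List[str]:
    """Return the triggers that would fire for `current`, given the previous status.

    Hypothetical model: only the three triggers discussed in this task are covered.
    """
    fired = []
    if current == "FAILURE" and previous == "FAILURE":
        # Second (or later) failed build in a row.
        fired.append("Failure - Still")
    if current == "UNSTABLE" and previous == "UNSTABLE":
        # Second (or later) build in a row with failed tests.
        fired.append("Unstable (Test Failures) - Still")
    if current == "UNSTABLE" and previous == "FAILURE":
        # The build now completes, but tests fail.
        fired.append("Failure -> Unstable (Test Failures)")
    return fired


def triggers_for_history(history: List[str]) -> List[List[str]]:
    """Walk a list of build statuses (oldest first) and collect fired triggers per build."""
    previous = None
    result = []
    for status in history:
        result.append(fired_triggers(previous, status))
        previous = status
    return result


if __name__ == "__main__":
    runs = ["SUCCESS", "FAILURE", "FAILURE", "UNSTABLE", "UNSTABLE", "SUCCESS"]
    for status, fired in zip(runs, triggers_for_history(runs)):
        print(status, fired)
```

In this model the first failing run after a success sends nothing, which is the "ignore the first failure" behavior requested in this task.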

Event Timeline

I can't find an obvious mapping in JJB for the "Failure -> Unstable (Test Failures)" status, but still-failing and still-unstable are there.

The selenium-daily-{sitename}-{project} job template is used in jjb/mediawiki.yaml to configure jobs for selenium-daily-beta-MediaWiki and selenium-daily-betacommons-MediaWiki. These are the jobs that send notifications to the betacluster-alerts@lists.wikimedia.org list that I am watching. I would like these notifications to only happen when they fit the "multiple consecutive failures" criteria that @zeljkofilipin cares about.

There are also a larger number of selenium-daily-beta-{project} jobs generated from the same template that send notifications to qa-alerts@lists.wikimedia.org and various individuals based on the extension under test. I am wondering if these as well should only notify on consecutive failures, or if they should be left as is? What do you think @zeljkofilipin, @vaughnwalters, @Etonkovidova, @EAkinloose?

Change #1147076 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[integration/config@master] selenium-daily-{sitename}-{project}: allow email notif config

https://gerrit.wikimedia.org/r/1147076

My vote is for those to stay as is. But I only get failure alerts from a couple of repos so it's not too much.

My current patch will only change selenium-daily-beta-MediaWiki and selenium-daily-betacommons-MediaWiki. These are the nightly tests that send their results to betacluster-alerts@lists.wikimedia.org. @zeljkofilipin said in T270771#10829124 these are flaky enough that he only pays attention when they have failed multiple days in a row.

I'm happy to leave the selenium-daily-beta-{project} set working just as they have been. There will be the ability to change any/all of them to ignore the first failure in the future if you decide that is helpful.

I would also be happy to update my patch so that it is only the betacluster-alerts@lists.wikimedia.org list that ignores the first failure if that seems more reasonable to folks. I don't want to hide things that folks would normally take action on. I really just want people to feel like the error emails they get are valuable so that they don't start ignoring them out of habit and miss important regressions.

I'm ok to be informed only of consecutive failures. Thank you, @bd808!

bd808 renamed this task from Only alert when selenium-daily-{sitename}-{project} tests fail on consecutive runs to Only alert when selenium-daily-{sitename}-MediaWiki tests fail on consecutive runs. May 20 2025, 7:55 PM

I have renamed this task to be about the selenium-daily-beta-MediaWiki and selenium-daily-betacommons-MediaWiki jobs. I have a hunch that we are going to end up without consensus on the selenium-daily-beta-{project} checks because there are a number of interested parties and likely different beliefs & experiences about test flakiness.

We could spin off a follow-up task to introduce a slightly more complex configuration for these tests that would accept two different email recipient lists: one for folks who want a notice for every failure, and the other for folks who only want the 2+ consecutive failure emails. That is probably only worthwhile if we have a reasonably even split between the two camps.
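As a rough illustration of the two-list idea, the sketch below picks recipients based on how many runs in a row have failed. Everything here is hypothetical: the function name, its parameters, and the example.org addresses are invented for this example and do not correspond to anything in integration/config.

```python
from typing import List


def recipients_for(consecutive_failures: int,
                   every_failure: List[str],
                   repeated_only: List[str]) -> List[str]:
    """Choose who gets an email for the current run (hypothetical helper).

    consecutive_failures: number of failed runs in a row, including the current one.
    every_failure: people who want a notice for every failure.
    repeated_only: people who only want notices for 2+ consecutive failures.
    """
    if consecutive_failures <= 0:
        # The current run passed, so nobody is notified.
        return []
    if consecutive_failures == 1:
        # First failure: only the "every failure" list hears about it.
        return list(every_failure)
    # Second or later failure in a row: both lists, without duplicate addresses.
    extra = [r for r in repeated_only if r not in every_failure]
    return list(every_failure) + extra


if __name__ == "__main__":
    eager = ["eager@example.org"]
    patient = ["patient@example.org", "eager@example.org"]
    for streak in (0, 1, 2):
        print(streak, recipients_for(streak, eager, patient))
```

Whether this extra configuration is worth maintaining depends on how evenly recipients split between the two preferences, as noted above.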

I am fine to get the email only for 2+ consecutive failures - I don't think it's worth adding that split config and I was on the fence on this anyway. Thanks @bd808 !

Change #1147076 merged by jenkins-bot:

[integration/config@master] jjb: Skip first notif for Selenium jobs on beta

https://gerrit.wikimedia.org/r/1147076

Mentioned in SAL (#wikimedia-releng) [2025-05-22T17:25:43Z] <bd808> ./jjb-update 'selenium-daily-beta*-MediaWiki' to deploy updates to selenium-daily-beta-MediaWiki and selenium-daily-betacommons-MediaWiki failure notifications (T394551)

bd808 claimed this task.

https://integration.wikimedia.org/ci/view/selenium-daily/job/selenium-daily-beta-MediaWiki/configure and https://integration.wikimedia.org/ci/view/selenium-daily/job/selenium-daily-betacommons-MediaWiki/configure are showing the expected notif changes. Both jobs will now only email in the "Failure - Still" and "Unstable (Test Failures) - Still" states.

If folks would like to modify other selenium-daily-* jobs similarly, that should be a relatively straightforward patch to apply.

Since the selenium-daily-beta-MediaWiki and selenium-daily-betacommons-MediaWiki jobs are still broken and resulting in two daily emails, I propose removing betacluster-alerts@lists.wikimedia.org from the recipients list since the notification is non-actionable.

Change #1176505 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[integration/config@master] mediawiki.yaml: Remove betacluster-alerts@lists.wikimedia.org from recips list

https://gerrit.wikimedia.org/r/1176505

Change #1176505 merged by jenkins-bot:

[integration/config@master] mediawiki.yaml: Remove betacluster-alerts@lists.wikimedia.org from recips list

https://gerrit.wikimedia.org/r/1176505