
Regularly run mwext-{name}-testextension-* jobs to make sure they are still passing after core or dependency changes
Open, Low, Public

Description

In #mediawiki-i18n, Nemo said that l10n-bot should not be a first-line regression tester now that Jenkins is checking l10n-bot commits.

hashar said it is possible for Zuul to trigger jobs daily, except that it has no idea who to report failures to. So as a first step I propose that we run these jobs daily and create a dashboard of some sort that lists whether extensions are passing or failing. If it proves useful, we can configure notifications for those who need them, whether by auto-filing bug reports, email, or something else.
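
A very rough sketch of what such a dashboard could be, assuming the jobs remain ordinary Jenkins jobs reachable through the standard Jenkins JSON API; the base URL and the job names below are placeholders, not a real list:

#!/usr/bin/env python3
# Sketch only: poll Jenkins for the last build of each mwext-*-testextension-*
# job and print a pass/fail overview. Base URL and job names are placeholders.
import json
from urllib.request import urlopen

JENKINS = 'https://integration.wikimedia.org/ci'  # assumed Jenkins base URL
JOBS = [
    'mwext-Echo-testextension-php72',          # hypothetical job names,
    'mwext-ReadingLists-testextension-php72',  # listed for illustration only
]

for job in JOBS:
    url = '{}/job/{}/lastBuild/api/json'.format(JENKINS, job)
    with urlopen(url) as resp:
        build = json.load(resp)
    # 'result' is SUCCESS, FAILURE, ABORTED, ... or None while still running
    print('{:<45} {}'.format(job, build.get('result')))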

Event Timeline

Legoktm raised the priority of this task to Needs Triage.
Legoktm updated the task description.
Legoktm added subscribers: Legoktm, hashar, Nemo_bis.

Extensions should encounter these naturally if they're being maintained. As a last resort, the commits that create the release branches can serve as a final check to ensure extensions keep up at least once per release cycle.

Also note that many extensions deployed at Wikimedia are part of the combo-group mediawiki-extensions-hhvm, which is triggered by MediaWiki core after every commit.

Krinkle set Security to None.
greg subscribed.

This may be addressed in the future by the very preliminary project code-named quibble.

Change 376739 had a related patch set uploaded (by Hashar; owner: Hashar):
[operations/puppet@production] zuul: allow email connection

https://gerrit.wikimedia.org/r/376739

Change 376740 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] zuul: periodic pipeline

https://gerrit.wikimedia.org/r/376740

I am pretty sure @Reedy also asked to run the PHPUnit tests for all extensions/skins.

Change 376739 merged by Dzahn:
[operations/puppet@production] zuul: allow email connection

https://gerrit.wikimedia.org/r/376739

Following Gerrit change 376739, Puppet restarted the zuul-merger. The Zuul scheduler still needs a restart for the mail/SMTP configuration to be taken into account.

After that, it is just a matter of adding the SMTP reporter in zuul/layout.yaml, and we can start getting emails for timed jobs.

Mentioned in SAL (#wikimedia-operations) [2019-01-09T07:43:52Z] <hashar> contint1001: restarted Zuul to take in account SMTP configuration | https://gerrit.wikimedia.org/r/376739 | T93414

The Zuul scheduler now has an SMTP connection which defaults to sending email to qa-alerts@lists.wikimedia.org. We can thus have a pipeline that emits email, with something like:

zuul/layout.yaml
- name: daily
  failure:
    smtp:
      from: jenkins@example.org
      to: someone@example.org
      subject: 'Change {change} failed'
  success:
    smtp:
      from: jenkins@example.org
      to: someone@example.org
      subject: 'Change {change} succeeded'

So is this something that just needs to be configured now?

One of our APIs was broken for a month due to a core change which the tests would have caught, had anyone run them (T226640: ReadingLists CI broken); I would be interested in setting this up to prevent similar mishaps in the future.

Zuul should be able to send mail reports now. We would then need a pipeline that can trigger on a weekly basis or so, which I proposed at the time with https://gerrit.wikimedia.org/r/#/c/integration/config/+/376740/ .

The devil is that there are a lot of repositories/jobs to run; they would most probably all be enqueued at the same time. We would also need the CI job to understand that it has to build the tip of a branch instead of a Gerrit change.
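
A minimal sketch of the second point, assuming the Zuul v2 environment variables (ZUUL_CHANGE, ZUUL_BRANCH) that the jobs already receive; the helper below is illustrative, not an existing script:

#!/usr/bin/env python3
# Sketch only: what a periodic run should check out. Gate runs get
# ZUUL_CHANGE/ZUUL_REF from Zuul and are already handled by zuul-cloner;
# a timer-triggered run has no change, so build the tip of the branch.
import os
import subprocess

def checkout_for_periodic_run(repo_dir):
    if os.environ.get('ZUUL_CHANGE'):
        raise RuntimeError('Triggered by a change; let zuul-cloner handle it')
    branch = os.environ.get('ZUUL_BRANCH', 'master')
    subprocess.check_call(['git', '-C', repo_dir, 'fetch', 'origin', branch])
    subprocess.check_call(['git', '-C', repo_dir, 'checkout', 'FETCH_HEAD'])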

Maybe start as opt-in? The list of extensions which 1) have good test coverage, 2) are used in production, 3) are not tested on core patches can't be that huge...

Alternatively, since these are weekly or daily jobs, just hash the repo name to a time of day to get an even distribution instead of running everything at once.
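
A sketch of that scheduling idea, hashing the repository name to a stable slot in the day; the helper is hypothetical, not something Zuul provides out of the box:

#!/usr/bin/env python3
# Sketch only: derive a stable, evenly distributed start time from the repo
# name, so daily runs are spread out instead of all starting at midnight.
import hashlib

def daily_slot(repo_name):
    digest = hashlib.sha1(repo_name.encode('utf-8')).hexdigest()
    minute_of_day = int(digest, 16) % (24 * 60)   # 0..1439
    return divmod(minute_of_day, 60)              # (hour, minute)

# e.g. turn the slot into a cron expression for the timer trigger
hour, minute = daily_slot('mediawiki/extensions/ReadingLists')
print('{} {} * * *'.format(minute, hour))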

Change 376740 abandoned by Hashar:
zuul: periodic pipeline

https://gerrit.wikimedia.org/r/376740

> I am pretty sure @Reedy also asked to run the PHPUnit tests for all extensions/skins.

Definitely would've been useful. I've spent hours chasing down failures in MW-1.31-release and MW-1.34-release.

Some were expected failures (like needed codesniffer bumps), others were fixed in newer branches and could easily be backported. Others were rather large rabbit holes to dive down.

It would've been nice if we had kept on top of these, so that when you backport to an extension in a supported branch the tests etc. aren't completely broken nearly every time (especially for tarball and WMF-deployed extensions).