Page MenuHomePhabricator

Better prevent changes in MW core from breaking WMF production extensions
Open, Needs TriagePublic

Description

The problem

Currently, CI does not always prevent changes in MW core from breaking something in an extension, or changes in one extension from breaking something in another extension. This is because our existing mechanisms for cross-repo testing don't always cover these cases:

  • The "shared gate" job runs for "gated" repos. Whenever a change is proposed or +2ed in a gated repo, this job runs the unit tests for all other gated repos.
  • The regular CI job that runs for all repos only runs the tests for explicitly declared dependencies of that repo.

The set of gated repos currently includes MW core, 51 extensions and 2 skins. Because of the shared gate job, it's impossible for a change in any of these repos to break any of the others, and in particular it's impossible for a change in MW core to break one of the gated extensions/skins' PHPUnit or QUnit tests.

However, there are an additional 142 extensions and 6 skins that are used in WMF production but are not in the shared gate. The CI jobs for MW core do not run the tests for these repos, so it's possible for a change in MW core to break one of these repos without CI noticing(*). This happens with some frequency, see e.g T404229: ReadingLists test failures, T375543: CI failing for ReadingLists extension, T400549: CI failure: LogicException: Instances of MediaWiki\HookContainer\HookContainer are not serializable!, T376000: ArticlePlaceholder test fails in CI on empty GrowthExperiments change on master. In fact, there's a whole Phab project dedicated to these problems: ci-test-error (WMF-deployed Build Failure) (**), where 103 tasks were filed and resolved in the past year (though not all of these were caused by this issue). In addition, despite the shared gate job, it's also possible for a change in MW core to break phan in extensions, e.g. T401468: Phan failures in Wikibase after Status and StatusValue documented as generic (templated) classes. All of these things cause confusion among engineers working on the now-broken extensions, and even get in the way of fixing UBNs from time to time.

(*) It's also theoretically possible for a change in one extension to break another, but that's less common, and in cases where that's likely to happen in practice, a dependency relationship is usually already in place. This task focuses on the more common problem of MW core changes breaking deployed extensions/skins.

(**) Most of the open tasks in this project are about flaky Selenium tests. To see more examples of cross-repo unit test or Phan failures, look at resolved tasks in this project.

Why this is hard

Previously, the Release-Engineering-Team tried to fix this problem by adding all production extensions+skins to the shared gate (T249674: Have all Wikimedia production extensions and skins in the CI gate), but partway through that it became clear that that was not sustainable. Right now only about 1/4th of the production extensions+skins are gated, and already the shared gate jobs are pretty slow: ~16 minutes for PHPUnit+QUnit, and 23 minutes for Selenium (these two run in parallel). Adding even more repos to the gate would further slow these jobs, and also make them more fragile. We could try to invent different tools than the current concept of a shared gate, but they would still have to contend with the fact that running tests on 201 repos would be slower and more fragile than running them on the current 53.

We also don't actually have all of these extension installed on every production wiki. There are pairs of extensions that are never installed together, and there are extensions (like CentralAuth) that, when installed, make it impossible to test certain parts of MW core that we do run in production. So just installing all the extensions and running the tests is not necessarily a silver bullet, or even a good idea.

In the meantime, the recent discussions on T403560: ReadingLists tests not run in jenkins wmf-quibble-core-vendor-mysql-php81 and T404058: Add MetricsPlatform to gated extensions have exposed that there's no clear policy for which repos should be gated, and a number of repos that are currently in the gate probably wouldn't be let in if they requested it today. While it may make sense to decline those individual tasks in the current context, they're symptoms of a broader problem that this task attempts to describe.

Event Timeline

Some ideas that may or may not be helpful in trying to think about a better (or less bad) approach to this problem:

  • Run the tests against all deployed extensions as a nightly job, rather than on every change. This wouldn't provide immediate feedback, but we'd still find problems relatively quickly. This wouldn't have to be a single mega-job that runs all tests together either, but could instead run a regular CI job for each extension. Set a norm that MW core changes that break the nightly are reverted. (For bonus points: use git blame's automation features to have the job identify the change that broke it itself)
  • Make "test this change against all other deployed repos" an opt-in thing, invoked through a Gerrit comment like check deployed. That would be legitimately helpful for testing risky changes that the author/reviewer knows are risky (because testing locally against all other repos is hard), which is probably about 50% of the changes that break things. But it wouldn't help with the other 50% where nobody realized that the change could break another repo.
  • Make running the tests against all deployed repos faster by parallelizing them into 3-5 jobs that each install all extensions, but only run the tests for a subset of them. This could help with speed, but would not help with resource consumption
  • Don't run the Selenium tests cross-repo, only run the unit tests. This would help with speed (the shared gate Selenium job is slower than the shared gate unit test job) and reliability (most reports of flakiness are in the Selenium tests rather than the unit tests), but it would reduce our ability to find problems.
  • Analyze how much longer it would actually take to run the tests for most of the non-gated extensions, to quantify how much slower the gate would become.
  • Analyze how much each of the current gated extensions contributes to the running time of the gate, to quantify how much of it "death by a thousand cuts" vs a few slow extensions/tests. Then consider skipping slow tests (only in the gate, not when running CI on the extension itself).
  • Add a shared gate job for phan, to prevent MW core changes from breaking phan for gated extensions at least
  • Add a shared gate job that runs the unit tests for gated repos + CentralAuth (in addition to the existing job without CentralAuth). This would solve the "CentralAuth overrides too much MW core functionality" problem, but it would also double resource consumption.