2025-11-26 Wikitech-l announcement
Background
Prior to 2015, we had one Jenkins job that tested most MediaWiki patches, regardless of which extension. It would install the "current" extension (based on the patch's repo) and apply the "current" patch. If your extension had any dependencies, you needed to maintain a separate Jenkins job that altered this setup.
In 2015, we removed the need for these ad-hoc jobs by providing a way to configure the main job via environment variables. This meant that all you had to do was edit a (small) YAML file with one clear and simple purpose: list what (if any) dependencies your extension wants installed. Perfect!
This happened in T96690, and https://gerrit.wikimedia.org/r/207132.
While it seems to be forgotten knowledge now, this YAML file is actually interpreted recursively.
In 2015, this recursion was harmless. In theory, it allows for a bit of de-duplication when introducing a new hard dependency. In practice it made very little difference, neither a boon nor a bane: extensions rarely have hard dependencies, and even when they do, it's generally just one or two. That was true in 2015, and it remains true a decade later in 2025.
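To make the recursion concrete, here is a minimal sketch (hypothetical data and function name; the real logic lives in our Zuul/CI configuration) of how a recursively-interpreted dependency file behaves:

```python
# A minimal sketch of a recursively-interpreted dependency list.
# Hypothetical data and function name, not the actual CI code.
def resolve(ext, deps, seen=None):
    """Return ext plus everything reachable via deps, transitively."""
    if seen is None:
        seen = set()
    if ext not in seen:
        seen.add(ext)
        for dep in deps.get(ext, []):
            resolve(dep, deps, seen)
    return seen

# Declaring one dependency pulls in that dependency's own list, and so on.
deps = {"Math": ["VisualEditor"], "VisualEditor": ["Cite"]}
print(sorted(resolve("Math", deps)))  # ['Cite', 'Math', 'VisualEditor']
```

Note that declaring VisualEditor alone is enough to also pull in Cite: the resolver keeps walking each dependency's own list until nothing new is found.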
Status quo
Through our use of Gerrit and especially Zuul, the WMF CI has consistently stayed ahead of the industry norm. [1] For example:
- Since 2013 (Gerrit): Test all patches before merging. We enforce testing of all patches before merging, i.e. master is always green (mostly!). GitHub did eventually introduce "protected branches", but it remains common for projects and private companies to have a culture of active maintainers committing directly to master, where "the build" runs afterwards and can be failing. If a patch breaks "the build", it has to be found and reverted, and meanwhile you are working on top of code that should not have been committed in the first place.
- Since 2013 (Zuul gate): Test at the point of merging. A patch may have passed today, but between now and next week when someone reviews it, a lot can change. For example: changes to code underneath what you are writing, a dependency in core or another extension, changes elsewhere in the same repo, new integration tests, new lint requirements, new structure tests, removal of a then-unused dependency, etc. Without the Zuul gate, these would merge without a Git-level merge conflict and break the build for the next person. The gate means humans do CodeReview+2, Zuul then kicks off another round of tests based on the latest master, and only Jenkins performs the actual Git merge.
- Since 2013 (Gerrit): Stacked patches instead of feature branches. These simultaneously foster both fast-paced development (by landing patches as they come, encouraging others to merge early and often, yet without being blocked on your previous patches having landed first) and quality (by testing and reviewing each patch on its own, rather than waiting for feature branches that become too big to review, yet too costly not to merge, and impossible to revert).
- Since 2013 (Zuul dependent pipeline): In late 2013 we enabled the dependent pipeline feature in our Zuul gate, which provides us with even higher guarantees against introducing failures. In addition to covering the gap between patch upload and patch review (i.e. "test at the point of merging"), this also covers race conditions between patches being reviewed at the same time. This happens every day, including Monday before branch cut, daily around UTC-midday when a majority of reviewers are active, and most importantly during backports straight to production. You really don't want to find out in production that prod+patch1 and prod+patch2 both pass, but that merging both produces a failure you won't notice anywhere in CI - you'll instead encounter it cold in production after rolling it out to all wikis. The dependent pipeline orders the patches rather than allowing races. To speed things up, it does test and merge things concurrently in practice, using various algorithms and clever predictions which are then validated (with a fallback to queueing). Ref T50419, T94322.
- Since 2013 (Zuul): Bi-directional vertical integration. When testing a MediaWiki core patch, CI also installs and tests a dozen extensions that make use of core APIs. In isolation this seems unneeded, given that it should be permitted for core to make a breaking change and for extensions to adapt to that change afterwards. However, given that we deploy on a daily basis, it is unacceptable to leave the code in an undeployable state. Instead, CI for extensions runs with the latest master of upstream MediaWiki core (not the latest stable or latest WMF branch), and CI for MediaWiki core integrates downstream with various extensions. This means you naturally discover if your breaking change affects code used in production, and thus have to accommodate that beforehand instead of afterward. This lowers the barrier of expertise needed for patch authors, and increases confidence from patch reviewers (CI tells you everything you need to know!). It also takes away the time pressure of having to fix things between the breaking change landing and downstream adapting. Any patch can land at any time, which maximises reviewer availability.
- Since 2015 (dependencies.yaml, now "Quibble"): Extra vertical integration. In addition to core and a given extension integrating in both directions, the 2015 adoption of YAML-based dependency manifests in CI (T96690) led to large-scale adoption of even more vertical integrations. It turns out teams find great value in installing additional extensions that are not hard dependencies, but that provide coverage and confidence through other means. (Details below.)
I'm mentioning all this because, while our CI gives developers high confidence, it is not self-evident where that confidence comes from. The above is meant to illustrate where it comes from, and where it doesn't, so that my proposal will not come across as risking a compromise in this confidence. All of the above remains unchanged.
The Good
Teams have adopted Extra vertical integration for various use cases, such as:
- Soft dependencies: an upstream extension that you provide extra functionality for (you are the downstream). This bi-directional integration makes breakage unlikely, by allowing upstream patches to ensure the code is always deployable even for downstreams. Likewise, downstream patches make sure that they don't break default functionality.
  - Cite runs with VisualEditor (not a hard dependency).
  - AbuseFilter runs with CentralAuth.
- Downstream dependencies: extensions that you provide extra functionality to (you are the upstream).
  - VisualEditor runs with Cite.
  - Wikibase runs with WikibaseMediaInfo.
- Siblings: conceptually-related extensions that have been known to carry risks, or that maintainers otherwise want to keep a closer eye on, to reduce the chance of sideways breakage.
The Bad
Somewhere along the way, I believe we have forgotten that these "extra dependencies" are interpreted in our Zuul logic as infinitely transitive and recursive.
This is the main reason why CI jobs for the REL1_43 branch of the Math extension are now timing out after 60 minutes. Even though we haven't gained significantly more Math tests, or added any dependencies, the runtime has crept up from 5 to 15, 30, 45, and as of this month beyond 60 minutes.
Math has zero hard dependencies:
"requires": { "extensions": {} },
In the Zuul config, the following extra dependencies are declared by the maintainers of Math, because we provide added functionality for VisualEditor and Wikibase. Both of these are in turn dependency-free, so this is expected to add exactly two extra extensions to our CI jobs:
Math:
  - VisualEditor
  - Wikibase
In actuality, 65 extensions are installed. Here are a few of them:
- Math
- Math > VisualEditor
- Math > VisualEditor > Cite
- Math > VisualEditor > Cite > cldr
- Math > VisualEditor > Cite > CommunityConfiguration
- Math > VisualEditor > Cite > Gadgets
- Math > VisualEditor > Cite > ParserFunctions
- Math > VisualEditor > Cite > Popups
- Math > VisualEditor > Cite > WikiEditor
- Math > VisualEditor > Cite > …
- Math > VisualEditor > TemplateData
- Math > VisualEditor > FlaggedRevs
- Math > VisualEditor > ConfirmEdit
- Math > VisualEditor > DiscussionTools
- Math > VisualEditor > …
- Math > Wikibase
- Math > Wikibase > ArticlePlaceholder
- Math > Wikibase > ArticlePlaceholder > WikibaseCirrusSearch
- Math > Wikibase > ArticlePlaceholder > WikibaseCirrusSearch > CirrusSearch
- Math > Wikibase > ArticlePlaceholder > WikibaseCirrusSearch > CirrusSearch > WikibaseCirrusSearch > …
- Math > Wikibase > ArticlePlaceholder > WikibaseCirrusSearch > CirrusSearch > WikibaseLexeme
- Math > Wikibase > ArticlePlaceholder > WikibaseCirrusSearch > CirrusSearch > …
- Math > Wikibase > WikibaseMediaInfo
- Math > Wikibase > WikibaseMediaInfo > UniversalLanguageSelector
- Math > Wikibase > WikibaseMediaInfo > UniversalLanguageSelector > …
This is not a dependency graph. While this started as a way to save a few keystrokes in the CI manifest for hard dependencies, the vast majority of these are now local decisions by local maintainers about extra extensions to install.
Sometimes these are soft dependencies, sometimes siblings, sometimes inverted dependencies (i.e. upstream integration). For example, note how the Math maintainers have an interest in VisualEditor and Wikibase, which Math consumes. VisualEditor maintainers in turn have an interest in their other consumers, such as DiscussionTools. Likewise, Wikibase maintainers have an interest in consumers such as ArticlePlaceholder and WikibaseMediaInfo, but neither is relevant to Math. And it goes further, with WBMI taking an interest in ULS, and so on.
For extreme cases, the PHPUnit "Standalone" group (docs: MW PHPUnit groups, MW PHPUnit tutorial) allows us to tag slow tests that may be skipped when run in integrated contexts from remote repositories, but that is a fine-grained tool. At the top level, I think we first need to disable this recursion and give repo owners agency over the "extra" extensions they do and don't want.
Proposal
- Preserve the list we have today. Nothing will be removed that is listed as a dependency today.
- Disable recursion.
- Do a one-time scan of extension.json files to declare any missing dependencies, so that we keep any hard dependencies that have, until now, come in through recursion.
- Provide a public diff, allowing maintainers to review the effective impact and to add any "extra" extensions they wish to install.
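Sketched in code, the proposed behavior might look like this (hypothetical function and variable names; the assumption is that "extra" extensions are taken only one level deep, while hard dependencies from extension.json still resolve transitively, since a wiki cannot install without them):

```python
# Sketch of the proposed non-recursive resolution (assumed behavior,
# hypothetical names). Extras: one level deep. Hard deps: transitive.
def resolve_flat(ext, extras, hard):
    """Install ext, its own declared extras, and all hard dependencies."""
    result = {ext}
    result.update(extras.get(ext, []))       # extras: no recursion
    queue = list(result)
    while queue:                             # hard deps: still transitive
        cur = queue.pop()
        for dep in hard.get(cur, []):
            if dep not in result:
                result.add(dep)
                queue.append(dep)
    return result

extras = {"Math": ["VisualEditor", "Wikibase"],
          "VisualEditor": ["Cite", "TemplateData"]}
hard = {}  # Math, VisualEditor, and Wikibase have no hard dependencies
print(sorted(resolve_flat("Math", extras, hard)))
# ['Math', 'VisualEditor', 'Wikibase']
```

With recursion disabled, VisualEditor's own extras (Cite, TemplateData, and the rest) no longer leak into Math's CI jobs; Math gets exactly the two extensions its maintainers asked for.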
Impact: The canary for this is the Math extension, where I hope to cut the main non-parallel PHPUnit job by at least 70%, from over 60 minutes (currently timing out) to under 20 minutes.
Progress
https://docs.google.com/spreadsheets/d/1JA1AsCxcDN76WQOhhpxsgiQgKxJNzyjw2Rsl4xnXsyE/edit?gid=0#gid=0
Footnotes:
[1]: Industry "norms" are usually defined by big players, e.g. enterprises and other large collaborative projects: Google, Mozilla, Facebook, Apple, OpenStack. But the Microsoft-led open-source regression-to-the-mean of the 2010s has redefined the "norm" as whatever a minority of developers who speak at conferences and go viral on social media seem to "expect". And that expectation has been pushed down over two decades, to the barely-minimum-viable experience provided by the supermarket homebrands of CI, such as Travis and GitHub: a single shell script and a text output buffer that sticks around for 14 days. Wow, it almost competes with 2001! (e.g. CruiseControl.) These may be good enough for startups and personal projects, and 95% of medium-size projects can shoe-horn everything into this with many compromises, but it by no means provides a high degree of quality assurance. And that is of course what CI is for: CI doesn't exist to deploy, it exists to maintain quality and stability, to detect failure.