Page MenuHomePhabricator

ReadingLists tests not run in jenkins wmf-quibble-core-vendor-mysql-php81
Open, Stalled, MediumPublic

Description

It appears that the ReadingLIsts extension tests started failing after a change in mediawiki core.

ReadingLists is used in production by the mobile apps team, and soon also on web (as an experiment, initially).

This should have been caught by jenkins.

13:52:51 1) MediaWiki\Extension\ReadingLists\Tests\Integration\Rest\ListsUpdateHandlerTest::testListsUpdateFailure with data set "no body params" (null, array())
13:52:51 Failed asserting that exception of type "TypeError" matches expected exception "MediaWiki\Rest\LocalizedHttpException". Message was: "Wikimedia\Message\ListParam::__construct(): Argument #1 ($listType) must be of type Wikimedia\Message\ListType|string, Wikimedia\Message\ParamType given, called in /workspace/src/extensions/ReadingLists/src/Rest/RestUtilTrait.php on line 100" at
13:52:51 /workspace/src/includes/libs/Message/ListParam.php:29
13:52:51 /workspace/src/extensions/ReadingLists/src/Rest/RestUtilTrait.php:100
13:52:51 /workspace/src/extensions/ReadingLists/src/Rest/ListsUpdateHandler.php:85
13:52:51 /workspace/src/includes/Rest/SimpleHandler.php:41
13:52:51 /workspace/src/tests/phpunit/unit/includes/Rest/Handler/HandlerTestTrait.php:239
13:52:51 /workspace/src/extensions/ReadingLists/tests/phpunit/RestTestHelperTrait.php:101
13:52:51 /workspace/src/extensions/ReadingLists/tests/phpunit/integration/Rest/ListsUpdateHandlerTest.php:90

Details

Event Timeline

ReadingLists is not a "Gated Extension" (unlike, say AbuseFilter [1] etc), so no it shouldn't have been caught as is currently configured.

Adding extensions to the gate isn't free, nor necessarily cheap, so not a decision usually done lightly.

[1] https://github.com/wikimedia/integration-config/blob/bc77f58a386dd081ba4b19deeb551d584048896e/zuul/parameter_functions.py#L402

Change #1184176 had a related patch set uploaded (by Aude; author: Aude):

[integration/config@master] zuul: Add ReadingLists to extension-gate

https://gerrit.wikimedia.org/r/1184176

@Reedy ReadingLIsts is used in production by mobile apps and soon also on web (initially as an experiment). Changes in mediawiki core could break the ReadingLIsts extension in prod.

Lots of extensions are used in production. Well over 100. Only a fraction are in gate.

We put frequently-modified extensions into gate, but that might be the wrong metric: we're relying on CI for the ungated extensions to fail before we get to production, but if an extension has <1 commit/week the CI failure might not be noticed before the next train. That suggests that maybe gate should have *infrequently* updated extensions, and we can rely on CI breaking to notice the others.

Or else we have a seperate "gate-plus-plus" job that runs all 100 extension tests, but does it once/day, so there's at least an upper bound on when failures are noticed.

It’s also the number of tests, their speed, and in many cases… what dependencies they have too. As those cause problems and make things worse as a whole…

Yeah, I strongly suspect if we tried to run all 100+ at once they'd conflict with each other. But I've had conversations with @bd808 before about trying to make space for another level of "expensive to run tests, that nevertheless should be run at intervals". Eg we have an ~8hr test that we run prior to releasing a new tag of Parsoid every week. Running CI on more of the "ungated" extensions would be a strong candidate as well, and probably not *as* expensive as the Parsoid qualification tests.

Jdlrobson-WMF moved this task from Incoming to Backlog on the Reader Experience Team board.

Given ReadingLists is a feature used by all of logged in users apps (and within the next 12 months all of web) I would advocate for making this a gated extension as over the next 12 months there is an increased risk of UBN and blocking train deployments as this becomes a more prominent feature. There are no plans to add integration tests so this would be phpunit and Node.js tests.

FYI I see Graph is in the gated extensions, that can be removed given Graph has not been in production for some time? It seems like having a criteria explaining what belongs there might be helpful, there were a few things in the gated extensions list I was surprised to see there.

But I've had conversations with @bd808 before about trying to make space for another level of "expensive to run tests, that nevertheless should be run at intervals". Eg we have an ~8hr test that we run prior to releasing a new tag of Parsoid every week.

In the relatively near future (loosely expected by the end of December), the Pretrain project will provide a new location where critical user journey tests can be run against a MediaWiki codebase that is ahead of the codebase for the deployment train. Our initial goal is deploying the HEAD of MediaWiki core+skins+extensions to test.wikipedia.org around 01:00 UTC Monday through Thursday. If the initial rollout goes well we hope to progress to deploying more times per day and for more wikis in the following months.

Running CI on more of the "ungated" extensions would be a strong candidate as well, and probably not *as* expensive as the Parsoid qualification tests.

We could consider "overnight" test suites that we run based on a timer of some frequency for extensions and skins that are deployed for movement wikis but not in the gate. The first issue I can forsee that we would need to think a bit about is how the results of those non-blocking tests are surfaced and acted on.

Adding extensions to gated extensions is not the only way to catch problems.

There are two ways for Jenkins to catch problems:

  • You notice a change in one of your dependencies has broken your functionality – a nightly job could do this.
  • You prevent anyone from making a change that would break your functionality – this is what a gated extension does.

Would some kind of nightly job get you what you need?


The rest of this is rambling thoughts about the state of gated extensions.


Why folks are wary of new gated extensions

Adding an extension as a gated extension will slow down everyone’s development processes in a few ways—one obvious, one subtle, one insidious:

  • Obvious slowdown – the test will take time. Anytime anyone pushes a change for an extension, your tests will run, they will take time.
  • Subtle slowdown – the amount of time lost to breakage. Anytime anyone pushes a change for an extension, it may cause your tests to fail. And when those tests fail, they’ll require your expertise to unblock it. For production, this means 500 times a week, every gated extension is directly in the critical path to production, which means some human is standing in the path to production. This leads to frustration, which causes...
  • Insidious slowdown – the amount of time lost to folks frustrated by extensions in gated extensions rail about removing an extension due to the other kinds of slowdowns.

Folks have become wary of adding extensions because of the slowdowns above and other, subtle issues.

One not-so-obvious example is CentralAuth—where tests for central auth conflict with and shadow core tests. Lots of extensions used somewhere in production conflict with other parts of MediaWiki we care about (or other extensions in production).


Why the current set of gated extensions seems haphazard

The thinking about what should be gated has shifted as folks' understanding has evolved.

When gated extension became a thing, we added started adding a few actively developed extensions because adding a bit of compute time didn’t seem like a big deal—especially for small extensions. It took a while for pressure to mount on folks with gated extensions and CI maintainers to remove extensions due to slowing down folks' workflows.

Gated extensions seem like the right hammer sometimes, but most problems aren't nails, and this hammer is more like an adze or something that we're using as a hammer.

For ReadingLists, I think we mainly care about changes in MediaWiki core + the skins (Vector and MinervaNeue). Would there be a way to have an expanded set of gated extensions that run for changes in MediaWiki core + skins?

ReadingLists tightly integrates with the action API and RestAPI, among other functionality. So when there are changes in core that pertain to the APIs, then still think it's best for ReadingLists tests to run as a gated job. For unrelated extensions, I think it is unnecessary to run ReadingLists tests as a gated job. (only for core and skin changes)

For ReadingLists, I think we mainly care about changes in MediaWiki core + the skins (Vector and MinervaNeue). Would there be a way to have an expanded set of gated extensions that run for changes in MediaWiki core + skins?

In theory yes, but that means we would need to create a new job that clones those four repositories and is triggered for each of those repositories. With the new Zuul version, developers should be able to define that kind of job.

The only case we have so far are the wmf-quibble jobs hence this task to add ReadingLists to that. But since they are shared by a lot of extensions, they tend to be slow and thus cause any patch proposed to ReadingLists to be slower than they use to currently. Namely, they will run the whole Wikibase test suite :-/

An alternative would be to express the dependencies both way (I think that is T318107) which is not currently enforced. So you could have in zuul/dependencies.yaml:

ReadingLists:
 - MinervaNeue
 - Vector

Then manually define the backward dependencies:

MinervaNeue:
 - ReadingLists

Vector:
 - ReadingLists

And I think that will do the job faster than the wmf-quibble. The drawback is that it will causes MinervaNeue, Vector to run every single tests of Readinglists. Then they would also if ReadingLists via the the wmf-quibble jobs as proposed by this change.

Thanks everyone for sharing your thoughts and concerns.

I think the crux of the issue, is that people make large changes in core that often break CI in extensions and skins. More often then not when core breaks something, either teams get blocked (as their CI fails) or the core patch gets reverted. FWIW this happened to me recently while working on T395698: PerformanceBudgetTest should include module dependencies even while actively trying to avoid causing problems for others.

So the problem as I see it is we need a way for patches to be merged in core more confidently in a way that doesn't block code creation in other repositories and train blockers. So while the task here is one possible solution there may be others.

In the case of this bug, it got filed because https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1172144 broke CI in ReadingLists and it wasn't originally clear to developers in ReadingList what broke and how to fix it until Umherirrender raised https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ReadingLists/+/1183740. This lost about a few hours of that teams productivity.

I don't think nightly builds address the problem here which is developers rubbing up against each other and slowing down each other's work because we don't share the same context . If we have nightly builds I would say either 1) we'll still be reverting core changes, but maybe a day later than a few hours 2) we'll be shipping broken code leading to production errors and holding the train when we are not paying attention to it [not to say this isn't worth trying ... as all of that is measurable].

I see the following needs:

  1. A shared understanding (definition) of what a gated extension is and the criteria for including one (to understand the "what" and "why" we add them)
  2. We need to rethink "when" we run them to avoid clogging up the ecosystem.

A few questions/ideas (apologies if these have been considered and rejected):

  1. Could we limit the running of gated extensions on +2 when certain risky files change when (e.g. in core: composer.json, includes/api, tests/integration, include/libs) to avoid a lot of the slowdown concerns? Possibly defining this by being able to mark "risky" files in MediaWiki core via configuration file?
  2. Could we have a way to run all tests on demand similar to the recheck comment? E.g. If I comment testgated that runs all tests for all extensions for the current patch on demand so that people working on risky changes can get more confidence merging? [possibly it could refuse if CI is overstrained to encourage developers to use at less busy hours]
  3. What would need to happen to make all PHPUNIT production extension tests run so that tests are more complete (rather than the current confusing situation)?
  1. What would need to happen to make all PHPUNIT production extension tests run so that tests are more complete (rather than the current confusing situation)?

Do you mean running all tests for all extensions and skins every time there is a change to mediawiki/core.git? Or running all tests for mediawiki/*.git when there is a change to any repo in that collection? Or something else entirely?

The concept of "all tests" needs a bit of explanation here because we are not talking about any sort of monorepo that has a clear boundary.

ReadingLists tests broke again due to a change in core T404229

A few questions/ideas (apologies if these have been considered and rejected):

  1. Could we limit the running of gated extensions on +2 when certain risky files change when (e.g. in core: composer.json, includes/api, tests/integration, include/libs) to avoid a lot of the slowdown concerns? Possibly defining this by being able to mark "risky" files in MediaWiki core via configuration file?

Not in the way they are currently implemented. Gathering gated extensions happens in a different process than setting up tests.

  1. Could we have a way to run all tests on demand similar to the recheck comment? E.g. If I comment testgated that runs all tests for all extensions for the current patch on demand so that people working on risky changes can get more confidence merging? [possibly it could refuse if CI is overstrained to encourage developers to use at less busy hours]

I think you're asking, which is: Could commenting testgated on a patch checkout MediaWiki, all extensions in production, all skins in production, enable them and run all their tests?

There is a way to build that, but that build would be unlikely to pass.

There are 193 extensions and 8 skins, but no wiki has them all.

I think @Jdforrester-WMF tried this when he was a Release-Engineering-Team and there was a long-tail of problems which was the genesis of T249674: Have all Wikimedia production extensions and skins in the CI gate.

  1. What would need to happen to make all PHPUNIT production extension tests run so that tests are more complete (rather than the current confusing situation)?

This is a subset of the answer of the previous question other than the job would be slightly tweaked.


I see the following needs:

  1. A shared understanding (definition) of what a gated extension is and the criteria for including one (to understand the "what" and "why" we add them)
  2. We need to rethink "when" we run them to avoid clogging up the ecosystem.

This should be a different task (if it doesn't already exist). I'd be interested to involve Test-Platform / Quality-and-Test-Engineering-Team in these discussions.


I think what you're asking for is:

  • to use gated extensions
  • to rethink how gated extensions could work so you can use them
  • to build another gated extensions system that is non-voting

I am advising against using gated extensions (see my earlier comment T403560#11150311). As a further example for why, last week, removing Graphs from gated extensions as a followup from this thread blocked an emergency deployment.

Edit for clarity: This is not a hard block on adding anything to gated extensions, but I want to ensure we've explored options and understand the problem.

Rethinking how gated extensions work is something we should do, but is unlikely to unblock this in the near term.

I would advise against building a new system until we've re-evaluated how to build that new system, so this is blocked by the previous request. Also a new non-voting system is tantamount to an opt-in nightly job, which you've said does not address your concerns.

You've said that nightly builds do not address your concerns because you'd need to revert core code if it breaks your extension and/or seeing a broken nightly build would result in no action, which means knowingly shipping bad code to prod.

  • Is it always the case that you revert the core code if it breaks ReadingLists?
  • How do you feel about @hashar 's suggestions in T403560#11158409

Edit for clarity: This is not a hard block on adding anything to gated extensions, but I want to ensure we've explored options and understand the problem.

Rethinking how gated extensions work is something we should do, but is unlikely to unblock this in the near term.

I would advise against building a new system until we've re-evaluated how to build that new system, so this is blocked by the previous request. Also a new non-voting system is tantamount to an opt-in nightly job, which you've said does not address your concerns.

I'm planning on filing a new task (hopefully later today, time permitting) for this general problem, because it's not unique to ReadingLists. We understand that solving this problem in general will take time, and that's fine.

You've said that nightly builds do not address your concerns because you'd need to revert core code if it breaks your extension and/or seeing a broken nightly build would result in no action, which means knowingly shipping bad code to prod.

  • Is it always the case that you revert the core code if it breaks ReadingLists?

It depends on the specific situation. If a core patch made a breaking change that broke ReadingLists (or any other non-gated production extension), we'll try to adapt the extension code to the breaking change if it's feasible. But if doing that is hard or time-consuming, or a lot of extensions are affected, or something urgent is going on (e.g. this is blocking a UBN fix in that extension, which happened a few weeks ago), then we'll revert the core patch and ask its author to reintroduce it in a way that doesn't break the extension (or update it themselves).

That works for some problems, but it doesn't really work for the main problem, which is changes in MW core breaking CI for extensions. We could add a reverse dependency from ReadingLists to MW core, but that doesn't sound like a great answer to me, because ReadingLists isn't really that special among the 120+ extensions that are in production but are not in the gate. So if we do it for ReadingLists, there's not a great reason why we shouldn't do it for the other 120, but if they all do it I'm guessing the MW core tests would probably be too slow. Since this is a general problem not specific to ReadingLists, I think we should explore a general solution, which I'll file a new task for.

aude changed the task status from Open to Stalled.Oct 1 2025, 5:32 PM