Page MenuHomePhabricator

Speed up MediaWiki PHPUnit build by running integration tests in parallel
Open, HighPublic

Description

Background

Long ago, our Jenkins configuration used to include multiple jobs that run PHPUnit on MediaWiki core commits. Each job ran a portion of the tests (by passing a filter or subdirectory to the phpunit command), thus allowing them to run concurrently. Over the years, as things have been optimised and improved, these were eventually merged into one large job that basically "just" runs phpunit. This is how it has been for about 7 years now.

As per T225730: Reduce runtime of MW shared gate Jenkins jobs to 5 min, in the last 5 years the job has gotten significantly slower due to growth in how many tests we run, and so now the time developers spend waiting for a CI response to a MediaWiki commit, is spent in the PHPUnit job, and so we want to make it faster again. We believe the job is not exhausting its allocated resources and could run much faster, if it was parallelised somehow. By approaching it as a single job with concurrency (instead of splitting the jobs) we have two other benefits: 1) Keeps CI configuration simple, 2) Means it will also become fast by default for developers running tests locally.

Work

Look into software that would help running these in parallel within the job (separate threads/processes)

See also:
T60771: Jenkins: MediaWiki extensions phpunit test should also run mediawiki core tests

Update in 2022 by @aaron:

  • The most promising so far is paratest.
  • A notable restriction is that it does not support a wrapped phpunit command. MediaWiki currently wraps phpunit in tests/phpunit.php, being fixed as part of T90875 is a blocker unless we use paratest 1.x (no longer maintained), which did support custom wrappers.
  • Distribution of tests between subproceses should be invisible to developers in practice. There are multiple ways to do it, e.g.
    • by "suite" (per top-level subdir of tests/phpunit/includes),
    • by class (one Test.php file),
    • by function (individual test cases, including such that e.g. the "setUpBeforeClass" hooks will run multiple times, once in each process that end sup running one or more of the test functions, and even individual data provided cases could end up split and e.g. clash or do more work than is needed). The "by function" is known to have many failures with our current setup and would take much effort to make pass and keep passing.

2025-08 status update

This task is not ready to be worked on, because we first need to address the following:

  • Stop using "dynamic" PHPUnit suites (T345481, T358394). This is a prerequisite for both this task and the PHPUnit 10 upgrade (T328919), and it's a decent amount of work.
  • Make the parallel setup work in SQLite (also for MediaWiki-Docker). Could probably reuse something from Aaron's config
  • Update PHPUnit plugins so that they buffer their output from all processes and print it all at once at the end. Otherwise, paratest will just blackhole the output (see issue 771). This should be a relatively straightforward change but must be done upstream. We will need this at least for the "slow test detector" plugin, but note that we might switch to a different plugin because the one we're currently using is not compatible with PHPUnit 10.
  • The ob_start in bootstrap.common.php used to cause issues with paratest, although I don't remember what issues exactly, and I don't know whether that is still the case.

Once the above are done, switching to paratest should be relatively straightforward.

Details

Reference
bz48217
Related Changes in Gerrit:
Show related patches Customize query in gerrit

Related Objects

StatusSubtypeAssignedTask
OpenNone
OpenNone
Resolvedkostajh
Resolvedkostajh
ResolvedKrinkle
Resolvedkostajh
Resolvedkostajh
Resolvedkostajh
ResolvedDaimona
Resolvedkostajh
ResolvedDaimona
Resolveddaniel
ResolvedBUG REPORTkostajh
ResolvedDaimona
ResolvedDaimona
OpenNone
OpenNone
ResolvedArthurTaylor
ResolvedArthurTaylor
ResolvedNone
ResolvedNone
ResolvedArthurTaylor
ResolvedLucas_Werkmeister_WMDE
ResolvedArthurTaylor
ResolvedNone
DeclinedNone
ResolvedArthurTaylor
ResolvedArthurTaylor
DuplicateNone
Resolvedkostajh
OpenNone
Resolvedhashar
ResolvedNone
ResolvedNone
Resolvedjsn.sherman
ResolvedNone
ResolvedArthurTaylor
InvalidNone
DuplicateBUG REPORTNone
ResolvedUmherirrender
ResolvedNone
ResolvedArthurTaylor
ResolvedArthurTaylor
ResolvedArthurTaylor
ResolvedkarapayneWMDE
OpenNone
OpenNone
OpenNone

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

@pmiazga I've just +2'ed @ArthurTaylor's patch that uses the core scripts for generating a set of test suites, and the core script for running them in parallel. I think the next step is to work on rollout. @hashar is concerned about resource usage in CI. I'd suggest we start rolling this out progressively and seeing what the impact is on resource usage.

Change #1073438 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/quibble@master] release: Quibble 1.10.0

https://gerrit.wikimedia.org/r/1073438

Change #1073476 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] dockerfiles: update Quibble to 1.10.0

https://gerrit.wikimedia.org/r/1073476

Change #1073438 merged by jenkins-bot:

[integration/quibble@master] release: Quibble 1.10.0

https://gerrit.wikimedia.org/r/1073438

Change #1073476 merged by jenkins-bot:

[integration/config@master] dockerfiles: update Quibble to 1.10.0

https://gerrit.wikimedia.org/r/1073476

Change #1073542 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] jjb: switch jobs to Quibble 1.10.0

https://gerrit.wikimedia.org/r/1073542

Change #1073542 merged by jenkins-bot:

[integration/config@master] jjb: switch jobs to Quibble 1.10.0

https://gerrit.wikimedia.org/r/1073542

@pmiazga I've just +2'ed @ArthurTaylor's patch that uses the core scripts for generating a set of test suites, and the core script for running them in parallel. I think the next step is to work on rollout. @hashar is concerned about resource usage in CI. I'd suggest we start rolling this out progressively and seeing what the impact is on resource usage.

Now that this patch is merged and deployed via Quibble 1.10.0 (thanks @hashar!) we are ready for the next steps. The tentative rollout plan that @ArthurTaylor, @hashar and I discussed is this:

  1. (optional) Make and deploy job for T372618: Create a daily job running core + extension PHPUnit tests serially
    • This reduces risk that parallel testing somehow leaves out tests from the generated test suite list, or otherwise results in tests passing via parallel tests, but not when run serially.
      • Who would subscribe to notifications for this job and how can it fit into process for monitoring general CI health?
  2. Enable parallel testing on all extensions but not MediaWiki core
    • Less risk of blocking the train in case of problems
  3. Enable everywhere (core + extensions) for a day and watch it (e.g. a Friday)
    • Monitor load levels
    • Leave enabled if there are no reported issues on that Friday or over the weekend
  4. Enable on backport branches for the train, but not older release branches

Note that rollout involves setting QUIBBLE_PHPUNIT_PARALLEL as an environment variable for relevant jobs, so undoing any rollout step is fairly straightforward.

@pmiazga @hashar @thcipriani does this sound OK to you?

Change #1074364 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/config@master] zuul: Enable PHPUnit parallel tests on all extensions, exclude REL1* branches

https://gerrit.wikimedia.org/r/1074364

This sounds good to me. Thank you for handling this. @kostajh most likely this ticket should be assigned to you and @ArthurTaylor were the drivers of this work.

Change #1074364 merged by jenkins-bot:

[integration/config@master] zuul: Enable PHPUnit parallel tests on all extensions, exclude REL1* branches

https://gerrit.wikimedia.org/r/1074364

Change #1074480 had a related patch set uploaded (by Hashar; author: Kosta Harlan):

[integration/config@master] zuul: Enable PHPUnit parallel tests on all extensions, exclude REL1* branches

https://gerrit.wikimedia.org/r/1074480

Change #1074480 merged by jenkins-bot:

[integration/config@master] zuul: Enable PHPUnit parallel tests on all extensions, exclude REL1* branches

https://gerrit.wikimedia.org/r/1074480

Change #1075264 had a related patch set uploaded (by Jforrester; author: Jforrester):

[integration/config@master] Zuul: Disable parallel PHPUnit in Quibble for PHP 7.4 jobs

https://gerrit.wikimedia.org/r/1075264

Change #1075264 merged by jenkins-bot:

[integration/config@master] Zuul: Disable parallel PHPUnit in Quibble for PHP 7.4 jobs

https://gerrit.wikimedia.org/r/1075264

Mentioned in SAL (#wikimedia-releng) [2024-09-24T16:53:34Z] <James_F> Zuul: Disable parallel PHPUnit in Quibble for PHP 7.4 jobs, for T50217

Plan for re-enabling on php7.4 jobs:

  • Merge Avoid duplicate wgLogTypes entries
  • re-enable for all extensions except for WikiLambda and possibly CampaignEvents (T375756)
  • do some proactive "rechecks" on a bunch of repos (see list below) to see if all works OK

List generated by @hashar

zgrep -i 'mediawiki/extensions/.*to <Pipeline test' /var/log/zuul/zuul.log.2024-09-2*|cut -d\  -f6|sort|uniq -c|sort -nr|head -n20
    172 mediawiki/extensions/GrowthExperiments,
    140 mediawiki/extensions/Wikibase,
     74 mediawiki/extensions/Translate,
     73 mediawiki/extensions/ContentTranslation,
     69 mediawiki/extensions/CentralAuth,
     62 mediawiki/extensions/Cite,
     52 mediawiki/extensions/GlobalBlocking,
     48 mediawiki/extensions/AutoModerator,
     48 mediawiki/extensions/AbuseFilter,
     44 mediawiki/extensions/CheckUser,
     42 mediawiki/extensions/CommunityRequests,
     32 mediawiki/extensions/ThrottleOverride,
     30 mediawiki/extensions/WikiLambda,
     30 mediawiki/extensions/OAuth,
     28 mediawiki/extensions/WikibaseLexeme,
     28 mediawiki/extensions/Sanctions,
     26 mediawiki/extensions/WikimediaCampaignEvents,
     26 mediawiki/extensions/EventLogging,
     23 mediawiki/extensions/Scribunto,
     23 mediawiki/extensions/ExternalData,

Change #1076148 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/config@master] zuul: Re-enable parallel PHPUnit for 7.4 jobs

https://gerrit.wikimedia.org/r/1076148

Change #1076148 merged by jenkins-bot:

[integration/config@master] zuul: Re-enable parallel PHPUnit for 7.4 jobs

https://gerrit.wikimedia.org/r/1076148

Unassigning myself as I'm not actively working on it and I don't want to give a false impression. Huge kudos to @ArthurTaylor and @kostajh for driving it.

This made the You should really speed up these slow tests... message a lot more annoying, as the errors are not at the end of the CI output anymore, and the lists of slow tests make it hard to quickly scan the output and find the errors.

Hi @Tgr,

Thanks for the feedback - I take the point that this is harder to read through now with the parallel runs. I've filed T378481 for changing the way we output details of the failed tests in parallel runs. If you have concrete suggestions for how to improve this, please add to / update / edit the ticket there.

@Tgr The change to the test output has now been merged. Hopefully that meets your needs, but please do let me know if it doesn't.

AIUI, the only thing left to do for this task is to attempt to run MediaWiki core's integration tests in parallel, the PHPUnit default suite (with database) task in mediawiki-quibble-vendor-mysql-php74.

AIUI, the only thing left to do for this task is to attempt to run MediaWiki core's integration tests in parallel, the PHPUnit default suite (with database) task in mediawiki-quibble-vendor-mysql-php74.

Filed as T380967: Run MediaWiki core's default suite in parallel

AIUI, the only thing left to do for this task is to attempt to run MediaWiki core's integration tests in parallel, the PHPUnit default suite (with database) task in mediawiki-quibble-vendor-mysql-php74.

I guess for the future we would also want to switch to native parallelization with paratest (once T345481, currently a subtask, is resolved). Would that be tracked here or elsewhere?

AIUI, the only thing left to do for this task is to attempt to run MediaWiki core's integration tests in parallel, the PHPUnit default suite (with database) task in mediawiki-quibble-vendor-mysql-php74.

I guess for the future we would also want to switch to native parallelization with paratest (once T345481, currently a subtask, is resolved). Would that be tracked here or elsewhere?

Given that we have an implementation that works in PHP, without a huge amount of supporting code, I am not sure that we gain much benefit in switching to paratest, which would likely bring a bunch of integration issues to solve again.