
Speed up MediaWiki PHPUnit build by running integration tests in parallel
Open, High, Public

Assigned To
None
Authored By
Krinkle
May 7 2013, 4:52 PM

Description

Background

Long ago, our Jenkins configuration used to include multiple jobs that ran PHPUnit on MediaWiki core commits. Each job ran a portion of the tests (by passing a filter or subdirectory to the phpunit command), allowing them to run concurrently. Over the years, as things were optimised and improved, these were eventually merged into one large job that essentially just runs phpunit. This is how it has been for about 7 years now.

As per T225730: Reduce runtime of MW shared gate Jenkins jobs to 5 min, over the last 5 years the job has become significantly slower due to growth in the number of tests we run. Most of the time developers now spend waiting for a CI response to a MediaWiki commit is spent in the PHPUnit job, so we want to make it faster again. We believe the job is not exhausting its allocated resources and could run much faster if it were parallelised somehow. Approaching it as a single job with concurrency (instead of splitting it into multiple jobs) has two other benefits: 1) it keeps the CI configuration simple, and 2) tests also become fast by default for developers running them locally.

Work

Look into software that would help running these in parallel within the job (separate threads/processes)

See also:
T60771: Jenkins: MediaWiki extensions phpunit test should also run mediawiki core tests

Update in 2022 by @aaron:

  • The most promising so far is paratest.
  • A notable restriction is that it does not support a wrapped phpunit command. MediaWiki currently wraps phpunit in tests/phpunit.php; that wrapper (being removed as part of T90875) is a blocker unless we use paratest 1.x (no longer maintained), which did support custom wrappers.
  • Distribution of tests between subprocesses should be invisible to developers in practice. There are multiple ways to do it, e.g.
    • by "suite" (per top-level subdir of tests/phpunit/includes),
    • by class (one Test.php file),
    • by function (individual test cases). With this split, hooks like "setUpBeforeClass" run multiple times, once in each process that ends up running one or more of a class's test functions, and even individual data provider cases can end up split across processes and e.g. clash or do more work than is needed. The "by function" split is known to produce many failures with our current setup and would take much effort to make pass and keep passing.
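The "by class" split can be sketched as follows. This is an illustrative sketch in Python, not Quibble's actual code; the function name and the fixed bucket count are inventions for the example:

```python
# Illustrative sketch (not Quibble's implementation): round-robin
# allocation of *Test.php class files into N buckets, one bucket per
# parallel process. Sorting first keeps the grouping stable across runs.

def split_round_robin(test_files, num_buckets):
    """Assign each test class file to a bucket in round-robin order."""
    buckets = [[] for _ in range(num_buckets)]
    for i, path in enumerate(sorted(test_files)):
        buckets[i % num_buckets].append(path)
    return buckets

files = ["BTest.php", "ATest.php", "CTest.php", "DTest.php", "ETest.php"]
print(split_round_robin(files, 2))
# → [['ATest.php', 'CTest.php', 'ETest.php'], ['BTest.php', 'DTest.php']]
```

Round-robin gives stable groupings between runs (the same classes tend to share a process), at the cost of potentially uneven bucket runtimes.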

Details

Reference
bz48217


Event Timeline


I've been investigating this and I have to say I gave up on paratest pretty fast because I wasn't able to convince it to run the right set of tests. I've implemented more or less the approach from Sebastian Bergmann and proposed a patch to quibble to try this out: https://gerrit.wikimedia.org/r/c/integration/quibble/+/1031903 . So far I'm pretty happy with the results, but I would very much welcome input and feedback from other people who are actively working on this.

There are a few tests that fail in the parallel runs, but I've been slowly working through those in T361190. Besides that, it would be good at some point in the future to have a nightly job that runs the linear suite and generates the .phpunit.result.cache file, so that Quibble can use it to more evenly distribute the test classes over the testing buckets. Of course that has the disadvantage that the ordering of the tests might change every night - with round-robin allocation there is at least a fair amount of stability about which tests end up running with which others.
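The duration-based distribution suggested above could be done with a greedy longest-processing-time bin packing. This is a hedged illustration in Python, not Quibble's code; the flat {test name: seconds} input shape is an assumption, since the real .phpunit.result.cache format is a PHPUnit internal:

```python
# Hedged sketch: balance buckets using recorded per-test durations.
# Greedy LPT: take tests slowest-first, always placing the next one in
# the currently least-loaded bucket (tracked with a min-heap).
import heapq

def split_by_duration(times, num_buckets):
    """times: {test name: seconds}. Returns num_buckets lists of names."""
    heap = [(0.0, i) for i in range(num_buckets)]  # (total seconds, bucket index)
    heapq.heapify(heap)
    buckets = [[] for _ in range(num_buckets)]
    for name, secs in sorted(times.items(), key=lambda kv: -kv[1]):
        load, i = heapq.heappop(heap)
        buckets[i].append(name)
        heapq.heappush(heap, (load + secs, i))
    return buckets

times = {"SlowTest": 30.0, "MidTest": 20.0, "FastTest": 5.0, "TinyTest": 1.0}
print(split_by_duration(times, 2))
# → [['SlowTest'], ['MidTest', 'FastTest', 'TinyTest']]
```

This trades the stability of round-robin for balanced runtimes: if the timing data changes nightly, so can the grouping, which is exactly the drawback noted above.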

I've been investigating this and I have to say I gave up on paratest pretty fast because I wasn't able to convince it to run the right set of tests.

I still use the ancient paratest (5.0.4) version that lets you specify a phpunit executable path. I run the dev_paratest wrapper script from https://github.com/AaronSchulz/mediawiki-developer-tools very often. It doesn't support the speed trap thing though. I sometimes get failures due to bad assumptions of test order and side-effects (basically bugs in our own tests).

Modern paratest versions must be installed together with phpunit via composer and share code with it AFAIK (the tighter version requirements are thus needed, and met, by composer). Using modern paratest/PHPUnit runs into issues with our sloppy use of @dataProvider and static state all over the place, and I get lots of failure spam (I changed a few providers and then gave up, since fixing them all would take forever).

It doesn't support the speed trap thing though.

This is still a thing in latest paratest AFAIK. There's this issue from a while ago: https://github.com/paratestphp/paratest/issues/771. OTOH, speedtrap seems unmaintained and the issue about PHPUnit 10 support has been open for almost 1.5 years now. We'll have to make some decision on it as part of T328919: Upgrade to PHPUnit 10 anyway.

Using modern paratest/PHPUnit runs into issues with our sloppy use of @dataProvider and static state all over the place and I get lots of failure spam (I changed a few providers and then gave up since it would take forever to fix).

That's odd, I don't remember seeing very many failures when experimenting with paratest. r1005792 is a POC for paratest in CI; while there are some failures, things do not seem to be hopeless. Unfortunately though, it's quite hard to experiment with this more thoroughly: firstly because of T345481; and secondly because of SQLite going rogue when tests are run in parallel (T50217#9111027), for which I think we'll need the code you have in mediawiki-developer-tools to be brought into core.

I've implemented more or less the approach from Sebastian Bergmann and proposed a patch to quibble to try this out: https://gerrit.wikimedia.org/r/c/integration/quibble/+/1031903 .

This patch looks good to me and I would be happy to see it deployed so we start to see reduction in build times.

A paratest implementation sounds nice for a variety of reasons (PHP-based, external library, works locally), but until one exists, I'd propose that we move forward with the Quibble patch. If/when we get paratest working, we can revert the Quibble-based approach.

Change #1043719 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/quibble@master] release: Quibble 1.9.0

https://gerrit.wikimedia.org/r/1043719

@pmiazga I've just +2'ed @ArthurTaylor's patch that uses the core scripts for generating a set of test suites, and the core script for running them in parallel. I think the next step is to work on rollout. @hashar is concerned about resource usage in CI. I'd suggest we start rolling this out progressively and seeing what the impact is on resource usage.

Change #1073438 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/quibble@master] release: Quibble 1.10.0

https://gerrit.wikimedia.org/r/1073438

Change #1073476 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] dockerfiles: update Quibble to 1.10.0

https://gerrit.wikimedia.org/r/1073476

Change #1073438 merged by jenkins-bot:

[integration/quibble@master] release: Quibble 1.10.0

https://gerrit.wikimedia.org/r/1073438

Change #1073476 merged by jenkins-bot:

[integration/config@master] dockerfiles: update Quibble to 1.10.0

https://gerrit.wikimedia.org/r/1073476

Change #1073542 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] jjb: switch jobs to Quibble 1.10.0

https://gerrit.wikimedia.org/r/1073542

Change #1073542 merged by jenkins-bot:

[integration/config@master] jjb: switch jobs to Quibble 1.10.0

https://gerrit.wikimedia.org/r/1073542


Now that this patch is merged and deployed via Quibble 1.10.0 (thanks @hashar!) we are ready for the next steps. The tentative rollout plan that @ArthurTaylor, @hashar and I discussed is this:

  1. (optional) Make and deploy job for T372618: Create a daily job running core + extension PHPUnit tests serially
    • This reduces risk that parallel testing somehow leaves out tests from the generated test suite list, or otherwise results in tests passing via parallel tests, but not when run serially.
      • Who would subscribe to notifications for this job and how can it fit into process for monitoring general CI health?
  2. Enable parallel testing on all extensions but not MediaWiki core
    • Less risk of blocking the train in case of problems
  3. Enable everywhere (core + extensions) for a day and watch it (e.g. a Friday)
    • Monitor load levels
    • Leave enabled if there are no reported issues on that Friday or over the weekend
  4. Enable on backport branches for the train, but not older release branches

Note that rollout involves setting QUIBBLE_PHPUNIT_PARALLEL as an environment variable for relevant jobs, so undoing any rollout step is fairly straightforward.
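A minimal sketch of how such an environment-variable toggle typically works (illustrative only; the function name is invented, the default of 1 is an assumption, and the exact semantics of QUIBBLE_PHPUNIT_PARALLEL in Quibble may differ):

```python
import os

def phpunit_parallelism():
    """Return the number of parallel PHPUnit processes to use.

    An unset or empty QUIBBLE_PHPUNIT_PARALLEL means a serial run, so a
    rollout step can be undone simply by removing the variable from the
    job configuration.
    """
    value = os.environ.get("QUIBBLE_PHPUNIT_PARALLEL")
    if not value:
        return 1  # serial: the pre-rollout behaviour
    return max(1, int(value))
```

Gating behaviour on an environment variable keeps the rollout reversible per job without a new Quibble release.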

@pmiazga @hashar @thcipriani does this sound OK to you?

Change #1074364 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/config@master] zuul: Enable PHPUnit parallel tests on all extensions, exclude REL1* branches

https://gerrit.wikimedia.org/r/1074364

This sounds good to me. Thank you for handling this. @kostajh, most likely this ticket should be assigned to you; you and @ArthurTaylor were the drivers of this work.

Change #1074364 merged by jenkins-bot:

[integration/config@master] zuul: Enable PHPUnit parallel tests on all extensions, exclude REL1* branches

https://gerrit.wikimedia.org/r/1074364

Change #1074480 had a related patch set uploaded (by Hashar; author: Kosta Harlan):

[integration/config@master] zuul: Enable PHPUnit parallel tests on all extensions, exclude REL1* branches

https://gerrit.wikimedia.org/r/1074480

Change #1074480 merged by jenkins-bot:

[integration/config@master] zuul: Enable PHPUnit parallel tests on all extensions, exclude REL1* branches

https://gerrit.wikimedia.org/r/1074480

Change #1075264 had a related patch set uploaded (by Jforrester; author: Jforrester):

[integration/config@master] Zuul: Disable parallel PHPUnit in Quibble for PHP 7.4 jobs

https://gerrit.wikimedia.org/r/1075264

Change #1075264 merged by jenkins-bot:

[integration/config@master] Zuul: Disable parallel PHPUnit in Quibble for PHP 7.4 jobs

https://gerrit.wikimedia.org/r/1075264

Mentioned in SAL (#wikimedia-releng) [2024-09-24T16:53:34Z] <James_F> Zuul: Disable parallel PHPUnit in Quibble for PHP 7.4 jobs, for T50217

Plan for re-enabling on php7.4 jobs:

  • Merge "Avoid duplicate wgLogTypes entries"
  • Re-enable for all extensions except WikiLambda and possibly CampaignEvents (T375756)
  • Do some proactive "rechecks" on a bunch of repos (see list below) to see if all works OK

List generated by @hashar

zgrep -i 'mediawiki/extensions/.*to <Pipeline test' /var/log/zuul/zuul.log.2024-09-2*|cut -d\  -f6|sort|uniq -c|sort -nr|head -n20
    172 mediawiki/extensions/GrowthExperiments,
    140 mediawiki/extensions/Wikibase,
     74 mediawiki/extensions/Translate,
     73 mediawiki/extensions/ContentTranslation,
     69 mediawiki/extensions/CentralAuth,
     62 mediawiki/extensions/Cite,
     52 mediawiki/extensions/GlobalBlocking,
     48 mediawiki/extensions/AutoModerator,
     48 mediawiki/extensions/AbuseFilter,
     44 mediawiki/extensions/CheckUser,
     42 mediawiki/extensions/CommunityRequests,
     32 mediawiki/extensions/ThrottleOverride,
     30 mediawiki/extensions/WikiLambda,
     30 mediawiki/extensions/OAuth,
     28 mediawiki/extensions/WikibaseLexeme,
     28 mediawiki/extensions/Sanctions,
     26 mediawiki/extensions/WikimediaCampaignEvents,
     26 mediawiki/extensions/EventLogging,
     23 mediawiki/extensions/Scribunto,
     23 mediawiki/extensions/ExternalData,

Change #1076148 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/config@master] zuul: Re-enable parallel PHPUnit for 7.4 jobs

https://gerrit.wikimedia.org/r/1076148

Change #1076148 merged by jenkins-bot:

[integration/config@master] zuul: Re-enable parallel PHPUnit for 7.4 jobs

https://gerrit.wikimedia.org/r/1076148

Unassigning myself as I'm not actively working on it and I don't want to give a false impression. Huge kudos to @ArthurTaylor and @kostajh for driving it.

This made the "You should really speed up these slow tests..." message a lot more annoying, as the errors are not at the end of the CI output anymore, and the lists of slow tests make it hard to quickly scan the output and find the errors.

Hi @Tgr,

Thanks for the feedback - I take the point that this is harder to read through now with the parallel runs. I've filed T378481 for changing the way we output details of the failed tests in parallel runs. If you have concrete suggestions for how to improve this, please add them to the ticket there.

@Tgr The change to the test output has now been merged. Hopefully that meets your needs, but please do let me know if it doesn't.