
[REPO][CLIENT] Split PHPUnit test suite for wmf-quibble-vendor-mysql-php74-docker
Open, Needs Triage, Public

Description

According to the analysis in T361118 ([REPO][CLIENT][SW] Reduce CI test runtime for Wikibase and related extensions), wmf-quibble-vendor-mysql-php74-docker is the slowest of the CI checks at 21.6 minutes, of which the PHPUnit test suite accounts for 14.5 minutes. Any reduction in this check's runtime would directly reduce the total round-trip time for CI checks.

Investigate splitting the test suite into smaller chunks that can then be run in parallel.

  • Creating two chunks of 7.25 minutes each would already cut the CI run time by 7.25 minutes.
  • In the Python world, pytest-split is a popular plugin that implements test suite splitting for pytest.
  • PHPUnit has basic functionality for filtering test cases by test class name (e.g. phpunit --filter '/^[A-M]/', phpunit --filter '/^[N-Z]/').
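
As a rough illustration of the arithmetic in the first bullet, the sketch below estimates the job's wall-clock time if only the PHPUnit portion is split evenly across parallel chunks. It assumes the rest of the job (21.6 - 14.5 = 7.1 minutes) is unaffected and that the chunks split perfectly with no per-chunk startup cost; neither assumption is established in this task, and the function name `estimated_job_minutes` is hypothetical.

```python
def estimated_job_minutes(total_minutes, phpunit_minutes, chunks):
    """Ideal-case job time when the PHPUnit part runs as `chunks` parallel pieces.

    Assumes the non-PHPUnit part of the job is fixed and the split is perfectly even.
    """
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    fixed_minutes = total_minutes - phpunit_minutes
    return fixed_minutes + phpunit_minutes / chunks


if __name__ == "__main__":
    # Figures from the task description: 21.6 minutes total, 14.5 minutes of PHPUnit.
    for chunks in (1, 2, 4, 8):
        estimate = estimated_job_minutes(21.6, 14.5, chunks)
        print(f"{chunks} chunk(s): about {estimate:.2f} minutes")
```

Under these assumptions two chunks bring the job to about 14.35 minutes, which matches the 7.25-minute saving above. Additional chunks give diminishing returns because the fixed part of the job stays the same.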

Demonstrate the improvement in runtime by running the quibble Docker image locally, with before/after timings, and propose patches to integration/config and integration/quibble.

Event Timeline

ArthurTaylor renamed this task from Split PHPUnit test suite for wmf-quibble-vendor-mysql-php74-docker to [REPO][CLIENT][SW] Split PHPUnit test suite for wmf-quibble-vendor-mysql-php74-docker. Tue, Apr 2, 2:44 PM
ArthurTaylor moved this task from Incoming to [DOT] By Project on the wmde-wikidata-tech board.

Prio Notes:

Impact Area | Affected
production / end users |
monitoring |
development efforts |
onboarding efforts |
additional stakeholders |
ArthurTaylor renamed this task from [REPO][CLIENT][SW] Split PHPUnit test suite for wmf-quibble-vendor-mysql-php74-docker to [REPO][CLIENT] Split PHPUnit test suite for wmf-quibble-vendor-mysql-php74-docker. Thu, Apr 11, 8:16 AM
ArthurTaylor claimed this task.

Some progress notes / a status update.

Running the test job on its own (11200 tests):

$ time composer run --timeout=0 phpunit:entrypoint -- --testsuite extensions --group Database --exclude-group Broken,ParserFuzz,Stub,Standalone
...
real	7m38.242s
user	6m58.434s
sys	0m30.815s

Running the tests with the classes split into 8 groups, but with the groups run sequentially (11290 tests):

real	4m15.663s
user	3m44.455s
sys	0m14.330s

With 8 suites running in parallel, and test classes assigned round-robin to suites:

real	2m26.348s
user	8m8.877s
sys	0m29.452s
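
The round-robin assignment used for the run above can be sketched as follows. This is a minimal illustration, not the script that produced these timings: the function name `round_robin` and the class names in the usage example are hypothetical.

```python
def round_robin(test_classes, suite_count):
    """Deal test classes out to suites in turn: class 0 to suite 0, class 1 to suite 1, ..."""
    if suite_count < 1:
        raise ValueError("suite_count must be at least 1")
    suites = [[] for _ in range(suite_count)]
    for index, class_name in enumerate(test_classes):
        suites[index % suite_count].append(class_name)
    return suites


if __name__ == "__main__":
    # Hypothetical class names, for illustration only.
    classes = ["ATest", "BTest", "CTest", "DTest", "ETest"]
    for number, suite in enumerate(round_robin(classes, 2)):
        print(f"suite {number}: {suite}")
```

Round-robin keeps the number of classes per suite nearly equal, but it ignores how long each class takes, so one suite can still end up much slower than the others.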

With 8 suites running in parallel, and test classes assigned to balance suite duration:

real	2m5.897s
user	9m47.516s
sys	0m37.622s
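
The comment does not say how the suites were balanced. One common heuristic, shown here purely as an assumption, is greedy "longest first" assignment: sort the classes by measured duration and repeatedly give the next-longest class to the suite with the smallest running total. The function name `balance_by_duration` and the durations in the usage example are hypothetical.

```python
import heapq


def balance_by_duration(durations, suite_count):
    """Greedy longest-first assignment of test classes to suites.

    `durations` maps a test class name to its measured runtime in seconds.
    Returns one list of class names per suite.
    """
    if suite_count < 1:
        raise ValueError("suite_count must be at least 1")
    # Min-heap of (total seconds assigned so far, suite index).
    heap = [(0.0, index) for index in range(suite_count)]
    heapq.heapify(heap)
    suites = [[] for _ in range(suite_count)]
    longest_first = sorted(durations.items(), key=lambda item: item[1], reverse=True)
    for class_name, seconds in longest_first:
        total, index = heapq.heappop(heap)  # currently lightest suite
        suites[index].append(class_name)
        heapq.heappush(heap, (total + seconds, index))
    return suites


if __name__ == "__main__":
    # Hypothetical per-class timings in seconds, for illustration only.
    timings = {"SlowTest": 10.0, "MediumTest": 9.0, "QuickTest": 2.0, "TinyTest": 1.0}
    for number, suite in enumerate(balance_by_duration(timings, 2)):
        print(f"suite {number}: {suite} ({sum(timings[name] for name in suite)}s)")
```

This heuristic does not guarantee an optimal split, but it is cheap and only needs per-class timings from an earlier run.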

In the related subtasks, I've been fixing tests that pass in a normal sequential suite but fail when they are run alongside an unfamiliar set of classes (or in an unexpected order). What confuses me most so far is the difference between the sequential run with all tests (7m38s) and the sequential run with the tests split into 8 groups (4m15s). One obvious theory is that the split groups are running different tests: confusingly, the split run reports more tests (11290) than the normal run (11200). Another, more out-there theory is that running 11200 tests in a single process accumulates enough stored state, leaked memory and overhead to slow down the rest of the suite.

Next steps will be to continue to fix tests that fail on reordering, and to investigate exactly which tests are running in the one configuration that are not in the other.
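
For the second next step, a comparison along the following lines might help once a list of executed test names is available for each configuration (for example, extracted from each run's logs; how the lists are obtained is left open here). This is only a sketch: the function name `compare_test_runs` and the test names in the usage example are hypothetical. It also reports names that appear more than once in the split run, since duplicated tests would be one possible explanation for the higher count.

```python
from collections import Counter


def compare_test_runs(single_run, split_run):
    """Compare the test names executed in the single run and in the split run."""
    single_counts = Counter(single_run)
    split_counts = Counter(split_run)
    return {
        "only_in_single": sorted(set(single_counts) - set(split_counts)),
        "only_in_split": sorted(set(split_counts) - set(single_counts)),
        # Names executed more than once across the split groups.
        "repeated_in_split": sorted(
            name for name, count in split_counts.items() if count > 1
        ),
    }


if __name__ == "__main__":
    # Hypothetical test names, for illustration only.
    single = ["FooTest::testA", "FooTest::testB", "BarTest::testC"]
    split = ["FooTest::testA", "FooTest::testB", "FooTest::testB", "BazTest::testD"]
    for key, names in compare_test_runs(single, split).items():
        print(f"{key}: {names}")
```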