[SPIKE] Use PHPUnit test results cache timing data to distribute tests in parallel runs
Closed, Resolved; Public

Description

Since T378478, tests are allocated to split_groups in alphabetical order by filename rather than round-robin. This creates less time-balanced groups - the tests for some extensions take longer than for others, and the slower extensions then slow down the split_group that they are included in.

The test splitting implementation has support for processing timing data for tests to create a more balanced distribution. Use this data when building the split_groups.
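
To illustrate the difference, here is a minimal Python sketch contrasting alphabetical chunking with a greedy "slowest first, into the lightest group" allocation. This is not the MediaWiki implementation: the file names, durations, and function names are made up for the example, and the greedy heuristic is only one possible way to get more time-balanced groups.

```python
import heapq


def split_alphabetically(durations, group_count):
    """Cut the alphabetically sorted file list into contiguous chunks."""
    names = sorted(durations)
    size = -(-len(names) // group_count)  # ceiling division
    return [names[i:i + size] for i in range(0, len(names), size)]


def balance_by_duration(durations, group_count):
    """Assign files slowest first, each into the currently lightest group."""
    # Heap entries are (total seconds assigned so far, group index).
    heap = [(0.0, index) for index in range(group_count)]
    heapq.heapify(heap)
    groups = [[] for _ in range(group_count)]
    ordered = sorted(durations.items(), key=lambda item: (-item[1], item[0]))
    for name, seconds in ordered:
        total, index = heapq.heappop(heap)
        groups[index].append(name)
        heapq.heappush(heap, (total + seconds, index))
    return groups


def group_totals(groups, durations):
    """Sum the recorded durations of the files in each group."""
    return [sum(durations[name] for name in group) for group in groups]


# Hypothetical per-file timings in seconds.
durations = {
    "AlphaTest.php": 30.0,
    "BetaTest.php": 25.0,
    "GammaTest.php": 5.0,
    "DeltaTest.php": 4.0,
    "EpsilonTest.php": 3.0,
    "ZetaTest.php": 3.0,
}

print(group_totals(split_alphabetically(durations, 2), durations))
print(group_totals(balance_by_duration(durations, 2), durations))
```

With these made-up numbers the alphabetical split gives group totals of 59 and 11 seconds, while the greedy allocation gives 34 and 36 seconds.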

Acceptance Criteria

  • At least a proof of concept for an approach to this is complete
  • Data about the execution time of all tests is generated
  • Generated data is stored and made available for unauthenticated download by test runners
  • Parallel test runners fetch and process the execution time data to create more time-balanced split_groups.

Notes:

  • As indicated below, this is something of a spike task, so it would be wise to set a timebox when picking it up
  • TIMEBOX: 24 Hrs.

Event Timeline

@ArthurTaylor While looking at these tickets during ticket staring time, a few questions arose:

  1. Where is this timing data to be stored?
  2. Was there any discussion with relevant WMF stakeholders about how to implement this that the rest of the team should be aware of?

There was no discussion so far with WMF stakeholders. I don't have a concrete plan for where the data should be stored - part of the work here would be to solve that problem. I could imagine having that investigation as a separate task, but the investigation should also include some kind of proof of concept, and once you have that you've basically done the task.

This creates less time-balanced groups - the tests for some extensions take longer than for others, and the slower extensions then slow down the split_group that they are included in.

Will the amount of time saved in CI by better balancing the groups justify the additional complexity that will be involved in this?

@kostajh looking at a recent Wikibase run, here is the distribution of time in the split groups:

Databaseless:
  - 0 ~ 7.347
  - 1 ~ 31.488
  - 2 ~ 9.194
  - 3 ~ 3.692
  - 4 ~ 7.881
  - 5 ~ 7.188
  - 6 ~ 30.108
==
Total: 96.898
Average: 13.843
Difference with balancing: 17.645 seconds faster

Database:
  - 0 ~ 59.272
  - 1 ~ 94.432
  - 2 ~ 25.646
  - 3 ~ 6.152
  - 4 ~ 165.649
  - 5 ~ 33.240
  - 6 ~ 372.699
  - 7 ~ 49.006
==
Total: 806.096
Average: 100.762
Difference with balancing: 271.937 seconds (4.5 mins) faster

So we would potentially save about 4.5 minutes of the 11-minute runtime if the groups were optimally balanced. Right now, the PHPUnit jobs are no longer on the critical path - it's Selenium that's slowing us down and that is our next focus area - but I think this is an effort worth making for the PHPUnit jobs at some point.
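
For anyone wanting to check the arithmetic: the "Difference with balancing" figures appear to be the slowest group's time minus the mean group time, i.e. the wall-clock saving if every group took exactly the average. A small Python sketch (the function name is just for illustration) that recomputes the figures from the per-group times listed above:

```python
def balancing_gain(group_seconds):
    """Return (total, average, slowest group minus average) in seconds."""
    total = sum(group_seconds)
    average = total / len(group_seconds)
    return total, average, max(group_seconds) - average


# Per-group times copied from the Wikibase run above.
databaseless = [7.347, 31.488, 9.194, 3.692, 7.881, 7.188, 30.108]
database = [59.272, 94.432, 25.646, 6.152, 165.649, 33.240, 372.699, 49.006]

for label, groups in (("Databaseless", databaseless), ("Database", database)):
    total, average, gain = balancing_gain(groups)
    print(f"{label}: total={total:.3f} average={average:.3f} gain={gain:.3f}")
```

This treats perfect balance as the best case; in practice a single indivisible slow test file can keep one group above the average.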

Thanks. Those are compelling numbers, so seems worth doing.

For accessing the timing data, could we do something like "find the most recent successful gate-and-submit job and download the PHPUnit cache file(s) from the artifacts of the job"? (We currently don't store the PHPUnit cache files in the artifacts, AFAICT, but that would be straightforward to add.)

ItamarWMDE renamed this task from Use PHPUnit test results cache timing data to distribute tests in parallel runs to [SW] [SPIKE] Use PHPUnit test results cache timing data to distribute tests in parallel runs. Dec 6 2024, 10:36 AM
ItamarWMDE updated the task description.

Prio Notes:

Impact Area               Affected
production / end users    no
monitoring                no
development efforts       yes
onboarding efforts        no
additional stakeholders   yes

ItamarWMDE triaged this task as Medium priority. Dec 6 2024, 10:39 AM
ItamarWMDE renamed this task from [SW] [SPIKE] Use PHPUnit test results cache timing data to distribute tests in parallel runs to [SPIKE] Use PHPUnit test results cache timing data to distribute tests in parallel runs. Dec 10 2024, 10:14 AM
ItamarWMDE updated the task description.

Change #1112718 had a related patch set uploaded (by Arthur taylor; author: Arthur taylor):

[mediawiki/core@master] Save results cache files to log folder for later analysis

https://gerrit.wikimedia.org/r/1112718

Change #1113147 had a related patch set uploaded (by Arthur taylor; author: Arthur taylor):

[mediawiki/core@master] Download latest phpunit results cache before parallel tests

https://gerrit.wikimedia.org/r/1113147

Change #1113983 had a related patch set uploaded (by Arthur taylor; author: Arthur taylor):

[integration/quibble@master] Update list of phpunit config files to copy to log directory

https://gerrit.wikimedia.org/r/1113983

Hit the timebox on this task after making some progress:

  • Data about the execution time of all tests is generated. 1112718 demonstrates an approach here that doesn't require changes to Quibble. The results cache files from the CI runs are added to the 'log' folder and automatically included in build artifacts. These are then available on https://integration.wikimedia.org for download. This patch is ready for review and can be merged without dependencies on any other part of this work.
  • Generated data is stored and made available for unauthenticated download by test runners. The raw results cache files themselves are available per the above. Additionally, the phpunit-results-cache tool (source / toolforge tool) allows results from multiple runs to be collated and downloaded as a combined results file.
  • Parallel test runners fetch and process the execution time data to create more time-balanced split_groups. 1113147 demonstrates an approach here. Because the timings for integration vs. unit tests are different for the same PHP classes, we actually need two different sets of timings. This implies two different phpunit.xml configurations. Unfortunately, if we don't create a phpunit.xml file, Quibble will error out, so we need 1113983 to change the way Quibble behaves.
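
As a rough illustration of collating timings from several runs, here is a minimal Python sketch. It rests on two assumptions: that each results cache file is a JSON document with a "times" map from test identifier ("Class::method") to seconds, and that runs are combined by simple averaging. How the phpunit-results-cache tool actually merges runs is not described in this task, and the function names and sample data below are hypothetical.

```python
import json
from collections import defaultdict


def merge_times(cache_documents):
    """Average each test's duration over the runs that recorded it."""
    samples = defaultdict(list)
    for document in cache_documents:
        # Assumed layout: {"version": ..., "defects": {...}, "times": {...}}
        for test_id, seconds in json.loads(document).get("times", {}).items():
            samples[test_id].append(seconds)
    return {test_id: sum(v) / len(v) for test_id, v in samples.items()}


def times_per_class(times):
    """Sum per-test durations into one total per test class."""
    totals = defaultdict(float)
    for test_id, seconds in times.items():
        class_name = test_id.split("::", 1)[0]
        totals[class_name] += seconds
    return dict(totals)


# Two made-up cache files from different runs.
run_a = '{"version": 1, "defects": {}, "times": {"FooTest::testOne": 0.5, "FooTest::testTwo": 1.5}}'
run_b = '{"version": 1, "defects": {}, "times": {"FooTest::testOne": 1.5, "BarTest::testThree": 2.0}}'

merged = merge_times([run_a, run_b])
print(times_per_class(merged))
```

Because integration and unit runs give different timings for the same class, a sketch like this would need to be run once per suite, on that suite's cache files only.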

Local testing of the code didn't result in much redistribution of tests within the buckets, because the naïve strategy of splitting groups by number of tests (for the case that data is missing) dominates the calculation - the per-bucket duration limit is never hit for some reason. Further investigation is left to follow-up tickets.

Created a simple toolforge service to fetch and merge results caches from CI runs: https://gitlab.wikimedia.org/arthurtaylor/phpunit-results-cache - https://phpunit-results-cache.toolforge.org

Can you add tools.wmde-wd-team to the tool maintainers so other developers can also access it? (Like e.g. on mismatch-finder.)

Also, it would be nice to have the tool’s source code in the standard toolforge-repos namespace on GitLab – you should be able to create a repo in toolsadmin.

I've created the linked repo - https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache . But I can't find a way to add wmde-wd-team as a maintainer. I've made you an admin @Lucas_Werkmeister_WMDE - can you see how to add a team there?

Mentioned in SAL (#wikimedia-cloud) [2025-02-03T10:31:42Z] <Lucas_WMDE> add tools.wmde-wd-team (T378797)

It’s under “tools” (the wmde-wd-team “tool” is really just an ACL):

[Screenshot: image.png (324×418 px, 17 KB)]

Change #1112718 merged by jenkins-bot:

[mediawiki/core@master] Save results cache files to log folder for later analysis

https://gerrit.wikimedia.org/r/1112718

karapayneWMDE removed karapayneWMDE as the assignee of this task.
karapayneWMDE subscribed.

Follow-up is happening in T384927: Download combined phpunit.results.cache timing data and use it to create balanced split_groups

Change #1128831 had a related patch set uploaded (by Arthur taylor; author: Arthur taylor):

[mediawiki/core@master] Improve PHPUnit parallel split_group generation algorithm

https://gerrit.wikimedia.org/r/1128831

Change #1113147 merged by jenkins-bot:

[mediawiki/core@master] Download latest phpunit results cache before parallel tests

https://gerrit.wikimedia.org/r/1113147

Change #1113983 merged by jenkins-bot:

[integration/quibble@master] Update list of phpunit config files to copy to log directory

https://gerrit.wikimedia.org/r/1113983

Change #1140182 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/quibble@master] release: Quibble 1.14.0

https://gerrit.wikimedia.org/r/1140182

Change #1140182 merged by jenkins-bot:

[integration/quibble@master] release: Quibble 1.14.0

https://gerrit.wikimedia.org/r/1140182

Change #1140215 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] jjb: switch jobs to Quibble 1.14.0

https://gerrit.wikimedia.org/r/1140215

Change #1140215 merged by jenkins-bot:

[integration/config@master] jjb: switch jobs to Quibble 1.14.0

https://gerrit.wikimedia.org/r/1140215