Analysing the split-group timings (using jenkins-run-analysis) suggests that some of the imbalance between split-groups is caused by the large variance in runtimes of long-running test classes. For example, ResourcesTest can finish in 50 seconds or take as long as 180 seconds, so the split-group containing ResourcesTest can take up to 130 seconds longer than expected.
Without knowing the root cause of the variance, it seems reasonable to try to reduce the difference between expected and actual test durations by averaging the timings. A weighted average is simple to implement and requires us to store only one previous value per test timing.
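As a rough illustration of the idea, here is a minimal sketch of combining a stored timing with a newly reported one. The function name `combine_timing`, the `new_weight` parameter, and its default of 0.5 are assumptions made for this example and are not taken from phpunit-results-cache.

```python
from typing import Optional


def combine_timing(previous: Optional[float], new: float, new_weight: float = 0.5) -> float:
    """Blend a newly reported duration with the stored one (in seconds).

    Hypothetical helper: the name, signature and default weight are
    illustrative assumptions, not the phpunit-results-cache API.
    """
    if not 0.0 <= new_weight <= 1.0:
        raise ValueError("new_weight must be between 0 and 1")
    if previous is None:
        # No stored value yet, so the first observation is used as-is.
        return new
    # Weighted average: only the single stored value is needed as history.
    return new_weight * new + (1.0 - new_weight) * previous
```

With the ResourcesTest figures above, a stored timing of 50 seconds and a new run of 180 seconds would combine to 115 seconds at a weight of 0.5, or 82.5 seconds at a weight of 0.25. A lower weight makes the stored value react more slowly to outliers; the right value would need to be chosen from the observed data.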
Implement a weighted average in phpunit-results-cache
Acceptance Criteria
- phpunit-results-cache uses a weighted average to combine new timings with existing timings.
- The updated phpunit-results-cache server has been deployed.