Page MenuHomePhabricator

Reduce runtime of MW shared gate Jenkins jobs to 5 min
Open, MediumPublic

Description

Objective

For the "typical" time it takes for a commit to be approved and landed in master to be 5 minutes or less.

Status quo

As of 11 June 2019, the gate usually takes around 20 minutes.

The two slowest jobs typically take 13-17 minutes each. The time for the gate overall is rarely under 15 minutes, because we run multiple of these jobs (increasing the chances of random slowness), and while they can run in parallel, they don't always start immediately - given limited CI execution slots.

Below a sample from a MediaWiki commit (master branch):

Gate pipeline build succeeded.
  • wmf-quibble-core-vendor-mysql-php72-docker SUCCESS in 12m 03s
  • wmf-quibble-core-vendor-mysql-hhvm-docker SUCCESS in 14m 12s
  • mediawiki-quibble-vendor-mysql-php72-docker SUCCESS in 7m 34s
  • mediawiki-quibble-vendor-mysql-php71-docker SUCCESS in 7m 12s
  • mediawiki-quibble-vendor-mysql-php70-docker SUCCESS in 6m 48s
  • mediawiki-quibble-vendor-mysql-hhvm-docker SUCCESS in 8m 32s
  • mediawiki-quibble-vendor-postgres-php72-docker SUCCESS in 10m 05s
  • mediawiki-quibble-vendor-sqlite-php72-docker SUCCESS in 7m 04s
  • mediawiki-quibble-composer-mysql-php70-docker SUCCESS in 8m 14s

(+ jobs that take less than 3 minutes: composer-test, npm-test, and phan.)

These can be grouped in two kinds of jobs:

  • wmf-quibble: These install MW with the gated extensions, and then run all PHPUnit, Selenium and QUnit tests.
  • mediawiki-quibble: These install MW bundled extensions only, and then run PHPUnit, Selenium and QUnit tests.

Stats from wmf-quibble-core-vendor-mysql-php72-docker

  • 9-15 minutes (wmf-gated, extensions-only)
  • Sample:
    • PHPUnit (dbless): 1.91 minutes / 15,782 tests.
    • QUnit: 29 seconds / 1286 tests.
    • Selenium: 143 seconds / 43 tests.
    • PHPUnit (db): 3.85 minutes / 4377 tests.

Stats from mediawiki-quibble-vendor-mysql-php72-docker:

  • 7-10 minutes (plain mediawiki-core)
  • Sample:
    • PHPUnit (unit+dbless): 1.5 minutes / 23,050 tests.
    • QUnit: 4 seconds / 437 tests.
    • PHPUnit (db): 4 minutes / 7604 tests.

Updated status quo

As of 11 May 2021, the gate usually takes around 25 minutes.

The slowest job typically takes 20-25 minutes per run. The time for the gate overall can never be faster than the slowest job, and can be worse as though we run other jobs in parallel, they don't always start immediately, due to given limited CI execution slots.

Below is the time results from a sample MediaWiki commit (master branch):

[Snipped: Jobs faster than 5 minutes]

  • 9m 43s: mediawiki-quibble-vendor-mysql-php74-docker/5873/console
  • 9m 47s: mediawiki-quibble-vendor-mysql-php73-docker/8799/console
  • 10m 03s: mediawiki-quibble-vendor-sqlite-php72-docker/10345/console
  • 10m 13s: mediawiki-quibble-composer-mysql-php72-docker/19129/console
  • 10m 28s: mediawiki-quibble-vendor-mysql-php72-docker/46482/console
  • 13m 11s: mediawiki-quibble-vendor-postgres-php72-docker/10259/console
  • 16m 44s: wmf-quibble-core-vendor-mysql-php72-docker/53990/console
  • 22m 26s: wmf-quibble-selenium-php72-docker/94038/console

Clearly the last two jobs are dominant in the timing:

  • wmf-quibble: This jobs installs MW with the gated extensions, and then runs all PHPUnit and QUnit tests.
  • wmf-quibble-selenium: This job installs MW with the gated extensions, and then runs all the Selenium tests.

Note that the mediawiki-quibble jobs each install just the MW bundled extensions, and then run PHPUnit, Selenium and QUnit tests.

Stats from wmf-quibble-core-vendor-mysql-php72-docker:

  • 13-18 minutes (wmf-gated, extensions-only)
  • Select times:
    • PHPUnit (unit tests): 9 seconds / 13,170 tests.
    • PHPUnit (DB-less integration tests): 3.31 minutes / 21,067 tests.
    • PHPUnit (DB-heavy): 7.91 minutes / 4,257 tests.
    • QUnit: 31 seconds / 1421 tests.

Stats from wmf-quibble-selenium-php72-docker:

  • 20-25 minutes

Scope of task

This task represents the goal of reaching 5 minutes or less. The work tracked here includes researching ways to get there, trying them out, and putting one or more ideas into practice. The task can be closed once we have reached the goal or if we have concluded it isn't feasible or useful.

Feel free to add/remove subtasks as we go along and consider different things.

Stuff done

Ideas to explore and related work

  • Look at the PHPUnit "Test Report" for a commit and sort the root by duration. Find the slowest ones and look at its test suite to look for ways to improve it. Is it repeating expensive setups? Perhaps that can be skipped or re-used. Is it running hundreds of variations for the same integration test? Perhaps reduce it to just 1 case for that story, and apply the remaining cases to a lighter unit test instead.

Details

ProjectBranchLines +/-Subject
integration/quibblemaster+24 -2
mediawiki/extensions/Echomaster+16 -33
mediawiki/extensions/Echomaster+6 -18
mediawiki/coremaster+22 -140
mediawiki/coremaster+1 K -679
mediawiki/coremaster+21 -14
mediawiki/coremaster+6 -4
mediawiki/coremaster+1 -1
integration/configmaster+0 -19
integration/configmaster+12 -5
mediawiki/coremaster+20 -19
mediawiki/coremaster+17 -49
mediawiki/coremaster+1 -3
mediawiki/coremaster+13 -1
integration/configmaster+22 -22
integration/configmaster+54 -0
integration/quibblemaster+4 -0
mediawiki/coremaster+3 -10
mediawiki/coremaster+1 -1
mediawiki/extensions/Wikibasemaster+70 -21
mediawiki/coremaster+27 -37
mediawiki/coremaster+29 -5
mediawiki/coremaster+37 -1
mediawiki/extensions/Babelmaster+47 -52
mediawiki/coremaster+16 -1
Show related patches Customize query in gerrit

Related Objects

StatusSubtypeAssignedTask
OpenNone
ResolvedLadsgroup
Resolvedaaron
Resolvedhashar
Resolvedaborrero
Resolvedhashar
Resolvedhashar
Resolvedhashar
ResolvedMholloway
DeclinedReedy
OpenNone
ResolvedKrinkle
ResolvedJdforrester-WMF
ResolvedJdforrester-WMF
DeclinedJdforrester-WMF
ResolvedKrinkle
Resolvedhashar
DeclinedNone
ResolvedJdforrester-WMF
Resolvedawight
OpenNone
Openawight
ResolvedNone
ResolvedDaimona
InvalidNone
OpenNone
OpenNone
ResolvedNone
DuplicateNone
OpenNone
OpenNone
OpenNone

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

The recent doubling and other regressions in test runtime appear to have settled.

Progress report on the [wmf-quibble-core-vendor-mysql-php72-docker](https://integration.wikimedia.org/ci/job/wmf-quibble-core-vendor-mysql-php72-docker/buildTimeTrend) job. (As run on mediawiki-core/master/gate-and-submit.)

TotalPHPUnit (pure unit)PHPUnit (dbless)QUnitSeleniumPHPUnit (db)
June 20199-15 min0 s (N/A)~ 2 min (15,782 tests)~ 30 s (1286 tests)~ 2.5 min (43 tests)~ 3.5 min (4377 tests)
October 20199-12 min9 s (5,788 tests)~ 2 min (17,703 tests)~ 30 s (1338 tests)0 s (N/A)~ 5.5 min (4397 tests)

Note:

  • The "pure unit" run for PHPUnit was newly introduced.
  • The "Selenium" sub task was moved into a parallel job.
  • The time spent waiting for a Jenkins executor is not included.
  • The time spent setting up the repos (Zuul cloner), loading dependencies (Castor, npm-install), etc, is included in the total but not in the subsequent columns.

Zuul does emit timing per pipeline/job/status. For the view over 3 months:

https://graphite.wikimedia.org/render/?width=930&height=464&_salt=1571078477.436&from=-3months&connectedLimit=&target=zuul.pipeline.gate-and-submit.job.wmf-quibble-core-vendor-mysql-php72-docker.SUCCESS.mean&lineMode=connected&yUnitSystem=msec&yMin=0

wmf-quibble-core (464×930 px, 47 KB)

The raise on 7/26 or 7/27 is the job that started running Selenium tests for each repositories.

I think the big spike around 08/22 - 08/27 was the introduction of the LocalisationCache as a service (T231200).

The last drop on 10/07 is the LocalisationCache service patch being introduced again 043d88f680cf66c90e2bdf423187ff8b994b1d02 and it comes with an in memory cache for l10n when using tests. That is a noticeable improvement.

The drop on 10/01 is that Selenium tests have been moved out of this job to a standalone one: wmf-quibble-selenium-php72-docker

One should be able to look at those per job metrics via https://grafana.wikimedia.org/d/000000283/zuul-job

A possible improvement is T235390. While the seccheck job is pretty fast, it consumes roughly 1.5GB of memory for the average extension (way more for core, where it still has to be enabled), and this usage will increase with future phan versions. Merging the two jobs would mean saving that amount of memory and hence freeing up CI resources. I don't know whether that would have a noticeable impact on the runtime, though.

Change 571762 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] resourceloader: Speed up structure/ResourcesTest

https://gerrit.wikimedia.org/r/571762

Change 571762 merged by jenkins-bot:
[mediawiki/core@master] resourceloader: Speed up structure/ResourcesTest

https://gerrit.wikimedia.org/r/571762

Change 578083 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] phpunit: Speed up MediaWikiCoversValidator trait

https://gerrit.wikimedia.org/r/578083

Change 578083 merged by jenkins-bot:
[mediawiki/core@master] phpunit: Speed up MediaWikiCoversValidator trait

https://gerrit.wikimedia.org/r/578083

Change 581005 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] zuul: omit mediawiki-quibble-(composer|vendor) from mwcore test pipeline

https://gerrit.wikimedia.org/r/581005

Change 581005 merged by jenkins-bot:
[integration/config@master] zuul: omit mediawiki-quibble-(composer|vendor) from mwcore test pipeline

https://gerrit.wikimedia.org/r/581005

Change 581032 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] build: Merge doc linting into 'npm test'

https://gerrit.wikimedia.org/r/581032

Change 581059 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[integration/config@master] layout: Drop mediawiki-core-javascript-docker, rely on mwgate-node10-docker

https://gerrit.wikimedia.org/r/581059

Change 581059 abandoned by Jforrester:
layout: Drop mediawiki-core-javascript-docker, rely on mwgate-node10-docker

Reason:
Duped by I47c58dd15.

https://gerrit.wikimedia.org/r/581059

Change 581032 merged by jenkins-bot:
[mediawiki/core@master] build: Merge doc linting into 'npm test'

https://gerrit.wikimedia.org/r/581032

I cant tell what kind of madness might occur if we changed the conditional from areAllJobsComplete to didAnyJobFail :-\

Picking this back up again (from T225730#5490156), is it worth a short experiment (24 hours?)? Potentially this would free up a lot of resources in addition to providing quicker feedback.

I cant tell what kind of madness might occur if we changed the conditional from areAllJobsComplete to didAnyJobFail :-\

Picking this back up again (from T225730#5490156), is it worth a short experiment (24 hours?)? Potentially this would free up a lot of resources in addition to providing quicker feedback.

I think it's worth trying. Let's spin this out into a sub-task.

Looking at the Test Result from the latest master commit for mediawiki-quibble-vendor-mysql-php72-docker, and the slowest namespaced test case is MediaWiki.Tests.Revision/McrRevisionStoreDbTest, where the slowest tests are testGetRevisionByTimestamp and testLoadRevisionFromTimestamp which both take one full second each.

Not surprising, as they quite literally contains sleep( 1 ). Let's fix that :)

Change 585595 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] Revision: use MWTimestamp directly instead of wfTimesamp()

https://gerrit.wikimedia.org/r/585595

Looking at the Test Result from the latest master commit for mediawiki-quibble-vendor-mysql-php72-docker, and the slowest namespaced test case is MediaWiki.Tests.Revision/McrRevisionStoreDbTest, where the slowest tests are testGetRevisionByTimestamp and testGetRevisionByTimestamp which both take one full second each.

Not surprising, as they quite literally contains sleep( 1 ). Let's fix that :)

Random idea: what about a PHPCS sniff that suggests MWTimestamp if it finds sleep in a test file?

Change 585599 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] Revision: Use setFakeTime() instead of sleep() in unit tests

https://gerrit.wikimedia.org/r/585599

Looking at the Test Result from the latest master commit for mediawiki-quibble-vendor-mysql-php72-docker, and the slowest namespaced test case is MediaWiki.Tests.Revision/McrRevisionStoreDbTest, where the slowest tests are testGetRevisionByTimestamp and testGetRevisionByTimestamp which both take one full second each.

Not surprising, as they quite literally contains sleep( 1 ). Let's fix that :)

Random idea: what about a PHPCS sniff that suggests MWTimestamp if it finds sleep in a test file?

Looking at the Test Result from the latest master commit for mediawiki-quibble-vendor-mysql-php72-docker, and the slowest namespaced test case is MediaWiki.Tests.Revision/McrRevisionStoreDbTest, where the slowest tests are testGetRevisionByTimestamp and testGetRevisionByTimestamp which both take one full second each.

Not surprising, as they quite literally contains sleep( 1 ). Let's fix that :)

Random idea: what about a PHPCS sniff that suggests MWTimestamp if it finds sleep in a test file?

A wfTimestamp -> MWTimestamp::convert converter, or at least a warn for uses of wfTimestamp generally would be good. Though it's not yet actually deprecated…

Change 585599 merged by jenkins-bot:
[mediawiki/core@master] Revision: Use setFakeTime() instead of sleep() in unit tests

https://gerrit.wikimedia.org/r/585599

Change 585595 merged by jenkins-bot:
[mediawiki/core@master] Revision: use MWTimestamp directly instead of wfTimesamp()

https://gerrit.wikimedia.org/r/585595

Change 592083 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] mediawiki.jqueryMsg: Refactor test suite to not make any API requests

https://gerrit.wikimedia.org/r/592083

Change 592083 merged by jenkins-bot:
[mediawiki/core@master] mediawiki.jqueryMsg: Refactor test suite to not make any API requests

https://gerrit.wikimedia.org/r/592083

From task description:

Stats from wmf-quibble-core-vendor-mysql-php72-docker:

  • 9-15 minutes (wmf-gated, extensions-only)
  • Sample:
    • PHPUnit (dbless): 1.91 minutes / 15,782 tests.
    • QUnit: 29 seconds / 1286 tests.
    • Selenium: 143 seconds / 43 tests.
    • PHPUnit (db): 3.85 minutes / 4377 tests.

Looking at a recent build for mediawiki/core (example) I notice a sharp regression in the "PHPUnit (db)" step.

It now takes 8.8 minutes / 4380 tests, with most of the slow ones relating to Wikibase/Scribunto:

You should really speed up these slow tests:
1. 5779ms to run Wikibase\Repo\Tests\Api\EditEntityTest:testEditEntity_producesCorrectSummary with data set "more than 50 languages changed and other parts changed"
2. 4737ms to run Wikibase\Repo\Tests\Api\EditEntityTest:testEditEntity_producesCorrectSummary with data set "more than 50 languages changed"
3. 4144ms to run Wikibase\Repo\Tests\Store\StoreTest:testRebuild with data set #0
4. 3789ms to run Wikibase\Repo\Tests\Api\SetClaimTest:testAddClaim
5. 3425ms to run Wikibase\Client\Tests\Integration\DataAccess\Scribunto\Scribunto_LuaWikibaseLibraryTest:testIncrementStatsKey
6. 3425ms to run Wikibase\Client\Tests\Integration\DataAccess\Scribunto\Scribunto_LuaWikibaseLibraryTest:testLua with data set #87
7. 3391ms to run Wikibase\Client\Tests\Integration\DataAccess\Scribunto\Scribunto_LuaWikibaseLibraryTest:testResolvePropertyId_labelNotFound
8. 3369ms to run Wikibase\Client\Tests\Integration\DataAccess\Scribunto\Scribunto_LuaWikibaseLibraryTest:testLua with data set #7
9. 3148ms to run Wikibase\Client\Tests\Integration\DataAccess\Scribunto\Scribunto_LuaWikibaseLibraryTest:testLua with data set #22
10. 3130ms to run Wikibase\Client\Tests\Integration\DataAccess\Scribunto\Scribunto_LuaWikibaseLibraryTest:testLua with data set #47
11. 3061ms to run Wikibase\Client\Tests\Integration\DataAccess\Scribunto\Scribunto_LuaWikibaseLibraryTest:testLua with data set #30
12. 2960ms to run Wikibase\Repo\Tests\Specials\SpecialNewPropertyTest:testEntityIsBeingCreated_WhenValidInputIsGiven with data set "another language"
13. 2949ms to run Wikibase\Client\Tests\Integration\DataAccess\Scribunto\Scribunto_LuaWikibaseLibraryTest:testLua with data set #15
14. 2944ms to run Wikibase\Client\Tests\Integration\DataAccess\Scribunto\Scribunto_LuaWikibaseLibraryTest:testLua with data set #55
15. 2827ms to run Wikibase\Repo\Tests\Specials\SpecialNewItemTest:testEntityIsBe

@Jdforrester-WMF Are these the ones we wanted to exclude from the shared gate job via standalone?

@Jdforrester-WMF Are these the ones we wanted to exclude from the shared gate job via standalone?

Yes, this is waiting on the next Quibble release to fix.

[mediawiki/extensions/WikibaseCirrusSearch@master] tests: Remove empty placeholder for selenium tests

Given we spend a full minute running through this no-op on every
MW patch and backport, remove it for now.

https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikibaseCirrusSearch/+/608983

Change 671381 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] ApiTestCase: Optimise the shared testApiTestGroup() test method

https://gerrit.wikimedia.org/r/671381

I forgot about this task when sending my patches to switch integration tests to unit tests (my understanding is that unit tests run faster) but there are a number of patches at https://gerrit.wikimedia.org/r/q/topic:%22unit-tests%22+status:open to switch integration tests to unit tests and thus speed them up

Change 671245 had a related patch set uploaded (by DannyS712; owner: DannyS712):
[mediawiki/core@master] Tests: use an empty hook container where possible

https://gerrit.wikimedia.org/r/671245

Change 673704 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/core@master] phpunit: Remove MediaWikiPHPUnitTestListener

https://gerrit.wikimedia.org/r/673704

Change 673704 merged by jenkins-bot:
[mediawiki/core@master] phpunit: Remove MediaWikiPHPUnitTestListener

https://gerrit.wikimedia.org/r/673704

PHPUnit (unit tests): 9 seconds / 13,170 tests.
PHPUnit (DB-less integration tests): 3.31 minutes / 21,067 tests.
PHPUnit (DB-heavy): 7.91 minutes / 4,257 tests.

Maybe we can try to do a coordinated campaign to move PHPUnit (DB-less integration tests) -> PHPUnit unit tests, and PHPUnit Integration / DB-heavy tests -> PHPUnit DB-less integration tests.

I imagine there are a lot of tests using the heaviest option because the test was written a long time ago, or it was the easiest thing to do, and not because the actual DB integration aspect of it is needed for the test. @Ladsgroup wrote a script to identify tests that could work as pure unit tests but I think this would have to be a manual process.... x 25,000 tests :(

PHPUnit (unit tests): 9 seconds / 13,170 tests.
PHPUnit (DB-less integration tests): 3.31 minutes / 21,067 tests.
PHPUnit (DB-heavy): 7.91 minutes / 4,257 tests.

Maybe we can try to do a coordinated campaign to move PHPUnit (DB-less integration tests) -> PHPUnit unit tests, and PHPUnit Integration / DB-heavy tests -> PHPUnit DB-less integration tests.

I imagine there are a lot of tests using the heaviest option because the test was written a long time ago, or it was the easiest thing to do, and not because the actual DB integration aspect of it is needed for the test. @Ladsgroup wrote a script to identify tests that could work as pure unit tests but I think this would have to be a manual process.... x 25,000 tests :(

That sounds like a good project, yes.

Change 698467 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/Echo@master] selenium: Use a single login for browser tests

https://gerrit.wikimedia.org/r/698467

Change 698468 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/Echo@master] selenium: Move notifications page test into general suite

https://gerrit.wikimedia.org/r/698468

@kostajh Is the time spent mainly in API login or UI login, or are they both equally bad? Is there a way to speed that up perhaps? We toyed around last year with the idea that it might be due to overly strong password hashing which could be slow in VMs. Afiak we ruled that out but maybe it's still an issue?

@kostajh Is the time spent mainly in API login or UI login, or are they both equally bad? Is there a way to speed that up perhaps? We toyed around last year with the idea that it might be due to overly strong password hashing which could be slow in VMs. Afiak we ruled that out but maybe it's still an issue?

I haven't tried to measure that specifically; maybe the login flow is a red herring, and it's more specifically to do with many things being slower as a logged-in user due to additional code paths that are executed serially with PHP's built-in server.

Maybe there are deferred updates that require more processing time for authenticated users, and with PHP's built-in server these aren't really deferred (since we have a single thread to process requests) and are instead handled sequentially. That's something that would go away once we make some progress on T276428: Introduce non-voting jobs with quibble+apache

Change 698947 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/core@master] DeferredUpdates: Log execution time for updates

https://gerrit.wikimedia.org/r/698947

@kostajh Is the time spent mainly in API login or UI login, or are they both equally bad? Is there a way to speed that up perhaps? We toyed around last year with the idea that it might be due to overly strong password hashing which could be slow in VMs. Afiak we ruled that out but maybe it's still an issue?

I haven't tried to measure that specifically; maybe the login flow is a red herring, and it's more specifically to do with many things being slower as a logged-in user due to additional code paths that are executed serially with PHP's built-in server.

Maybe there are deferred updates that require more processing time for authenticated users, and with PHP's built-in server these aren't really deferred (since we have a single thread to process requests) and are instead handled sequentially. That's something that would go away once we make some progress on T276428: Introduce non-voting jobs with quibble+apache

Hmm, no, it doesn't seem like deferred updates are causing any substantial delays in the single-threaded execution context. https://gerrit.wikimedia.org/r/698947 added some logging for execution time for the deferred updates, and for e.g. the Selenium job (https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php72-docker/98521/consoleFull; logs: https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php72-docker/98521/artifact/log/mw-debug-www.log.gz), it appears that the processing time across the entire log file is 10.44 seconds.

Looking at https://integration.wikimedia.org/ci/job/wmf-quibble-apache-selenium-php72-docker/1990/console, one issue is that it's fully 8 minutes before the Selenium tests even get started, mostly due to lengthy npm install times.

Change 702909 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/quibble@master] npm: Use cache flag for project directory and prefer offline

https://gerrit.wikimedia.org/r/702909