Page MenuHomePhabricator

Reduce runtime of MW shared gate Jenkins jobs to 5 min
Open, MediumPublic

Description

Objective

For the "typical" time it takes for a commit to be approved and landed in master to be 5 minutes or less.

Status quo

As of 11 June 2019, the gate usually takes around 20 minutes.

The two slowest jobs typically take 13-17 minutes each. The time for the gate overall is rarely under 15 minutes, because we run multiple of these jobs (increasing the chances of random slowness), and while they can run in parallel, they don't always start immediately - given limited CI execution slots.

Below a sample from a MediaWiki commit (master branch):

Gate pipeline build succeeded.
  • wmf-quibble-core-vendor-mysql-php72-docker SUCCESS in 12m 03s
  • wmf-quibble-core-vendor-mysql-hhvm-docker SUCCESS in 14m 12s
  • mediawiki-quibble-vendor-mysql-php72-docker SUCCESS in 7m 34s
  • mediawiki-quibble-vendor-mysql-php71-docker SUCCESS in 7m 12s
  • mediawiki-quibble-vendor-mysql-php70-docker SUCCESS in 6m 48s
  • mediawiki-quibble-vendor-mysql-hhvm-docker SUCCESS in 8m 32s
  • mediawiki-quibble-vendor-postgres-php72-docker SUCCESS in 10m 05s
  • mediawiki-quibble-vendor-sqlite-php72-docker SUCCESS in 7m 04s
  • mediawiki-quibble-composer-mysql-php70-docker SUCCESS in 8m 14s

(+ jobs that take less than 3 minutes: composer-test, npm-test, and phan.)

These can be grouped in two kinds of jobs:

  • wmf-quibble: These install MW with the gated extensions, and then run all PHPUnit, Selenium and QUnit tests.
  • mediawiki-quibble: These install MW bundled extensions only, and then run PHPUnit, Selenium and QUnit tests.

Stats from wmf-quibble-core-vendor-mysql-php72-docker

  • 9-15 minutes (wmf-gated, extensions-only)
  • Sample:
    • PHPUnit (dbless): 1.91 minutes / 15,782 tests.
    • QUnit: 29 seconds / 1286 tests.
    • Selenium: 143 seconds / 43 tests.
    • PHPUnit (db): 3.85 minutes / 4377 tests.

Stats from mediawiki-quibble-vendor-mysql-php72-docker:

  • 7-10 minutes (plain mediawiki-core)
  • Sample:
    • PHPUnit (unit+dbless): 1.5 minutes / 23,050 tests.
    • QUnit: 4 seconds / 437 tests.
    • PHPUnit (db): 4 minutes / 7604 tests.

Scope of task

This task represents the goal of reaching 5 minutes or less. The work tracked here includes researching ways to get there, trying them out, and putting one or more ideas into practice. The task can be closed once we have reached the goal or if we have concluded it isn't feasible or useful.

Feel free to add/remove subtasks as we go along and consider different things.

Stuff done

Ideas to explore and related work

  • Look at the PHPUnit "Test Report" for a commit and sort the root by duration. Find the slowest ones and look at its test suite to look for ways to improve it. Is it repeating expensive setups? Perhaps that can be skipped or re-used. Is it running hundreds of variations for the same integration test? Perhaps reduce it to just 1 case for that story, and apply the remaining cases to a lighter unit test instead.

Details

ProjectBranchLines +/-Subject
mediawiki/coremaster+1 K -679
mediawiki/coremaster+21 -14
mediawiki/coremaster+6 -4
mediawiki/coremaster+1 -1
integration/configmaster+0 -19
integration/configmaster+12 -5
mediawiki/coremaster+20 -19
mediawiki/coremaster+17 -49
mediawiki/coremaster+1 -3
mediawiki/coremaster+13 -1
integration/configmaster+22 -22
integration/configmaster+54 -0
integration/quibblemaster+4 -0
mediawiki/coremaster+3 -10
mediawiki/coremaster+1 -1
mediawiki/extensions/Wikibasemaster+70 -21
mediawiki/coremaster+27 -37
mediawiki/coremaster+29 -5
mediawiki/coremaster+37 -1
mediawiki/extensions/Babelmaster+47 -52
mediawiki/coremaster+16 -1
Show related patches Customize query in gerrit

Related Objects

StatusSubtypeAssignedTask
OpenNone
ResolvedLadsgroup
Resolvedaaron
Resolvedhashar
Resolvedaborrero
Resolvedhashar
Resolvedhashar
Resolvedhashar
ResolvedMholloway
DeclinedReedy
OpenNone
ResolvedKrinkle
ResolvedJdforrester-WMF
ResolvedJdforrester-WMF
DeclinedJdforrester-WMF
ResolvedKrinkle
Resolvedhashar
DeclinedNone
ResolvedJdforrester-WMF
Openawight
Openawight
OpenNone
OpenDaimona
OpenNone
OpenNone
OpenNone
ResolvedNone
DuplicateNone

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

For Selenium/Qunit, we should be able to extract them to a standalone job. T232759

Is it possible to have the default behavior be that if one job fails, all other jobs for the build are terminated? That way you get much quicker feedback and reduce resource usage. (I think this idea has been discussed somewhere but not sure if there's a phab task for it.)

There could maybe also be an option like "recheck dont-stop-on-failure" which would use the current behavior, of letting all jobs run even if one or more have failed.

Is it possible to have the default behavior be that if one job fails, all other jobs for the build are terminated? That way you get much quicker feedback and reduce resource usage. (I think this idea has been discussed somewhere but not sure if there's a phab task for it.)

That logic is in the Zuul scheduler, it continuously process items in the queues to check their status and whether jobs have completed for a change but only reports once all jobs have completed. The code is:

zuul/scheduler.py
1167 class BasePipelineManager(object):
...
1557     def _processOneItem(self, item, nnfi):
...
1606         if self.pipeline.didAnyJobFail(item):  # <<< flag the item has failing as soon as a job failled
1607             failing_reasons.append("at least one job failed")
...
1612         if ((not item_ahead) and self.pipeline.areAllJobsComplete(item)   # <<< but a conditional on all jobs having completed
1613             and item.live):
1614             try:
1615                 self.reportItem(item)    # and once all jobs have completed, report back to Gerrit.
1616             except exceptions.MergeFailure:
1617                 failing_reasons.append("it did not merge")
1618                 for item_behind in item.items_behind:
1619                     self.log.info("Resetting builds for change %s because the "
1620                                   "item ahead, %s, failed to merge" %
1621                                   (item_behind.change, item))
1622                     self.cancelJobs(item_behind)
1623             self.dequeueItem(item)  # Finally remove the item from the queue after it has been reported
1624             changed = True

I cant tell what kind of madness might occur if we changed the conditional from areAllJobsComplete to didAnyJobFail :-\

There could maybe also be an option like "recheck dont-stop-on-failure" which would use the current behavior, of letting all jobs run even if one or more have failed.

I am pretty sure that can not be implemented. At least not with some major code to be added on top our forked/outdated Zuul fork :-\

Change 537469 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] docker: rebuild for quibble 0.0.35

https://gerrit.wikimedia.org/r/537469

Change 537469 merged by jenkins-bot:
[integration/config@master] docker: rebuild for quibble 0.0.35

https://gerrit.wikimedia.org/r/537469

Change 537485 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[integration/config@master] jjb: Upgrade all quibble jobs to 0.0.35

https://gerrit.wikimedia.org/r/537485

Change 537485 merged by jenkins-bot:
[integration/config@master] jjb: Upgrade all quibble jobs to 0.0.35

https://gerrit.wikimedia.org/r/537485

Change 528933 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[integration/quibble@master] Set cache directory

https://gerrit.wikimedia.org/r/528933

This has been released and deployed on September 17th with Quibble 0.0.35, so that we need have Quibble setting $wgCacheDirectory. I haven't verified the outcome of the change though. Eventually when testing it I found out the Sqlite object cache is not properly initialized and not used that is T233117

This has been released and deployed on September 17th with Quibble 0.0.35, so that we need have Quibble setting $wgCacheDirectory. I haven't verified the outcome of the change though. Eventually when testing it I found out the Sqlite object cache is not properly initialized and not used that is T233117

This still needs this patch to be merged to be more effective I think.

Change 537743 merged by jenkins-bot:
[integration/config@master] jjb: [quibble] Fix syntax for --skip command

https://gerrit.wikimedia.org/r/537743

This will reduce the quibble job length by skipping "npm test" step (which is already covered by the node10-test jobs). See also T233143 and T225248#5485061.

Krinkle updated the task description. (Show Details)Sep 18 2019, 10:07 PM

Change 528903 abandoned by Ladsgroup:
localisation: Add process cache to LCStoreDB

Reason:
Not needed anymore

https://gerrit.wikimedia.org/r/528903

Krinkle updated the task description. (Show Details)Sep 26 2019, 3:35 AM

Change 539691 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] selenium: Remove "RunJobs" wait from specialrecentchanges test

https://gerrit.wikimedia.org/r/539691

Change 539691 merged by jenkins-bot:
[mediawiki/core@master] selenium: Remove "RunJobs" wait from specialrecentchanges test

https://gerrit.wikimedia.org/r/539691

Krinkle added a comment.EditedOct 12 2019, 10:16 PM

The recent doubling and other regressions in test runtime appear to have settled.

Progress report on the [wmf-quibble-core-vendor-mysql-php72-docker](https://integration.wikimedia.org/ci/job/wmf-quibble-core-vendor-mysql-php72-docker/buildTimeTrend) job. (As run on mediawiki-core/master/gate-and-submit.)

TotalPHPUnit (pure unit)PHPUnit (dbless)QUnitSeleniumPHPUnit (db)
June 20199-15 min0 s (N/A)~ 2 min (15,782 tests)~ 30 s (1286 tests)~ 2.5 min (43 tests)~ 3.5 min (4377 tests)
October 20199-12 min9 s (5,788 tests)~ 2 min (17,703 tests)~ 30 s (1338 tests)0 s (N/A)~ 5.5 min (4397 tests)

Note:

  • The "pure unit" run for PHPUnit was newly introduced.
  • The "Selenium" sub task was moved into a parallel job.
  • The time spent waiting for a Jenkins executor is not included.
  • The time spent setting up the repos (Zuul cloner), loading dependencies (Castor, npm-install), etc, is included in the total but not in the subsequent columns.

Zuul does emit timing per pipeline/job/status. For the view over 3 months:

https://graphite.wikimedia.org/render/?width=930&height=464&_salt=1571078477.436&from=-3months&connectedLimit=&target=zuul.pipeline.gate-and-submit.job.wmf-quibble-core-vendor-mysql-php72-docker.SUCCESS.mean&lineMode=connected&yUnitSystem=msec&yMin=0

The raise on 7/26 or 7/27 is the job that started running Selenium tests for each repositories.

I think the big spike around 08/22 - 08/27 was the introduction of the LocalisationCache as a service (T231200).

The last drop on 10/07 is the LocalisationCache service patch being introduced again 043d88f680cf66c90e2bdf423187ff8b994b1d02 and it comes with an in memory cache for l10n when using tests. That is a noticeable improvement.

The drop on 10/01 is that Selenium tests have been moved out of this job to a standalone one: wmf-quibble-selenium-php72-docker

One should be able to look at those per job metrics via https://grafana.wikimedia.org/d/000000283/zuul-job

A possible improvement is T235390. While the seccheck job is pretty fast, it consumes roughly 1.5GB of memory for the average extension (way more for core, where it still has to be enabled), and this usage will increase with future phan versions. Merging the two jobs would mean saving that amount of memory and hence freeing up CI resources. I don't know whether that would have a noticeable impact on the runtime, though.

Change 571762 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] resourceloader: Speed up structure/ResourcesTest

https://gerrit.wikimedia.org/r/571762

Change 571762 merged by jenkins-bot:
[mediawiki/core@master] resourceloader: Speed up structure/ResourcesTest

https://gerrit.wikimedia.org/r/571762

Change 578083 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] phpunit: Speed up MediaWikiCoversValidator trait

https://gerrit.wikimedia.org/r/578083

Change 578083 merged by jenkins-bot:
[mediawiki/core@master] phpunit: Speed up MediaWikiCoversValidator trait

https://gerrit.wikimedia.org/r/578083

Krinkle updated the task description. (Show Details)Mar 11 2020, 4:31 AM

Change 581005 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] zuul: omit mediawiki-quibble-(composer|vendor) from mwcore test pipeline

https://gerrit.wikimedia.org/r/581005

Change 581005 merged by jenkins-bot:
[integration/config@master] zuul: omit mediawiki-quibble-(composer|vendor) from mwcore test pipeline

https://gerrit.wikimedia.org/r/581005

Change 581032 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] build: Merge doc linting into 'npm test'

https://gerrit.wikimedia.org/r/581032

Change 581059 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[integration/config@master] layout: Drop mediawiki-core-javascript-docker, rely on mwgate-node10-docker

https://gerrit.wikimedia.org/r/581059

Change 581059 abandoned by Jforrester:
layout: Drop mediawiki-core-javascript-docker, rely on mwgate-node10-docker

Reason:
Duped by I47c58dd15.

https://gerrit.wikimedia.org/r/581059

Change 581032 merged by jenkins-bot:
[mediawiki/core@master] build: Merge doc linting into 'npm test'

https://gerrit.wikimedia.org/r/581032

DannyS712 added a subscriber: DannyS712.
Krinkle updated the task description. (Show Details)Mar 25 2020, 11:11 PM
Krinkle updated the task description. (Show Details)Mar 25 2020, 11:16 PM

I cant tell what kind of madness might occur if we changed the conditional from areAllJobsComplete to didAnyJobFail :-\

Picking this back up again (from T225730#5490156), is it worth a short experiment (24 hours?)? Potentially this would free up a lot of resources in addition to providing quicker feedback.

I cant tell what kind of madness might occur if we changed the conditional from areAllJobsComplete to didAnyJobFail :-\

Picking this back up again (from T225730#5490156), is it worth a short experiment (24 hours?)? Potentially this would free up a lot of resources in addition to providing quicker feedback.

I think it's worth trying. Let's spin this out into a sub-task.

Krinkle added a comment.EditedApr 2 2020, 8:06 PM

Looking at the Test Result from the latest master commit for mediawiki-quibble-vendor-mysql-php72-docker, and the slowest namespaced test case is MediaWiki.Tests.Revision/McrRevisionStoreDbTest, where the slowest tests are testGetRevisionByTimestamp and testLoadRevisionFromTimestamp which both take one full second each.

Not surprising, as they quite literally contains sleep( 1 ). Let's fix that :)

Change 585595 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] Revision: use MWTimestamp directly instead of wfTimesamp()

https://gerrit.wikimedia.org/r/585595

Looking at the Test Result from the latest master commit for mediawiki-quibble-vendor-mysql-php72-docker, and the slowest namespaced test case is MediaWiki.Tests.Revision/McrRevisionStoreDbTest, where the slowest tests are testGetRevisionByTimestamp and testGetRevisionByTimestamp which both take one full second each.

Not surprising, as they quite literally contains sleep( 1 ). Let's fix that :)

Random idea: what about a PHPCS sniff that suggests MWTimestamp if it finds sleep in a test file?

Change 585599 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] Revision: Use setFakeTime() instead of sleep() in unit tests

https://gerrit.wikimedia.org/r/585599

Looking at the Test Result from the latest master commit for mediawiki-quibble-vendor-mysql-php72-docker, and the slowest namespaced test case is MediaWiki.Tests.Revision/McrRevisionStoreDbTest, where the slowest tests are testGetRevisionByTimestamp and testGetRevisionByTimestamp which both take one full second each.

Not surprising, as they quite literally contains sleep( 1 ). Let's fix that :)

Random idea: what about a PHPCS sniff that suggests MWTimestamp if it finds sleep in a test file?

Looking at the Test Result from the latest master commit for mediawiki-quibble-vendor-mysql-php72-docker, and the slowest namespaced test case is MediaWiki.Tests.Revision/McrRevisionStoreDbTest, where the slowest tests are testGetRevisionByTimestamp and testGetRevisionByTimestamp which both take one full second each.

Not surprising, as they quite literally contains sleep( 1 ). Let's fix that :)

Random idea: what about a PHPCS sniff that suggests MWTimestamp if it finds sleep in a test file?

A wfTimestamp -> MWTimestamp::convert converter, or at least a warn for uses of wfTimestamp generally would be good. Though it's not yet actually deprecated…

Change 585599 merged by jenkins-bot:
[mediawiki/core@master] Revision: Use setFakeTime() instead of sleep() in unit tests

https://gerrit.wikimedia.org/r/585599

Change 585595 merged by jenkins-bot:
[mediawiki/core@master] Revision: use MWTimestamp directly instead of wfTimesamp()

https://gerrit.wikimedia.org/r/585595

Change 592083 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] mediawiki.jqueryMsg: Refactor test suite to not make any API requests

https://gerrit.wikimedia.org/r/592083

Krinkle updated the task description. (Show Details)Apr 23 2020, 9:02 PM

Change 592083 merged by jenkins-bot:
[mediawiki/core@master] mediawiki.jqueryMsg: Refactor test suite to not make any API requests

https://gerrit.wikimedia.org/r/592083

Krinkle added a comment.EditedApr 27 2020, 5:54 PM
From task description:

Stats from wmf-quibble-core-vendor-mysql-php72-docker:

  • 9-15 minutes (wmf-gated, extensions-only)
  • Sample:
    • PHPUnit (dbless): 1.91 minutes / 15,782 tests.
    • QUnit: 29 seconds / 1286 tests.
    • Selenium: 143 seconds / 43 tests.
    • PHPUnit (db): 3.85 minutes / 4377 tests.

Looking at a recent build for mediawiki/core (example) I notice a sharp regression in the "PHPUnit (db)" step.

It now takes 8.8 minutes / 4380 tests, with most of the slow ones relating to Wikibase/Scribunto:

You should really speed up these slow tests:
1. 5779ms to run Wikibase\Repo\Tests\Api\EditEntityTest:testEditEntity_producesCorrectSummary with data set "more than 50 languages changed and other parts changed"
2. 4737ms to run Wikibase\Repo\Tests\Api\EditEntityTest:testEditEntity_producesCorrectSummary with data set "more than 50 languages changed"
3. 4144ms to run Wikibase\Repo\Tests\Store\StoreTest:testRebuild with data set #0
4. 3789ms to run Wikibase\Repo\Tests\Api\SetClaimTest:testAddClaim
5. 3425ms to run Wikibase\Client\Tests\Integration\DataAccess\Scribunto\Scribunto_LuaWikibaseLibraryTest:testIncrementStatsKey
6. 3425ms to run Wikibase\Client\Tests\Integration\DataAccess\Scribunto\Scribunto_LuaWikibaseLibraryTest:testLua with data set #87
7. 3391ms to run Wikibase\Client\Tests\Integration\DataAccess\Scribunto\Scribunto_LuaWikibaseLibraryTest:testResolvePropertyId_labelNotFound
8. 3369ms to run Wikibase\Client\Tests\Integration\DataAccess\Scribunto\Scribunto_LuaWikibaseLibraryTest:testLua with data set #7
9. 3148ms to run Wikibase\Client\Tests\Integration\DataAccess\Scribunto\Scribunto_LuaWikibaseLibraryTest:testLua with data set #22
10. 3130ms to run Wikibase\Client\Tests\Integration\DataAccess\Scribunto\Scribunto_LuaWikibaseLibraryTest:testLua with data set #47
11. 3061ms to run Wikibase\Client\Tests\Integration\DataAccess\Scribunto\Scribunto_LuaWikibaseLibraryTest:testLua with data set #30
12. 2960ms to run Wikibase\Repo\Tests\Specials\SpecialNewPropertyTest:testEntityIsBeingCreated_WhenValidInputIsGiven with data set "another language"
13. 2949ms to run Wikibase\Client\Tests\Integration\DataAccess\Scribunto\Scribunto_LuaWikibaseLibraryTest:testLua with data set #15
14. 2944ms to run Wikibase\Client\Tests\Integration\DataAccess\Scribunto\Scribunto_LuaWikibaseLibraryTest:testLua with data set #55
15. 2827ms to run Wikibase\Repo\Tests\Specials\SpecialNewItemTest:testEntityIsBe

@Jdforrester-WMF Are these the ones we wanted to exclude from the shared gate job via standalone?

@Jdforrester-WMF Are these the ones we wanted to exclude from the shared gate job via standalone?

Yes, this is waiting on the next Quibble release to fix.

[mediawiki/extensions/WikibaseCirrusSearch@master] tests: Remove empty placeholder for selenium tests

Given we spend a full minute running through this no-op on every
MW patch and backport, remove it for now.

https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikibaseCirrusSearch/+/608983