Page MenuHomePhabricator

Reduce runtime of MW shared gate Jenkins jobs to 5 min
Open, MediumPublic

Assigned To
None
Authored By
Krinkle
Jun 13 2019, 3:28 PM
Referenced Files
F37145008: Screenshot 2023-07-21 at 03.06.49.png
Jul 21 2023, 2:08 AM
F37145004: Screenshot 2023-07-21 at 03.02.01.png
Jul 21 2023, 2:08 AM
F30679987: wmf-quibble-core
Oct 14 2019, 6:51 PM
F29513257: Screenshot 2019-06-13 at 15.57.50.png
Jun 13 2019, 3:28 PM
F29513308: Screenshot 2019-06-13 at 15.59.18.png
Jun 13 2019, 3:28 PM
Tokens
"Love" token, awarded by kostajh."Love" token, awarded by Ladsgroup."Like" token, awarded by mmodell."100" token, awarded by DannyS712.

Description

Objective

For the "typical" time it takes for a commit to be approved and landed in master to be 5 minutes or less.

Status quo

As of 11 June 2019, the gate usually takes around 20 minutes.

The two slowest jobs typically take 13-17 minutes each. The time for the gate overall is rarely under 15 minutes, because we run multiple of these jobs (increasing the chances of random slowness), and while they can run in parallel, they don't always start immediately - given limited CI execution slots.

Below a sample from a MediaWiki commit (master branch):

Gate pipeline build succeeded.
  • wmf-quibble-core-vendor-mysql-php72-docker SUCCESS in 12m 03s
  • wmf-quibble-core-vendor-mysql-hhvm-docker SUCCESS in 14m 12s
  • mediawiki-quibble-vendor-mysql-php72-docker SUCCESS in 7m 34s
  • mediawiki-quibble-vendor-mysql-php71-docker SUCCESS in 7m 12s
  • mediawiki-quibble-vendor-mysql-php70-docker SUCCESS in 6m 48s
  • mediawiki-quibble-vendor-mysql-hhvm-docker SUCCESS in 8m 32s
  • mediawiki-quibble-vendor-postgres-php72-docker SUCCESS in 10m 05s
  • mediawiki-quibble-vendor-sqlite-php72-docker SUCCESS in 7m 04s
  • mediawiki-quibble-composer-mysql-php70-docker SUCCESS in 8m 14s

(+ jobs that take less than 3 minutes: composer-test, npm-test, and phan.)

These can be grouped in two kinds of jobs:

  • wmf-quibble: These install MW with the gated extensions, and then run all PHPUnit, Selenium and QUnit tests.
  • mediawiki-quibble: These install MW bundled extensions only, and then run PHPUnit, Selenium and QUnit tests.

Stats from wmf-quibble-core-vendor-mysql-php72-docker

  • 9-15 minutes (wmf-gated, extensions-only)
  • Sample:
    • PHPUnit (dbless): 1.91 minutes / 15,782 tests.
    • QUnit: 29 seconds / 1286 tests.
    • Selenium: 143 seconds / 43 tests.
    • PHPUnit (db): 3.85 minutes / 4377 tests.

Stats from mediawiki-quibble-vendor-mysql-php72-docker:

  • 7-10 minutes (plain mediawiki-core)
  • Sample:
    • PHPUnit (unit+dbless): 1.5 minutes / 23,050 tests.
    • QUnit: 4 seconds / 437 tests.
    • PHPUnit (db): 4 minutes / 7604 tests.

Updated status quo

As of 11 May 2021, the gate usually takes around 25 minutes.

The slowest job typically takes 20-25 minutes per run. The time for the gate overall can never be faster than the slowest job, and can be worse as though we run other jobs in parallel, they don't always start immediately, due to given limited CI execution slots.

Below is the time results from a sample MediaWiki commit (master branch):

[Snipped: Jobs faster than 5 minutes]

  • 9m 43s: mediawiki-quibble-vendor-mysql-php74-docker/5873/console
  • 9m 47s: mediawiki-quibble-vendor-mysql-php73-docker/8799/console
  • 10m 03s: mediawiki-quibble-vendor-sqlite-php72-docker/10345/console
  • 10m 13s: mediawiki-quibble-composer-mysql-php72-docker/19129/console
  • 10m 28s: mediawiki-quibble-vendor-mysql-php72-docker/46482/console
  • 13m 11s: mediawiki-quibble-vendor-postgres-php72-docker/10259/console
  • 16m 44s: wmf-quibble-core-vendor-mysql-php72-docker/53990/console
  • 22m 26s: wmf-quibble-selenium-php72-docker/94038/console

Clearly the last two jobs are dominant in the timing:

  • wmf-quibble: This jobs installs MW with the gated extensions, and then runs all PHPUnit and QUnit tests.
  • wmf-quibble-selenium: This job installs MW with the gated extensions, and then runs all the Selenium tests.

Note that the mediawiki-quibble jobs each install just the MW bundled extensions, and then run PHPUnit, Selenium and QUnit tests.

Stats from wmf-quibble-core-vendor-mysql-php72-docker:

  • 13-18 minutes (wmf-gated, extensions-only)
  • Select times:
    • PHPUnit (unit tests): 9 seconds / 13,170 tests.
    • PHPUnit (DB-less integration tests): 3.31 minutes / 21,067 tests.
    • PHPUnit (DB-heavy): 7.91 minutes / 4,257 tests.
    • QUnit: 31 seconds / 1421 tests.

Stats from wmf-quibble-selenium-php72-docker:

  • 20-25 minutes

Scope of task

This task represents the goal of reaching 5 minutes or less. The work tracked here includes researching ways to get there, trying them out, and putting one or more ideas into practice. The task can be closed once we have reached the goal or if we have concluded it isn't feasible or useful.

Feel free to add/remove subtasks as we go along and consider different things.

Stuff done

Ideas to explore and related work

  • Look at the PHPUnit "Test Report" for a commit and sort the root by duration. Find the slowest ones and look at its test suite to look for ways to improve it. Is it repeating expensive setups? Perhaps that can be skipped or re-used. Is it running hundreds of variations for the same integration test? Perhaps reduce it to just 1 case for that story, and apply the remaining cases to a lighter unit test instead.

Details

SubjectRepoBranchLines +/-
mediawiki/extensions/Echomaster+6 -18
mediawiki/extensions/Echomaster+16 -33
mediawiki/coremaster+14 -5
mediawiki/coremaster+30 -4
mediawiki/libs/less.phpmaster+3 -3
mediawiki/coremaster+14 -0
mediawiki/coremaster+22 -25
mediawiki/libs/less.phpmaster+39 -43
mediawiki/coremaster+11 -8
mediawiki/extensions/Kartographermaster+35 -27
mediawiki/extensions/Translatemaster+7 -4
mediawiki/libs/less.phpmaster+11 -6
mediawiki/coremaster+30 -30
mediawiki/coremaster+5 -4
mediawiki/coremaster+23 -22
mediawiki/coremaster+52 -55
integration/quibblemaster+24 -2
mediawiki/coremaster+3 -3
mediawiki/coremaster+2 -28
mediawiki/extensions/MobileFrontendmaster+0 -3
integration/configmaster+14 -0
integration/configmaster+0 -4
mediawiki/coremaster+0 -5
integration/quibblemaster+40 -15
integration/quibblemaster+109 -0
mediawiki/coremaster+56 -16
mediawiki/coremaster+2 -12
mediawiki/coremaster+8 -3
integration/quibblemaster+34 -0
mediawiki/coremaster+2 -12
integration/configmaster+1 -1
integration/configmaster+32 -32
integration/configmaster+28 -28
integration/quibblemaster+27 -3
mediawiki/extensions/ProofreadPagemaster+11 K -621
mediawiki/extensions/GrowthExperimentsmaster+13 K -549
mediawiki/extensions/Echomaster+11 K -216
mediawiki/extensions/AbuseFiltermaster+12 K -469
mediawiki/extensions/FileImportermaster+11 K -530
mediawiki/coremaster+22 -140
mediawiki/coremaster+1 K -679
mediawiki/coremaster+21 -14
mediawiki/coremaster+6 -4
mediawiki/coremaster+1 -1
integration/configmaster+0 -19
integration/configmaster+12 -5
mediawiki/coremaster+20 -19
mediawiki/coremaster+17 -49
mediawiki/coremaster+1 -3
mediawiki/coremaster+13 -1
integration/configmaster+22 -22
integration/configmaster+54 -0
integration/quibblemaster+4 -0
mediawiki/coremaster+3 -10
mediawiki/coremaster+1 -1
mediawiki/extensions/Wikibasemaster+70 -21
mediawiki/coremaster+27 -37
mediawiki/coremaster+29 -5
mediawiki/coremaster+37 -1
mediawiki/extensions/Babelmaster+47 -52
mediawiki/coremaster+16 -1
Show related patches Customize query in gerrit

Related Objects

View Standalone Graph
This task is connected to more than 200 other tasks. Only direct parents and subtasks are shown here. Use View Standalone Graph to show more of the graph.
StatusSubtypeAssignedTask
OpenNone
ResolvedLadsgroup
Resolvedaaron
Resolvedhashar
Resolvedhashar
OpenNone
Resolvedhashar
DeclinedNone
ResolvedJdforrester-WMF
Resolvedawight
OpenNone
ResolvedNone
ResolvedDaimona
OpenNone
DuplicateNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
ResolvedDaimona
ResolvedDaimona

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Change 777006 had a related patch set uploaded (by Krinkle; author: Aaron Schulz):

[mediawiki/core@master] phpunit: Set $wgSQLMode from DevelopmentSettings instead of MediaWikiIntegrationTestCase (ii)

https://gerrit.wikimedia.org/r/777006

Change 777006 merged by jenkins-bot:

[mediawiki/core@master] phpunit: Set $wgSQLMode from DevelopmentSettings instead of MediaWikiIntegrationTestCase (ii)

https://gerrit.wikimedia.org/r/777006

Change 777482 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] debug: Fix $wgDebugRawPage to work with PSR-3 debug logging

https://gerrit.wikimedia.org/r/777482

Change 777482 merged by jenkins-bot:

[mediawiki/core@master] debug: Fix $wgDebugRawPage to work with PSR-3 debug logging

https://gerrit.wikimedia.org/r/777482

This is really getting frustrating for the wmf branches. E.g. gerrit 785944 spent an hour in CI, then errored out with Build timed out (after 60 minutes). Marking the build as failed.. gerrit 785941 took 92 minutes to merge. It's basically getting impossible to do an extension backport within the one-hour deploy window; not to mention multiple backports.

Can we just drop Selenium tests from gate-and-submit-wmf? I don't think they add any real value, by the point a patch gets there it typically passed Selenium tests in test, gate-and-submit, test-wmf.

I don't think we should run Selenium on anything other than the master branch tbh.

This is really getting frustrating for the wmf branches. E.g. gerrit 785944 spent an hour in CI, then errored out with Build timed out (after 60 minutes). Marking the build as failed.. gerrit 785941 took 92 minutes to merge. It's basically getting impossible to do an extension backport within the one-hour deploy window; not to mention multiple backports.

Can we just drop Selenium tests from gate-and-submit-wmf? I don't think they add any real value, by the point a patch gets there it typically passed Selenium tests in test, gate-and-submit, test-wmf.

Filed as T307180: Drop Selenium tests from gate-and-submit-wmf, +1 from me.

This is really getting frustrating for the wmf branches. E.g. gerrit 785944 spent an hour in CI, then errored out with Build timed out (after 60 minutes). Marking the build as failed.. gerrit 785941 took 92 minutes to merge. It's basically getting impossible to do an extension backport within the one-hour deploy window; not to mention multiple backports.

If I remember properly, those build froze at npm ci step until Jenkins times out the build and kill it. We can probably lower the build timeout in Jenkins and find a way for Quibble to enforce a timeout to npm ci operation. That would at least make them fails faster.

As for the issue, that seems to match a npmjs issue https://status.npmjs.org/incidents/ljzb0hdg8zr3 :

There was an intermittent issue with package install since 23rd April which is now resolved.
Posted Apr 26, 2022 - 14:30 UTC

In T304114 I was made aware that we're running Minerva Selenium tests on Extension:StopForumSpam which is not even run in production (Beta cluster only as far as I can tell) and never interacts with Minerva.

It also seems we run Minerva browser tests in Vector despite the fact it's impossible for a Vector patch to break Minerva as these are mutually exclusive experiences.

It seems we could save a chunk of time by restricting the extensions where we run our browser tests. The overlap between skins and extensions from the perspective of browser tests is very small.
Is there a way to be more deliberate about where we run which Selenium tests? As far as I can see, Minerva browser tests only need to be run with core, MobileFrontend, VisualEditor and Echo changes.

Change 745991 abandoned by Ladsgroup:

[mediawiki/core@master] [DNM] Test if reduction in logging would reduce the tests run time

Reason:

test done

https://gerrit.wikimedia.org/r/745991

Change 822663 had a related patch set uploaded (by Krinkle; author: Krinkle):

[integration/config@master] jjb: Remove 'compress junit' postbuildscript step

https://gerrit.wikimedia.org/r/822663

Change 822663 merged by jenkins-bot:

[integration/config@master] jjb: [quibble] Remove 'compress junit' postbuildscript step

https://gerrit.wikimedia.org/r/822663

Change 748312 merged by jenkins-bot:

[integration/config@master] dockerfiles: Add php-excimer to quibble

https://gerrit.wikimedia.org/r/748312

Change 853292 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/MobileFrontend@master] Remove selenium entries from package.json

https://gerrit.wikimedia.org/r/853292

Three years later: maybe we should target a more realistic number, like 15 minutes? If we can get to 15 minutes, we could then move on to 10 minutes, and then maybe end up back at this task's current goal of 5 minutes.

T287582: Move some Wikibase selenium tests to a standalone job is a potential low hanging fruit. The Wikibase Selenium tests takes roughly 5minutes30 and only depends on MinervaNeue, MobileFrontend and UniversalLanguageSelector (and MediaWiki core obviously). If we moved those to a standalone job only triggering for those few repositories, that will save 5minutes30 for all other repositories. We would have to drop the npm selenium-test entry point from Wikibase to prevent it from being discovered when Wikibase is a dependency.

Another potentially large saver would be to mark slow tests with @group Standalone which would only trigger when a patchset target that repo and thus prevent them from running when the patchset is for another repository.

@hashar Afaik for PHPUnit we only install (and thus test) dependency extensions, based on the dependency map in CI config. Is this not the case for the Selenium job? If there's a default list applied also, perhaps we can opt-out from that for the Selenium job.

Change 853292 merged by jenkins-bot:

[mediawiki/extensions/MobileFrontend@master] Remove selenium entries from package.json

https://gerrit.wikimedia.org/r/853292

Change 859146 had a related patch set uploaded (by Krinkle; author: Tim Starling):

[mediawiki/core@master] Reduce time cost of password tests

https://gerrit.wikimedia.org/r/859146

Change 859146 merged by jenkins-bot:

[mediawiki/core@master] password: Reduce time cost of password unit tests

https://gerrit.wikimedia.org/r/859146

@hashar Afaik for PHPUnit we only install (and thus test) dependency extensions, based on the dependency map in CI config. Is this not the case for the Selenium job? If there's a default list applied also, perhaps we can opt-out from that for the Selenium job.

The PHPUnit and Selenium kind jobs are alike: they both use Quibble as test runner and both have extensions/skins dependencies injected from CI. The only difference is the kind of tests being run which are filtered via quibble parameters, respectively: --skip=selenium and --run=selenium.

The registration or discovery of tests varies though. A detailed description (you probably know most of that already but that can be helpful for other readers):

PHPUnit

We run the MediaWiki core extensions suite (defined at tests/phpunit/suites/ExtensionsTestSuite.php) which:

  • discover tests if they are put in a /tests/phpunit directory relatively to each extension/skin.
  • add tests registered via the UnitTestsList hook

Those tests are depending on the centrally managed configuration from mediawiki/core eg the phpunit version.

Selenium

There is no registration mechanism, only a discovery phase. Each extension/skin test suite is standalone and can use different versions of Webdriver.io. The convention is developers define how to run the browser tests by adding a selenium-test npm script. In those jobs quibble --run=selenium has all extensions/skins cloned in and checked out, Quibble then crawls through each repo looking for a package.json and if has a selenium-test it will run those tests invoking npm run selenium. They are run serially.


The alternative would be to only run the Selenium test for the triggering repository. Something such as: quibble --command 'cd $THING_NAME && npm run-script selenium-test. But we will loose the integration testing between repository :-(

For PHPUnit test we have moved some tests to @group Standalone. The use case was to avoid running the Scribunto integration tests for any extension depending on it, given those tests are only going to be affected by a change to Scribunto there was no point in running them when a patch triggers Wikibase for example. It is another build stage defined in Quibble: `quibble --run phpunit-standalone.

I guess we could do something similar for Selenium and split tests between:

  • integration testing (to be run by any repo)
  • standalone tests (to be run solely when a patch triggers for this repo)
  • add a selenium-standalone npm script convention which in the repo would invoke something such as wdio --spec tests/selenium/specs/standalone/**/*.js

The Wikibase Selenium tests […] only depends on MinervaNeue, MobileFrontend and UniversalLanguageSelector […]
! In T225730#8370890, @Krinkle wrote:

@hashar Afaik for PHPUnit we only install (and thus test) dependency extensions, based on the dependency map in CI config. Is this not the case for the Selenium job? […]

The PHPUnit and Selenium kind jobs are alike: they both use Quibble as test runner and both have extensions/skins dependencies injected from CI. […] The registration or discovery of tests varies though. […] Selenium: There is no registration mechanism, only a discovery phase. […]

My reason for asking is that I thought you meant that Wikibase CI is slow because it is running selenium tests for extensions that it does not depend on. The lack of a test registration system for Selenium should not be an issue since we can only discover what we install in CI, and CI only installs what Wikibase depends on according to the repo dependency map.

For PHPUnit test we have moved some tests to @group Standalone. […] I guess we could do something similar for Selenium and split tests, [and] add a selenium-standalone npm script convention.

Another option might be to filter out @standalone tests via wdio --mochaOpts.grep --invert. This is even more similar to PHPUnit and is what we already use for running the @daily selenium tests. This has the benefit of keeping a simple and consistent way to run selenium tests on any given repo as a contributor without needing to know about this detail. It is then to Quibble to invoke a variant npm run selenium-nostandalone on non-current repos to skip those. This instead of e.g. having the main selenium entrypoint no longe run all the tests, which might lead to confusion.

zeljkofilipin changed the status of subtask T284568: Speed up login from In Progress to Open.Dec 15 2022, 11:52 AM

Change 913562 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] [DNM] Measure impact of GeneralizedSql on tests run time

https://gerrit.wikimedia.org/r/913562

Change 913562 abandoned by Ladsgroup:

[mediawiki/core@master] [DNM] Measure impact of GeneralizedSql on tests run time

Reason:

https://gerrit.wikimedia.org/r/913562

Change 702909 abandoned by Hashar:

[integration/quibble@master] npm: Use cache for npm ci and prefer offline

Reason:

https://gerrit.wikimedia.org/r/702909

Change 932455 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/core@master] api tests: Call editPage() with WikiPage when used for same page

https://gerrit.wikimedia.org/r/932455

Change 932455 merged by jenkins-bot:

[mediawiki/core@master] api tests: Call editPage() with WikiPage when used for same page

https://gerrit.wikimedia.org/r/932455

Change 936317 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/core@master] storage tests: Call editPage() with WikiPage when used for same page

https://gerrit.wikimedia.org/r/936317

Change 936230 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/core@master] Fix TestLocalisationCache being way to small

https://gerrit.wikimedia.org/r/936230

Change 936317 merged by jenkins-bot:

[mediawiki/core@master] storage tests: Call editPage() with WikiPage when used for same page

https://gerrit.wikimedia.org/r/936317

Change 936230 merged by jenkins-bot:

[mediawiki/core@master] Fix TestLocalisationCache being way to small

https://gerrit.wikimedia.org/r/936230

Change 938346 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/core@master] tests: Pass Title to editPage() when already parsed

https://gerrit.wikimedia.org/r/938346

Change 938346 merged by jenkins-bot:

[mediawiki/core@master] tests: Pass Title to editPage() when already parsed

https://gerrit.wikimedia.org/r/938346

Change 939296 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/libs/less.php@master] Re-arrange execution order in LESS parser for performance

https://gerrit.wikimedia.org/r/939296

Change 939684 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/Kartographer@master] Much faster tests by setting content language to qqx

https://gerrit.wikimedia.org/r/939684

Change 939690 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/Translate@master] [POC] Set content language to qqx in slow tests

https://gerrit.wikimedia.org/r/939690

Change 939296 abandoned by Thiemo Kreuz (WMDE):

[mediawiki/libs/less.php@master] Re-arrange execution order in LESS parser for performance

Reason:

https://gerrit.wikimedia.org/r/939296

Change 939727 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/libs/less.php@master] Inline and optimize heavily called Parser::MatchQuoted

https://gerrit.wikimedia.org/r/939727

Change 939690 abandoned by Thiemo Kreuz (WMDE):

[mediawiki/extensions/Translate@master] [POC] Set content language to qqx in slow tests

Reason:

No, that's not it.

https://gerrit.wikimedia.org/r/939690

Change 939717 had a related patch set uploaded (by Krinkle; author: Thiemo Kreuz (WMDE)):

[mediawiki/libs/less.php@master] Less_Parser: Faster MatchQuoted() by using native strcspn()

https://gerrit.wikimedia.org/r/939717

Change 939717 merged by jenkins-bot:

[mediawiki/libs/less.php@master] Less_Parser: Faster MatchQuoted() by using native strcspn()

https://gerrit.wikimedia.org/r/939717

Change 939684 merged by jenkins-bot:

[mediawiki/extensions/Kartographer@master] Much faster tests by setting content language to qqx

https://gerrit.wikimedia.org/r/939684

Change 939684 merged by jenkins-bot:

[mediawiki/extensions/Kartographer@master] Much faster tests by setting content language to qqx

https://gerrit.wikimedia.org/r/939684

From my review on that change:

Think the root cause is the localization cache is set to use a static array from includes/DevelopmentSettings.php:

// Localisation Cache to StaticArray (T218207)
$wgLocalisationCacheConf['store'] = 'array';

For PHPUnit tests, it is probably fine since there is a single process. But for Selenium tests, that means each web request has to regenerate the localization cache which would explain the slowdown?

for Selenium tests, that means each web request has to regenerate the localization cache which would explain the slowdown?

When I look at the strange, sometimes extreme runtimes of ResourcesTest and SpecialPageFatalTest I suspect something similar happens in other places. I mean, sure, executing special pages, parsing .less files, or initializing LocalisationCaches is sometimes just very complicated and expensive. But even the most expensive of these examples consumes only a few 100 milliseconds. That's not a problem.

But it is a problem if it's done thousands of times for tests that don't even do anything with the results.

I'm digging into this for a few days now but can't find that single bottleneck anywhere. I suspect the majority of the up to 20 minutes this ticket talks about is just because we are naively running the same processes with the same inputs and the same outputs over and over again, millions and millions of times.

for Selenium tests, that means each web request has to regenerate the localization cache which would explain the slowdown?

When I look at the strange, sometimes extreme runtimes of ResourcesTest and SpecialPageFatalTest I suspect something similar happens in other places. I mean, sure, executing special pages, parsing .less files, or initializing LocalisationCaches is sometimes just very complicated and expensive. But even the most expensive of these examples consumes only a few 100 milliseconds. That's not a problem.

T50217#9030084 is related. I noticed that paratest was hanging, and spending most of the time (12 of 18 seconds) on a single test, LanguageConverterFactoryTest. I looked at that test yesterday, it basically loads the localisation cache for all languages, one by one. And while loading a single language isn't terribly expensive (the slowest I saw was around 80ms), it adds up when done hundreds of times. I couldn't find a solution to that, though, and just stopped trying after a while.

Change 940120 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/core@master] Make MapCacheLRU in LanguageFactory static

https://gerrit.wikimedia.org/r/940120

Change 940233 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/core@master] TestLocalisationCache: add static cache of JSON files

https://gerrit.wikimedia.org/r/940233

In order to help understand what's going on under the hood and make more informed decisions on what to improve and what don't. I run tests with excimer and built flamegraphs out of them which you can see for three types of tests (phpunit, selenium, api_tests) here: https://people.wikimedia.org/~ladsgroup/tests_flamegraphs/

The flamegraphs can be searched, zoomed in, etc. (More info: https://www.brendangregg.com/flamegraphs.html)

That provides way better insights and are a treasure trope of areas to improve. I haven't fully dug into all of them but for example: 3% of phpunit tests is being spent on resetting database between each test (MediaWikiIntegrationTestCase::resetDB) which makes some sense but almost all of it is being spent in creating pages and edits(!), why? because it internally calls MediaWikiIntegrationTestCase::addCoreDBData which in turns calls WikiPage::doUserEditContent and makes a page and edits in every integration test(!!!!!)

Or for selenium tests: The biggest bottleneck is minification and RL modules access, I guess it's not properly cached. It can easily brush off a minute from tests.

Feel free to dig into them!

(How it was made? this patch enables logging of excimer and tests create two files, one for wall time and one for cpu time with sampling rate of every .1 second. Then downloaded those logs and run this on the logs: cat ../selenium/cpu-profile.log* | ./flamegraph.pl > ../selenium/cpu-profile.svg; cat ../selenium/wall-profile.log* | ./flamegraph.pl > ../selenium/wall-profile.svg; cat ../selenium/cpu-profile.log* | ./flamegraph.pl --reverse --colors blue > ../selenium/cpu-profile.reverse.svg; cat ../selenium/wall-profile.log* | ./flamegraph.pl --reverse --colors blue > ../selenium/wall-profile.reverse.svg)

You can also take the setting of the patch and enable it locally and run them.

3% of phpunit tests is being spent on resetting database between each test (MediaWikiIntegrationTestCase::resetDB) which makes some sense but almost all of it is being spent in creating pages and edits(!), why? because it internally calls MediaWikiIntegrationTestCase::addCoreDBData which in turns calls WikiPage::doUserEditContent and makes a page and edits in every integration test(!!!!!)

That's something I'd like to get rid of. But first of all, we should avoid cloning and resetting the database for tests not in the 'Database' group (T155147). Once that's done, I'll look into not creating a page and a user for every test.

Zooming out a little bit, PHPUnit (and possibly selenium too, but I haven't tested it) would benefit a lot from parallelization with paratest, see T50217#9022060 and the following comments. I'm a bit skeptical about seeing a x6 speedup in CI as I observed locally; but still, even if it's just a 2x, it's much more than we can do by just improving the code. I'm also planning to look into paratest support once T155147 and T90875 are done.

The one single test for which we should really do something is LanguageConverterFactoryTest. The test instantiates something like 400 language objects, which is really slow. The whole test class takes roughly 13 seconds to run. Running the whole "includes" suite with paratest takes around 19 seconds, and because LanguageConverterFactoryTest is a single test handled by a single process, its 13 seconds mean a lot for the overall runtime. I was looking for ways to speed it up, but I doubt we'll be able to do much. Another option would be to split it into multiple test classes, each one running the same test for a subset of the languages. But that's also something I will look into more closely once we're in a better position to evaluate paratest.

As requested by @Krinkle I increased the sampling rate from every 100m to 1ms. The result are here https://people.wikimedia.org/~ladsgroup/tests_flamegraphs/high_res/

The one single test for which we should really do something is LanguageConverterFactoryTest. The test instantiates something like 400 language objects, which is really slow. The whole test class takes roughly 13 seconds to run. Running the whole "includes" suite with paratest takes around 19 seconds, and because LanguageConverterFactoryTest is a single test handled by a single process, its 13 seconds mean a lot for the overall runtime. I was looking for ways to speed it up, but I doubt we'll be able to do much. Another option would be to split it into multiple test classes, each one running the same test for a subset of the languages. But that's also something I will look into more closely once we're in a better position to evaluate paratest.

13 seconds is not much compared to at 1-2 minutes of every selenium test being spent on cache misses of RL modules: https://people.wikimedia.org/~ladsgroup/tests_flamegraphs/high_res/wall-profile.selenium.svg

Or for selenium tests: The biggest bottleneck is minification and RL modules access, I guess it's not properly cached. It can easily brush off a minute from tests.
[…]

When I download the .log artefacts and load them up into Speedscope, I can confirm that APCUBagOStuff is selected (so php-apcu cache is enabled and discovered by MW), and we see both doGet calls without minification, and minification calls with doSet. Suggesting that functionally it is using the cache correctly.

It's had to tell from the current snapshot whether it is slow or whether it is missing the cache more than once for the same module. The settings here are more or less stock MediaWiki + DevelopmentSettings, similar to mediawiki-docker where this is generally working well.

My first hunch here is that the CI worker is under memory pressure and forced to evict fresh values from the cache before they can be used. For example, the default apc.shm_size in PHP is 32M (https://www.php.net/manual/en/apcu.configuration.php) which is quite small but may just about suffice for human use in MediaWiki-Docker, but in CI we're likely over a threshold where we engage with a lot of different areas. Likewise, opcache (https://www.php.net/manual/en/opcache.configuration.php) has a default of opcache.memory_consumption=128 (MB) and opcache.interned_strings_buffer=8 (MB).

In production (Codesearch) we use:

# appserver
profile::mediawiki::apc_shm_size: 6096M
opcache.interned_strings_buffer: 96
opcache.memory_consumption: 1024

# jobrunner
profile::mediawiki::apc_shm_size: 4096M
opcache.interned_strings_buffer: 96
opcache.memory_consumption: 1024

# default (unused?)
profile::mediawiki::apc_shm_size: 128M
opcache.interned_strings_buffer: 50
opcache.memory_consumption: 300

We don't need that much in CI given that we only interact with 1 wiki (not 1000), and in ~1 language (not 300). It currently has the following set in Quibble (source):

opcache.memory_consumption=256

I presume the others inherit upstream defaults of apc.shm_size=32M and opcache.interned_strings_buffer=8, which would be very small indeed.

I've uploaded the raw log files from @Ladsgroup's high-res exicmer capture (change 940226) to people.wikimedia.org, which has CORS enabled. This means we can load them into Speedscope (aka Excimer UI).

https://people.wikimedia.org/~krinkle/T225730_5min_jenkins-excimer-20230720/

For example (this is a 35MB large file): https://performance.wikimedia.org/excimer/speedscope/#profileURL=https://people.wikimedia.org/~krinkle/T225730_5min_jenkins-excimer-20230720/phpunit_mwquibble_cpuprofile.log.

Screenshot 2023-07-21 at 03.02.01.png (1×2 px, 403 KB)

The sample count reflects time passing in milliseconds. It says the phpunit invocation for mediawiki-quibble-vendor yielded 374,000 samples, or 374 seconds, or 6 minutes, which is right.

Using the "Sandwhich" mode, which is like a more powerful version of classic "reverse" flamegraphs, we can quickly find (for example) the LocalisationCache as having one of the highest "self" times. Clicking it, then reveals which tests it is largely spent by, with most (57%) coming from LanguageConverterFactoryTest.

Screenshot 2023-07-21 at 03.06.49.png (973×2 px, 404 KB)

The one single test for which we should really do something is LanguageConverterFactoryTest. The test instantiates something like 400 language objects, which is really slow. The whole test class takes roughly 13 seconds to run.

I incidentally happened to look into language object instantiation yesterday for another reason: POC: Create Language objects without initializing l10n cache – I’ll probably look into this a bit more. (Edit: now tracked at T342418.)

Change 940502 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/core@master] Improve performance of LocalisationCache::mergeMagicWords()

https://gerrit.wikimedia.org/r/940502

Change 939727 merged by jenkins-bot:

[mediawiki/libs/less.php@master] Less_Parser: Inline and optimize heavily called MatchQuoted()

https://gerrit.wikimedia.org/r/939727

Change 940502 merged by jenkins-bot:

[mediawiki/core@master] Improve performance of LocalisationCache::mergeMagicWords()

https://gerrit.wikimedia.org/r/940502

Change 940929 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/core@master] Make "pluralRules" caches static in LocalisationCache

https://gerrit.wikimedia.org/r/940929

Change 940929 merged by jenkins-bot:

[mediawiki/core@master] Make "pluralRules" caches static in LocalisationCache

https://gerrit.wikimedia.org/r/940929

If we can just fix the RL's caching in the apache server of selenium. We can easily brush off 1-5 minutes (!) from each selenium run. See my attempt in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/944984

Change 698467 abandoned by Kosta Harlan:

[mediawiki/extensions/Echo@master] selenium: Use a single login for browser tests

Reason:

https://gerrit.wikimedia.org/r/698467

Change 698468 abandoned by Kosta Harlan:

[mediawiki/extensions/Echo@master] selenium: Move notifications page test into general suite

Reason:

https://gerrit.wikimedia.org/r/698468