
Reduce runtime of MW shared gate Jenkins jobs to 5 min
Open, MediumPublic

Description

Objective

Bring the "typical" time between a commit being approved and it landing in master down to 5 minutes or less.

Status quo

As of 11 June 2019, the gate usually takes around 20 minutes.

The two slowest jobs typically take 13-17 minutes each. The overall gate time is rarely under 15 minutes: we run several of these jobs (increasing the chance that one is randomly slow), and while they can run in parallel, they don't always start immediately, given limited CI execution slots.

Below is a sample from a MediaWiki commit (master branch):

Gate pipeline build succeeded.
  • wmf-quibble-core-vendor-mysql-php72-docker SUCCESS in 12m 03s
  • wmf-quibble-core-vendor-mysql-hhvm-docker SUCCESS in 14m 12s
  • mediawiki-quibble-vendor-mysql-php72-docker SUCCESS in 7m 34s
  • mediawiki-quibble-vendor-mysql-php71-docker SUCCESS in 7m 12s
  • mediawiki-quibble-vendor-mysql-php70-docker SUCCESS in 6m 48s
  • mediawiki-quibble-vendor-mysql-hhvm-docker SUCCESS in 8m 32s
  • mediawiki-quibble-vendor-postgres-php72-docker SUCCESS in 10m 05s
  • mediawiki-quibble-vendor-sqlite-php72-docker SUCCESS in 7m 04s
  • mediawiki-quibble-composer-mysql-php70-docker SUCCESS in 8m 14s

(+ jobs that take less than 3 minutes: composer-test, npm-test, and phan.)

These can be grouped into two kinds of jobs:

  • wmf-quibble: These install MW with the gated extensions, and then run all PHPUnit, Selenium and QUnit tests.
  • mediawiki-quibble: These install MW bundled extensions only, and then run PHPUnit, Selenium and QUnit tests.

Stats from wmf-quibble-core-vendor-mysql-php72-docker:

  • 9-15 minutes (wmf-gated, extensions-only)
  • Sample:
    • PHPUnit (dbless): 1.91 minutes / 15,782 tests.
    • QUnit: 29 seconds / 1286 tests.
    • Selenium: 143 seconds / 43 tests.
    • PHPUnit (db): 3.85 minutes / 4377 tests.

Stats from mediawiki-quibble-vendor-mysql-php72-docker:

  • 7-10 minutes (plain mediawiki-core)
  • Sample:
    • PHPUnit (unit+dbless): 1.5 minutes / 23,050 tests.
    • QUnit: 4 seconds / 437 tests.
    • PHPUnit (db): 4 minutes / 7604 tests.

Updated status quo

As of 11 May 2021, the gate usually takes around 25 minutes.

The slowest job typically takes 20-25 minutes per run. The overall gate time can never be faster than the slowest job, and is often worse: although the other jobs run in parallel, they don't always start immediately, given limited CI execution slots.

Below are the timing results from a sample MediaWiki commit (master branch):

[Snipped: Jobs faster than 5 minutes]

  • 9m 43s: mediawiki-quibble-vendor-mysql-php74-docker/5873/console
  • 9m 47s: mediawiki-quibble-vendor-mysql-php73-docker/8799/console
  • 10m 03s: mediawiki-quibble-vendor-sqlite-php72-docker/10345/console
  • 10m 13s: mediawiki-quibble-composer-mysql-php72-docker/19129/console
  • 10m 28s: mediawiki-quibble-vendor-mysql-php72-docker/46482/console
  • 13m 11s: mediawiki-quibble-vendor-postgres-php72-docker/10259/console
  • 16m 44s: wmf-quibble-core-vendor-mysql-php72-docker/53990/console
  • 22m 26s: wmf-quibble-selenium-php72-docker/94038/console

Clearly, the last two jobs dominate the timing:

  • wmf-quibble: This job installs MW with the gated extensions, and then runs all PHPUnit and QUnit tests.
  • wmf-quibble-selenium: This job installs MW with the gated extensions, and then runs all the Selenium tests.

Note that the mediawiki-quibble jobs each install just the MW bundled extensions, and then run PHPUnit, Selenium and QUnit tests.

Stats from wmf-quibble-core-vendor-mysql-php72-docker:

  • 13-18 minutes (wmf-gated, extensions-only)
  • Select times:
    • PHPUnit (unit tests): 9 seconds / 13,170 tests.
    • PHPUnit (DB-less integration tests): 3.31 minutes / 21,067 tests.
    • PHPUnit (DB-heavy): 7.91 minutes / 4,257 tests.
    • QUnit: 31 seconds / 1421 tests.

Stats from wmf-quibble-selenium-php72-docker:

  • 20-25 minutes

Scope of task

This task represents the goal of reaching 5 minutes or less. The work tracked here includes researching ways to get there, trying them out, and putting one or more ideas into practice. The task can be closed once we have reached the goal or if we have concluded it isn't feasible or useful.

Feel free to add/remove subtasks as we go along and consider different things.

Stuff done

Ideas to explore and related work

  • Look at the PHPUnit "Test Report" for a commit and sort the results by duration. Find the slowest ones and look at their test suites for ways to improve them. Are they repeating expensive setups? Perhaps that work can be skipped or reused. Are they running hundreds of variations of the same integration test? Perhaps reduce that to just 1 case for that story, and apply the remaining cases to a lighter unit test instead.
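As a starting point for the idea above, PHPUnit can log a JUnit-style XML report (`--log-junit report.xml`). A rough Python sketch of sorting such a report by duration; the report content and the `slowest_tests` helper are invented for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical JUnit-style report, as produced by `phpunit --log-junit report.xml`.
REPORT = """
<testsuites>
  <testsuite name="includes">
    <testcase classname="LanguageConverterFactoryTest" name="testAll" time="13.2"/>
    <testcase classname="TitleTest" name="testBasic" time="0.04"/>
    <testcase classname="ResourcesTest" name="testStyles" time="2.1"/>
  </testsuite>
</testsuites>
"""

def slowest_tests(xml_text, top=10):
    """Return the `top` slowest test cases as (seconds, class, name) tuples."""
    root = ET.fromstring(xml_text)
    cases = [
        (float(tc.get("time", 0)), tc.get("classname"), tc.get("name"))
        for tc in root.iter("testcase")
    ]
    return sorted(cases, reverse=True)[:top]

for secs, cls, name in slowest_tests(REPORT, top=3):
    print(f"{secs:6.2f}s  {cls}::{name}")
```

Once the slowest classes surface, each one can be inspected for repeated setup work or oversized data providers.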

Details

Related Changes in Gerrit:
Repo | Branch | Lines +/-
mediawiki/core | REL1_44 | +1 -1
mediawiki/core | REL1_43 | +1 -1
mediawiki/core | REL1_39 | +4 -1
mediawiki/core | master | +1 -1
mediawiki/core | master | +14 -5
mediawiki/core | master | +6 -6
mediawiki/core | master | +41 -19
integration/quibble | master | +40 -15
mediawiki/core | master | +6 -0
mediawiki/core | master | +30 -4
integration/quibble | master | +109 -0
mediawiki/extensions/Echo | master | +6 -18
mediawiki/extensions/Echo | master | +16 -33
mediawiki/libs/less.php | master | +3 -3
mediawiki/core | master | +14 -0
mediawiki/core | master | +22 -25
mediawiki/libs/less.php | master | +39 -43
mediawiki/core | master | +11 -8
mediawiki/extensions/Kartographer | master | +35 -27
mediawiki/extensions/Translate | master | +7 -4
mediawiki/libs/less.php | master | +11 -6
mediawiki/core | master | +30 -30
mediawiki/core | master | +5 -4
mediawiki/core | master | +23 -22
mediawiki/core | master | +52 -55
integration/quibble | master | +24 -2
mediawiki/core | master | +3 -3
mediawiki/core | master | +2 -28
mediawiki/extensions/MobileFrontend | master | +0 -3
integration/config | master | +14 -0
integration/config | master | +0 -4
mediawiki/core | master | +0 -5
mediawiki/core | master | +56 -16
mediawiki/core | master | +2 -12
mediawiki/core | master | +8 -3
integration/quibble | master | +34 -0
mediawiki/core | master | +2 -12
integration/config | master | +1 -1
integration/config | master | +32 -32
integration/config | master | +28 -28
integration/quibble | master | +27 -3
mediawiki/extensions/ProofreadPage | master | +11 K -621
mediawiki/extensions/GrowthExperiments | master | +13 K -549
mediawiki/extensions/Echo | master | +11 K -216
mediawiki/extensions/AbuseFilter | master | +12 K -469
mediawiki/extensions/FileImporter | master | +11 K -530
mediawiki/core | master | +22 -140
mediawiki/core | master | +1 K -679
mediawiki/core | master | +21 -14
mediawiki/core | master | +6 -4
mediawiki/core | master | +1 -1
integration/config | master | +0 -19
integration/config | master | +12 -5
mediawiki/core | master | +20 -19
mediawiki/core | master | +17 -49
mediawiki/core | master | +1 -3
mediawiki/core | master | +13 -1
integration/config | master | +22 -22
integration/config | master | +54 -0
integration/quibble | master | +4 -0
mediawiki/core | master | +3 -10
mediawiki/core | master | +1 -1
mediawiki/extensions/Wikibase | master | +70 -21
mediawiki/core | master | +27 -37
mediawiki/core | master | +29 -5
mediawiki/core | master | +37 -1
mediawiki/extensions/Babel | master | +47 -52
mediawiki/core | master | +16 -1

Related Objects

This task is connected to more than 200 other tasks. Only direct parents and subtasks are shown here.
[Snipped: subtask table; task titles were not captured in this copy. Statuses ranged from Open and Stalled to Resolved, Declined, and Duplicate.]

Event Timeline

There are a very large number of changes, so older changes are hidden.

for Selenium tests, that means each web request has to regenerate the localization cache which would explain the slowdown?

When I look at the strange, sometimes extreme runtimes of ResourcesTest and SpecialPageFatalTest, I suspect something similar happens in other places. I mean, sure, executing special pages, parsing .less files, or initializing LocalisationCaches is sometimes just very complicated and expensive. But even the most expensive of these examples consumes only a few hundred milliseconds. That's not a problem.

But it is a problem if it's done thousands of times for tests that don't even do anything with the results.

I've been digging into this for a few days now but can't find that single bottleneck anywhere. I suspect the majority of the up to 20 minutes this ticket talks about is spent naively running the same processes with the same inputs and the same outputs over and over again, millions and millions of times.

for Selenium tests, that means each web request has to regenerate the localization cache which would explain the slowdown?

When I look at the strange, sometimes extreme runtimes of ResourcesTest and SpecialPageFatalTest, I suspect something similar happens in other places. I mean, sure, executing special pages, parsing .less files, or initializing LocalisationCaches is sometimes just very complicated and expensive. But even the most expensive of these examples consumes only a few hundred milliseconds. That's not a problem.

T50217#9030084 is related. I noticed that paratest was hanging, and spending most of the time (12 of 18 seconds) on a single test, LanguageConverterFactoryTest. I looked at that test yesterday, it basically loads the localisation cache for all languages, one by one. And while loading a single language isn't terribly expensive (the slowest I saw was around 80ms), it adds up when done hundreds of times. I couldn't find a solution to that, though, and just stopped trying after a while.
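One obvious direction for a test like that is to make sure each language is loaded at most once per process. A minimal memoization sketch (Python as a stand-in; `load_language_data` is a hypothetical placeholder for the expensive LocalisationCache load, not MediaWiki code):

```python
from functools import lru_cache

def load_language_data(code):
    # Stand-in for the expensive per-language LocalisationCache load
    # (~80ms each in the profile above); here it just builds a dict.
    return {"code": code, "fallbacks": [] if code == "en" else ["en"]}

@lru_cache(maxsize=None)
def get_language(code):
    # Each language is loaded at most once per process, so a test that
    # iterates over ~400 codes pays the cost once rather than per test method.
    return load_language_data(code)
```

Whether this helps in practice depends on how much of the cost is the first load versus repeated loads, which is exactly what was unclear here.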

Change 940120 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/core@master] Make MapCacheLRU in LanguageFactory static

https://gerrit.wikimedia.org/r/940120

Change 940233 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/core@master] TestLocalisationCache: add static cache of JSON files

https://gerrit.wikimedia.org/r/940233

In order to help understand what's going on under the hood and make more informed decisions about what to improve and what not to, I ran tests with Excimer and built flame graphs from them, which you can see for three types of tests (phpunit, selenium, api_tests) here: https://people.wikimedia.org/~ladsgroup/tests_flamegraphs/

The flamegraphs can be searched, zoomed in, etc. (More info: https://www.brendangregg.com/flamegraphs.html)

That provides way better insights and is a treasure trove of areas to improve. I haven't fully dug into all of them, but for example: 3% of the PHPUnit time is spent resetting the database between tests (MediaWikiIntegrationTestCase::resetDB), which makes some sense, but almost all of that time is spent creating pages and edits(!). Why? Because it internally calls MediaWikiIntegrationTestCase::addCoreDBData, which in turn calls WikiPage::doUserEditContent and makes a page and edits in every integration test(!!!!!)

Or for the Selenium tests: the biggest bottleneck is minification and RL module access; I guess it's not properly cached. Fixing that could easily shave a minute off the tests.

Feel free to dig into them!

(How were these made? This patch enables Excimer logging; the tests then create two files, one for wall time and one for CPU time, with a sampling rate of once every 0.1 seconds. I then downloaded those logs and ran flamegraph.pl on them:

cat ../selenium/cpu-profile.log* | ./flamegraph.pl > ../selenium/cpu-profile.svg
cat ../selenium/wall-profile.log* | ./flamegraph.pl > ../selenium/wall-profile.svg
cat ../selenium/cpu-profile.log* | ./flamegraph.pl --reverse --colors blue > ../selenium/cpu-profile.reverse.svg
cat ../selenium/wall-profile.log* | ./flamegraph.pl --reverse --colors blue > ../selenium/wall-profile.reverse.svg)

You can also take the settings from the patch, enable them locally, and run the tests yourself.

3% of the PHPUnit time is spent resetting the database between tests (MediaWikiIntegrationTestCase::resetDB), which makes some sense, but almost all of that time is spent creating pages and edits(!). Why? Because it internally calls MediaWikiIntegrationTestCase::addCoreDBData, which in turn calls WikiPage::doUserEditContent and makes a page and edits in every integration test(!!!!!)

That's something I'd like to get rid of. But first of all, we should avoid cloning and resetting the database for tests not in the 'Database' group (T155147). Once that's done, I'll look into not creating a page and a user for every test.

Zooming out a little bit, PHPUnit (and possibly Selenium too, but I haven't tested it) would benefit a lot from parallelization with paratest, see T50217#9022060 and the following comments. I'm a bit skeptical about seeing the 6x speedup in CI that I observed locally; but still, even if it's just 2x, it's much more than we can do by just improving the code. I'm also planning to look into paratest support once T155147 and T90875 are done.

The one single test for which we should really do something is LanguageConverterFactoryTest. The test instantiates something like 400 language objects, which is really slow. The whole test class takes roughly 13 seconds to run. Running the whole "includes" suite with paratest takes around 19 seconds, and because LanguageConverterFactoryTest is a single test handled by a single process, its 13 seconds mean a lot for the overall runtime. I was looking for ways to speed it up, but I doubt we'll be able to do much. Another option would be to split it into multiple test classes, each one running the same test for a subset of the languages. But that's also something I will look into more closely once we're in a better position to evaluate paratest.

As requested by @Krinkle, I increased the sampling rate from every 100ms to every 1ms. The results are here: https://people.wikimedia.org/~ladsgroup/tests_flamegraphs/high_res/

The one single test for which we should really do something is LanguageConverterFactoryTest. The test instantiates something like 400 language objects, which is really slow. The whole test class takes roughly 13 seconds to run. Running the whole "includes" suite with paratest takes around 19 seconds, and because LanguageConverterFactoryTest is a single test handled by a single process, its 13 seconds mean a lot for the overall runtime. I was looking for ways to speed it up, but I doubt we'll be able to do much. Another option would be to split it into multiple test classes, each one running the same test for a subset of the languages. But that's also something I will look into more closely once we're in a better position to evaluate paratest.

13 seconds is not much compared to the 1-2 minutes of every Selenium run being spent on cache misses for RL modules: https://people.wikimedia.org/~ladsgroup/tests_flamegraphs/high_res/wall-profile.selenium.svg

Or for the Selenium tests: the biggest bottleneck is minification and RL module access; I guess it's not properly cached. Fixing that could easily shave a minute off the tests.
[…]

When I download the .log artefacts and load them up into Speedscope, I can confirm that APCUBagOStuff is selected (so php-apcu cache is enabled and discovered by MW), and we see both doGet calls without minification, and minification calls with doSet. Suggesting that functionally it is using the cache correctly.

It's hard to tell from the current snapshot whether it is slow or whether it is missing the cache more than once for the same module. The settings here are more or less stock MediaWiki + DevelopmentSettings, similar to MediaWiki-Docker, where this is generally working well.

My first hunch here is that the CI worker is under memory pressure and forced to evict fresh values from the cache before they can be used. For example, the default apc.shm_size in PHP is 32M (https://www.php.net/manual/en/apcu.configuration.php) which is quite small but may just about suffice for human use in MediaWiki-Docker, but in CI we're likely over a threshold where we engage with a lot of different areas. Likewise, opcache (https://www.php.net/manual/en/opcache.configuration.php) has a default of opcache.memory_consumption=128 (MB) and opcache.interned_strings_buffer=8 (MB).

In production (Codesearch) we use:

# appserver
profile::mediawiki::apc_shm_size: 6096M
opcache.interned_strings_buffer: 96
opcache.memory_consumption: 1024

# jobrunner
profile::mediawiki::apc_shm_size: 4096M
opcache.interned_strings_buffer: 96
opcache.memory_consumption: 1024

# default (unused?)
profile::mediawiki::apc_shm_size: 128M
opcache.interned_strings_buffer: 50
opcache.memory_consumption: 300

We don't need that much in CI given that we only interact with 1 wiki (not 1000), and in ~1 language (not 300). It currently has the following set in Quibble (source):

opcache.memory_consumption=256

I presume the others inherit upstream defaults of apc.shm_size=32M and opcache.interned_strings_buffer=8, which would be very small indeed.
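For illustration, a CI-side override could look like the fragment below; these values are guesses meant to show the shape of the change, not tested recommendations:

```ini
; Hypothetical php.ini overrides for CI workers.
; Large enough that ExtensionRegistry and RL module data survive eviction,
; small enough to fit several parallel jobs per VM.
apc.shm_size=512M
opcache.memory_consumption=256
opcache.interned_strings_buffer=32
```

Confirming actual cache pressure (e.g. via apcu_cache_info() hit/miss and eviction counters) before raising the limits would tell us whether eviction is really the culprit.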

I've uploaded the raw log files from @Ladsgroup's high-res Excimer capture (change 940226) to people.wikimedia.org, which has CORS enabled. This means we can load them into Speedscope (aka Excimer UI).

https://people.wikimedia.org/~krinkle/T225730_5min_jenkins-excimer-20230720/

For example (note: this file is 35 MB): https://performance.wikimedia.org/excimer/speedscope/#profileURL=https://people.wikimedia.org/~krinkle/T225730_5min_jenkins-excimer-20230720/phpunit_mwquibble_cpuprofile.log

Screenshot 2023-07-21 at 03.02.01.png (403 KB)

The sample count reflects time passing in milliseconds. It says the phpunit invocation for mediawiki-quibble-vendor yielded 374,000 samples, i.e. 374 seconds, or about 6 minutes, which is about right.

Using the "Sandwich" mode, which is like a more powerful version of classic "reverse" flame graphs, we can quickly find (for example) the LocalisationCache as having one of the highest "self" times. Clicking it then reveals which tests that time is largely spent in, with most (57%) coming from LanguageConverterFactoryTest.

Screenshot 2023-07-21 at 03.06.49.png (404 KB)
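The "self time" ranking that Sandwich/reverse views compute is simple to state: a frame accrues self time only for samples in which it is the leaf of the stack. A minimal sketch over flamegraph.pl-style collapsed stacks (the sample data below is made up):

```python
from collections import Counter

# Collapsed-stack lines in the format flamegraph.pl consumes:
# "frame1;frame2;...;leaf sample_count"
SAMPLES = """\
main;runTests;LocalisationCache::readJSONFile 57
main;runTests;Parser::parse 30
main;runTests;LocalisationCache::readJSONFile;json_decode 13
"""

def self_times(collapsed):
    """Sum sample counts per frame, attributing each sample to its leaf only."""
    out = Counter()
    for line in collapsed.strip().splitlines():
        stack, count = line.rsplit(" ", 1)
        leaf = stack.split(";")[-1]
        out[leaf] += int(count)  # only the leaf frame accrues "self" time
    return out

print(self_times(SAMPLES).most_common(1))
# → [('LocalisationCache::readJSONFile', 57)]
```

Total (inclusive) time would instead credit every frame on the stack, which is why LocalisationCache can rank high on self time even when its callers dominate the flame graph visually.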

The one single test for which we should really do something is LanguageConverterFactoryTest. The test instantiates something like 400 language objects, which is really slow. The whole test class takes roughly 13 seconds to run.

I incidentally happened to look into language object instantiation yesterday for another reason: POC: Create Language objects without initializing l10n cache – I’ll probably look into this a bit more. (Edit: now tracked at T342418.)

Change 940502 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/core@master] Improve performance of LocalisationCache::mergeMagicWords()

https://gerrit.wikimedia.org/r/940502

Change 939727 merged by jenkins-bot:

[mediawiki/libs/less.php@master] Less_Parser: Inline and optimize heavily called MatchQuoted()

https://gerrit.wikimedia.org/r/939727

Change 940502 merged by jenkins-bot:

[mediawiki/core@master] Improve performance of LocalisationCache::mergeMagicWords()

https://gerrit.wikimedia.org/r/940502

Change 940929 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/core@master] Make "pluralRules" caches static in LocalisationCache

https://gerrit.wikimedia.org/r/940929

Change 940929 merged by jenkins-bot:

[mediawiki/core@master] Make "pluralRules" caches static in LocalisationCache

https://gerrit.wikimedia.org/r/940929

If we can just fix RL's caching in the Apache server used for Selenium, we can easily shave 1-5 minutes (!) off each Selenium run. See my attempt in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/944984

Change 698467 abandoned by Kosta Harlan:

[mediawiki/extensions/Echo@master] selenium: Use a single login for browser tests

Reason:

https://gerrit.wikimedia.org/r/698467

Change 698468 abandoned by Kosta Harlan:

[mediawiki/extensions/Echo@master] selenium: Move notifications page test into general suite

Reason:

https://gerrit.wikimedia.org/r/698468

Change 748314 abandoned by Hashar:

[integration/quibble@master] [DNM] Add excimer config

Reason:

https://gerrit.wikimedia.org/r/748314

Change 774409 abandoned by Kosta Harlan:

[mediawiki/core@master] DevelopmentSettings: Use MWLoggerDefaultSpi for debug logging

Reason:

https://gerrit.wikimedia.org/r/774409

wmf-quibble-core-vendor-mysql-php74-docker/37543/console today shows 18m 47s, which is a bit higher than the upper end of the spectrum noted in May 2021 (13-18m).

Three years later: maybe we should target a more realistic number, like 15 minutes? If we can get to 15 minutes, we could then move on to 10 minutes, and then maybe end up back at this task's current goal of 5 minutes.

I still think this (more realistic scope) might be a useful way to orient our work, though some individual or team would have to own this goal for it to be realistic, and to hold everyone accountable to it.

I'm still convinced that T50217: Speed up MediaWiki PHPUnit build by running integration tests in parallel can give us a lot. Looking at this run of wmf-quibble-core-vendor-mysql-php74-docker, the PHPUnit integration non-database suite took 2m33s, and the database suite 13m41s. In total, that's roughly 16 minutes spent running PHPUnit. Assuming a 5x speedup, consistent with the results of my tests in T50217, this total time would go down to ~3.5 minutes, thus saving ~13 minutes.
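The estimate above is a straightforward Amdahl's-law calculation. A back-of-envelope check, assuming (hypothetically) an 18-minute total job of which only the PHPUnit phases parallelize:

```python
def parallel_estimate(total, parallel_part, speedup):
    # Amdahl-style estimate: only `parallel_part` of the job benefits
    # from the speedup; the rest stays serial.
    serial = total - parallel_part
    return serial + parallel_part / speedup

total_min = 18.0             # assumed whole-job wall time, minutes
phpunit_min = 2.55 + 13.68   # 2m33s non-DB + 13m41s DB suites ≈ 16.2 min
new_total = parallel_estimate(total_min, phpunit_min, speedup=5)
print(round(new_total, 1))   # ≈ 5.0 minutes, i.e. saving ~13 minutes
```

The same function also shows the flip side: with a 2x speedup instead of 5x, the job would still take about 9.9 minutes, so the achievable parallelism factor matters a lot.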

I'm still convinced that T50217: Speed up MediaWiki PHPUnit build by running integration tests in parallel can give us a lot. Looking at this run of wmf-quibble-core-vendor-mysql-php74-docker, the PHPUnit integration non-database suite took 2m33s, and the database suite 13m41s. In total, that's roughly 16 minutes spent running PHPUnit. Assuming a 5x speedup, consistent with the results of my tests in T50217, this total time would go down to ~3.5 minutes, thus saving ~13 minutes.

The CI infrastructure, obsolete by 5 years, will not be able to handle that. Last time I checked, PHPUnit took a full CPU and MySQL took another half; running them in parallel would exceed the capacity of the executing VM, which runs multiple jobs in parallel competing for the same resources.


There are a few things that can potentially be done though such as:

  • moving Wikibase tests to their own jobs instead of running them from every repository participating in the wmf-quibble jobs (T287582)
  • relatedly, finding a way to avoid running every single test
  • do some profiling to find the low-hanging fruit. I used to do that regularly and found some oddities, such as a MediaWiki service being triggered uselessly, or a large PHPUnit data provider causing every case to incur PHPUnit's per-test overhead
  • mocking the slow database wherever possible

do some profiling to find the low hanging fruit. I used to do that regularly and found some oddities […]

I also do this regularly, and will happily continue to do so. However, the sad reality is that it doesn't make much of a difference. The slowest individual PHPUnit tests we currently have take about 2 seconds. That's about 0.2% of the total runtime. Even if we could identify and eliminate 200 such slow tests, the remaining 12,000 tests would still take 10 minutes.

I think the only approach that makes an actual difference is to find more ways to not run all tests all the time. For example, what about a more fine-grained tree of dependencies between codebases that tells us more than the current list of "gated extensions"? For example, while it's critical to run the FileImporter tests whenever something in core's import system is changed, it doesn't make sense to run the Wikibase tests when I make changes in FileImporter. There are a lot of weird combinations like this.

That said, one interesting bottleneck I'm still exploring is the LanguageFactory, see https://gerrit.wikimedia.org/r/940120.
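The fine-grained dependency idea above amounts to a reverse-dependency query over a repo graph. A small sketch; the dependency map here is illustrative, and in practice it would be derived from extension.json and the CI configuration:

```python
# Hypothetical repo-level dependency map: each key depends on the listed repos.
DEPS = {
    "FileImporter": {"core"},
    "Wikibase": {"core"},
    "Echo": {"core"},
}

def suites_to_run(changed_repo):
    """Run the changed repo's own tests plus everything that (transitively)
    depends on it, and nothing else."""
    affected = {changed_repo}
    changed = True
    while changed:
        changed = False
        for repo, deps in DEPS.items():
            if repo not in affected and deps & affected:
                affected.add(repo)
                changed = True
    return affected

print(sorted(suites_to_run("core")))          # a core change still tests everything
print(sorted(suites_to_run("FileImporter")))  # a FileImporter change tests only itself
```

This matches the example in the comment: a core change would still run FileImporter's tests, but a FileImporter change would not drag in Wikibase.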

I'm still convinced that T50217: Speed up MediaWiki PHPUnit build by running integration tests in parallel can give us a lot. Looking at this run of wmf-quibble-core-vendor-mysql-php74-docker, the PHPUnit integration non-database suite took 2m33s, and the database suite 13m41s. In total, that's roughly 16 minutes spent running PHPUnit. Assuming a 5x speedup, consistent with the results of my tests in T50217, this total time would go down to ~3.5 minutes, thus saving ~13 minutes.

The CI infrastructure, obsolete by 5 years, will not be able to handle that. Last time I checked, PHPUnit took a full CPU and MySQL took another half; running them in parallel would exceed the capacity of the executing VM, which runs multiple jobs in parallel competing for the same resources.

Are there plans to improve the infrastructure or add more resources? As Thiemo said, improving individual tests is, for the most part, not really useful, because there isn't a single source of slowness. Running fewer tests is also an option, but I'm not sure if doing that cleverly is that easy: MW has a complex ecosystem, so it's not just a matter of flipping a switch.

IMHO, the lowest-hanging fruit is parallelization. If we find a way to make that work (both in MW and the CI infrastructure), we should see a huge reduction in CI times, possibly more than can be achieved by running fewer tests, and at a lower maintenance cost. Once that's out of the way, perhaps we can start considering a more fine-grained testing infrastructure to run fewer tests.

At any rate, it's clear that resources are needed to reach this goal. There's no magic trick that will cut the CI runtime by just snapping one's fingers. And I agree with Kosta that this can only happen if a person or team chooses to own this goal, else it'll be really hard to find the necessary resources.

Slow CI can, at times, have a massive effect on productivity. And as "fun" as it might sound, having your patch wait 40 minutes after the +2, only for it to fail a random selenium test and having to go through gate-and-submit again really makes me wanna throw my laptop out of the window sometimes.

Slow CI can, at times, have a massive effect on productivity. And as "fun" as it might sound, having your patch wait 40 minutes after the +2, only for it to fail a random selenium test and having to go through gate-and-submit again really makes me wanna throw my laptop out of the window sometimes.

Somewhat orthogonal to this task, but wanted to note that T323750: Provide early feedback when a patch has job failures can help you know sooner when there's a failure. It has to be enabled per repo, though. (Patch for MW core if anyone wants to look.) It's a band-aid fix, yes, but you at least know of a failure earlier and can trigger a rebase to restart the tests.

I think there are two common use-cases where slow CI is annoying:

  • When submitting a change for code review and not learning quickly that it needs further improvements. As Kosta says, the early feedback bot could help with this.
  • When backporting changes, and having to do a merge (or several merges) in a 60-minute window. IMO we should just make most CI on wmf/ branches non-voting, they provide little value (even when they fail, it rarely represents a real error). T307180: Drop Selenium tests from gate-and-submit-wmf

Agreed that early feedback can help. One more thing to consider though: the bot helps when your patch is the only thing in CI, or at least if there aren't many patches being tested. But CI can get really full sometimes (e.g. if LibUp is running). When that happens, you also need to wait for all the previous jobs to complete before yours even starts, and this can only be improved by making everything faster.

Change #1036311 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] DNM: Measuring impact of some improvements

https://gerrit.wikimedia.org/r/1036311

Change #1064477 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] qunit: Replace slow mw.messages reset with empty object reset

https://gerrit.wikimedia.org/r/1064477

Change #1064477 merged by jenkins-bot:

[mediawiki/core@master] qunit: Replace slow mw.messages reset with empty object reset

https://gerrit.wikimedia.org/r/1064477

Change #1070884 had a related patch set uploaded (by Krinkle; author: Thiemo Kreuz (WMDE)):

[mediawiki/core@master] phpunit: Set much smaller defaults for RandomImageGenerator

https://gerrit.wikimedia.org/r/1070884

Change #1070884 merged by jenkins-bot:

[mediawiki/core@master] phpunit: Set much smaller defaults for RandomImageGenerator

https://gerrit.wikimedia.org/r/1070884

Change #940120 abandoned by Thiemo Kreuz (WMDE):

[mediawiki/core@master] Make MapCacheLRU in LanguageFactory static

https://gerrit.wikimedia.org/r/940120

I have generated updated profile for the gate jobs, using Amir's patch above and uploaded them on people.wm.org. See T225730#9033209 for how to visualize them, e.g. https://performance.wikimedia.org/excimer/speedscope/#profileURL=https://people.wikimedia.org/~daimona/T225730-profile-2024-12-09/selenium-wmf-cpu-profile.log.

I am particularly interested in the selenium jobs, now that PHPUnit is run in parallel in most places. I haven't analyzed them in depth for now, but at first glance it does look like we're spending a decent amount of time in load.php, and particularly parsing LESS files. I'm not sure if that is expected, but 10% of the total time seems a lot. I'm also somewhat surprised to see APCUBagOStuff::doGet amongst the functions with the larger self times. At first I thought it might just be that we call it a lot (no way of knowing from a sampled profile), but then it shows that there seems to be a single caller responsible for the majority of the spent time, and that is ExtensionRegistry::loadFromQueue. I'm not sure what to make of it, for the time being.

@Daimona Thanks for the analysis - that definitely looks like it's worth investigating further. I'm also focussed on the selenium jobs, and I have to say I think the way to go on that is to just run more agents and shard the jobs - see this investigation (which talks about cypress, but applies to selenium too), and associated jenkins job patch. I feel like that's a fast and straightforward way to get the selenium job time reduced by 50%, even if it doesn't address the root cause of any performance issue. Ultimately, whatever performance improvements we make, they will eventually be offset by us adding more tests - horizontally scaling the test jobs seems to me to be the way to go in the long term.
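The sharding approach could be as simple as a stable hash over spec filenames; a sketch of the assignment logic only (not the actual Jenkins mechanism, and the filenames are made up):

```python
import zlib

def shard_for(test_file, num_shards):
    # Stable assignment: the same file always lands on the same shard,
    # so per-shard timings stay comparable across builds. crc32 is used
    # rather than hash() because it is deterministic across processes.
    return zlib.crc32(test_file.encode()) % num_shards

tests = ["page.spec.js", "login.spec.js", "search.spec.js", "echo.spec.js"]
shards = {n: [t for t in tests if shard_for(t, 2) == n] for n in range(2)}
```

A refinement would be to bin-pack by recorded test durations instead of hashing, so the shards finish at roughly the same time, but even naive hashing halves the wall-clock time if the tests are evenly distributed.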

Thanks for the fresh profiles!

  • I already spent some time in our (forked) Less.php library and managed to improve it quite a bit, most notably via https://gerrit.wikimedia.org/r/939717. I'm not surprised it's still one of the most expensive things we do at startup. I might have another look.

I think it makes sense that parsing LESS would take some time, but I'm surprised by this being 10% of the overall time spent in tests. My main (implicit) question is whether we are parsing the same things over and over again, or if there are otherwise missed caching opportunities. If not, and parsing LESS is just this slow, then I do agree that it would be worth improving its performance.

  • I also noticed that Wikimedia-Minify shows up relatively high. This is not surprising, as it's also an actual parser that processes every bit of JavaScript we have, one character at a time. While it's heavily optimized already, there is certainly still something to improve.

Yup, here too I mostly wanted to confirm that we're making good use of caches.

  • Why is ConvertibleTimestamp::setTimestamp so expensive? This doesn't seem right.

I suppose it might be called a lot of times; we can't know with Excimer. It's not a trivial setter either, so I could see it taking some time if it's called a lot. Worth looking into this though, to see what the number of calls really is, and if anything can be done about it.

I created T381895 to dig deeper into the updated profiles, look for and brainstorm things that we could optimize, and actually optimize them.

@thcipriani and @dduvall: I just noticed y'all were missing from the subscriber list here, so you're welcome. :)

[…] I'm also somewhat surprised to see APCUBagOStuff::doGet amongst the functions with the larger self times. […] it shows that there seems to be a single caller responsible for the majority of the spent time, and that is ExtensionRegistry::loadFromQueue. […]

See also:

TLDR: APCu needs to perform eviction and expiration, therefore php-apcu cannot read directly from shared parent-process memory and instead copies recursively, not too different from unserialize(serialize()). Note that HHVM didn't have this overhead, because HHVM simply marked values as expired without physically removing them, which meant ongoing requests could safely complete any reads from the array pointer. The downside is that HHVM could OOM crash because... it never physically removed anything. The Performance Team at the time mitigated this in two ways: by reducing the size of ExtReg (optimising it, removing things from it, and moving some data to lazy attributes), and by compensating in other early MW logic in WebStart.php and Setup.php under T189966: Audit and simplify MediaWiki initialisation code (Spring 2018).
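A rough way to see the "not too different from unserialize(serialize())" point without needing the apcu extension installed: the round trip below mimics what every fetch of a large cached structure effectively pays, since PHP cannot hand out a pointer into shared memory that might be evicted. The registry-shaped data is made up for illustration.

```php
<?php
// Build a large nested array, loosely shaped like an extension registry,
// then pay the full serialize + unserialize round trip that approximates
// the per-request cost of reading such a value out of APCu.

$extRegLike = [];
for ( $i = 0; $i < 1000; $i++ ) {
	$extRegLike["Extension$i"] = [
		'Hooks' => [ "onSomething$i" ],
		'AutoloadClasses' => [ "Foo$i" => "includes/Foo$i.php" ],
	];
}

$start = microtime( true );
$copy = unserialize( serialize( $extRegLike ) );
$elapsed = microtime( true ) - $start;

// The result is a full, independent deep copy of the whole structure.
var_dump( $copy === $extRegLike ); // bool(true)
```

The absolute time depends on the machine, but the key property is that the cost scales with the size of the value, which is why shrinking ExtReg helped.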

Long term options to fully regain this:

  • php-opcache: We can take a lesson from LocalisationCache, Interwiki_map, and GitInfo. We'd let Scap generate the data into a static php array file that we load instead of reading from php-apcu.
  • php-immutable_cache: This is a fork of APCu specifically for storing values that cannot be expired or changed. It requires responsible usage, e.g. only narrowly for data known to be of predictable size, known to be constant within a given MW deployment version, and with only a single variant of that data per server (i.e. no variants per skin, language, title, wiki, etc). We could add conditional support for this to store a select number of large values for high-frequency, low-latency local-server access, without the complexity of generating and configuring alternate access paths via static PHP array files.

Thank you. You can tell I didn't look into that further, given the number of times that loadFromQueue appears to be mentioned in performance-related tasks ;)

  • php-opcache: We can take a lesson from LocalisationCache, Interwiki_map, and GitInfo. We'd let Scap generate the data into a static php array file that we load instead of reading from php-apcu.

This seems like it might be the natural thing to do, also judging from a few comments in https://github.com/krakjoe/apcu/issues/175. Filed as T382044 if we want to pursue this.
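For concreteness, a minimal sketch of the php-opcache option discussed above, under the stated assumptions: a deploy step (Scap, in the proposal) writes the data out as a static PHP file that returns an array, and request-time code loads it with `require`. With opcache enabled, the file is compiled once and then served from opcache's shared script cache, with no per-request copy out of APCu. File name and data here are made up.

```php
<?php
// Hypothetical sketch of generating and consuming a static PHP array
// file, as LocalisationCache-style caches do.

$registry = [ 'SomeExtension' => [ 'version' => '1.0' ] ]; // stand-in data

// Deploy time: write the data as a PHP file returning an array.
$file = sys_get_temp_dir() . '/extreg-cache.php';
file_put_contents(
	$file,
	"<?php\nreturn " . var_export( $registry, true ) . ";\n"
);

// Request time: load it back. With opcache enabled this is compiled once
// per deployment and then shared, effectively immutable, across workers.
$loaded = require $file;
var_dump( $loaded === $registry ); // bool(true)
```

The trade-off versus APCu is that invalidation moves to deploy time, which is exactly the property that makes it safe to skip the copy-on-read.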

Change #1163834 had a related patch set uploaded (by Hashar; author: Hashar):

[mediawiki/core@master] tests: move slow ScopeStructureTest out of structure

https://gerrit.wikimedia.org/r/1163834

Change #1163834 merged by jenkins-bot:

[mediawiki/core@master] tests: move slow ScopeStructureTest out of structure

https://gerrit.wikimedia.org/r/1163834

Change #1165509 had a related patch set uploaded (by Hashar; author: Hashar):

[mediawiki/core@REL1_39] tests: move slow ScopeStructureTest out of structure

https://gerrit.wikimedia.org/r/1165509

Change #1165513 had a related patch set uploaded (by Hashar; author: Hashar):

[mediawiki/core@REL1_43] tests: move slow ScopeStructureTest out of structure

https://gerrit.wikimedia.org/r/1165513

Change #1165515 had a related patch set uploaded (by Hashar; author: Hashar):

[mediawiki/core@REL1_44] tests: move slow ScopeStructureTest out of structure

https://gerrit.wikimedia.org/r/1165515

Change #1165509 merged by jenkins-bot:

[mediawiki/core@REL1_39] tests: move slow ScopeStructureTest out of structure

https://gerrit.wikimedia.org/r/1165509

Change #1165513 merged by jenkins-bot:

[mediawiki/core@REL1_43] tests: move slow ScopeStructureTest out of structure

https://gerrit.wikimedia.org/r/1165513

Change #1165515 merged by jenkins-bot:

[mediawiki/core@REL1_44] tests: move slow ScopeStructureTest out of structure

https://gerrit.wikimedia.org/r/1165515