
Reduce runtime of MW shared gate Jenkins jobs to 5 min
Open, Medium, Public

Description

Objective

For the "typical" time it takes for a commit to be approved and landed in master to be 5 minutes or less.

Status quo

As of 11 June 2019, the gate usually takes around 20 minutes.

The two slowest jobs typically take 13-17 minutes each. The gate overall is rarely under 15 minutes, because we run several of these jobs (which increases the chance of random slowness), and while they can run in parallel, they don't always start immediately, given limited CI execution slots.

Below is a sample from a MediaWiki commit (master branch):

Gate pipeline build succeeded.
  • wmf-quibble-core-vendor-mysql-php72-docker SUCCESS in 12m 03s
  • wmf-quibble-core-vendor-mysql-hhvm-docker SUCCESS in 14m 12s
  • mediawiki-quibble-vendor-mysql-php72-docker SUCCESS in 7m 34s
  • mediawiki-quibble-vendor-mysql-php71-docker SUCCESS in 7m 12s
  • mediawiki-quibble-vendor-mysql-php70-docker SUCCESS in 6m 48s
  • mediawiki-quibble-vendor-mysql-hhvm-docker SUCCESS in 8m 32s
  • mediawiki-quibble-vendor-postgres-php72-docker SUCCESS in 10m 05s
  • mediawiki-quibble-vendor-sqlite-php72-docker SUCCESS in 7m 04s
  • mediawiki-quibble-composer-mysql-php70-docker SUCCESS in 8m 14s

(+ jobs that take less than 3 minutes: composer-test, npm-test, and phan.)

These can be grouped into two kinds of jobs:

  • wmf-quibble: These install MW with the gated extensions, and then run all PHPUnit, Selenium and QUnit tests.
  • mediawiki-quibble: These install MW bundled extensions only, and then run PHPUnit, Selenium and QUnit tests.

Stats from wmf-quibble-core-vendor-mysql-php72-docker

  • 9-15 minutes (wmf-gated, extensions-only)
  • Sample:
    • PHPUnit (dbless): 1.91 minutes / 15,782 tests.
    • QUnit: 29 seconds / 1286 tests.
    • Selenium: 143 seconds / 43 tests.
    • PHPUnit (db): 3.85 minutes / 4377 tests.

Stats from mediawiki-quibble-vendor-mysql-php72-docker:

  • 7-10 minutes (plain mediawiki-core)
  • Sample:
    • PHPUnit (unit+dbless): 1.5 minutes / 23,050 tests.
    • QUnit: 4 seconds / 437 tests.
    • PHPUnit (db): 4 minutes / 7604 tests.

Updated status quo

As of 11 May 2021, the gate usually takes around 25 minutes.

The slowest job typically takes 20-25 minutes per run. The gate overall can never be faster than the slowest job, and is often worse: although the other jobs run in parallel, they don't always start immediately, given limited CI execution slots.

Below are the timing results from a sample MediaWiki commit (master branch):

[Snipped: Jobs faster than 5 minutes]

  • 9m 43s: mediawiki-quibble-vendor-mysql-php74-docker/5873/console
  • 9m 47s: mediawiki-quibble-vendor-mysql-php73-docker/8799/console
  • 10m 03s: mediawiki-quibble-vendor-sqlite-php72-docker/10345/console
  • 10m 13s: mediawiki-quibble-composer-mysql-php72-docker/19129/console
  • 10m 28s: mediawiki-quibble-vendor-mysql-php72-docker/46482/console
  • 13m 11s: mediawiki-quibble-vendor-postgres-php72-docker/10259/console
  • 16m 44s: wmf-quibble-core-vendor-mysql-php72-docker/53990/console
  • 22m 26s: wmf-quibble-selenium-php72-docker/94038/console

Clearly the last two jobs are dominant in the timing:

  • wmf-quibble: This job installs MW with the gated extensions, and then runs all PHPUnit and QUnit tests.
  • wmf-quibble-selenium: This job installs MW with the gated extensions, and then runs all the Selenium tests.

Note that the mediawiki-quibble jobs each install just the MW bundled extensions, and then run PHPUnit, Selenium and QUnit tests.

Stats from wmf-quibble-core-vendor-mysql-php72-docker:

  • 13-18 minutes (wmf-gated, extensions-only)
  • Select times:
    • PHPUnit (unit tests): 9 seconds / 13,170 tests.
    • PHPUnit (DB-less integration tests): 3.31 minutes / 21,067 tests.
    • PHPUnit (DB-heavy): 7.91 minutes / 4,257 tests.
    • QUnit: 31 seconds / 1421 tests.

Stats from wmf-quibble-selenium-php72-docker:

  • 20-25 minutes

Scope of task

This task represents the goal of reaching 5 minutes or less. The work tracked here includes researching ways to get there, trying them out, and putting one or more ideas into practice. The task can be closed once we have reached the goal or if we have concluded it isn't feasible or useful.

Feel free to add/remove subtasks as we go along and consider different things.

Stuff done

Ideas to explore and related work

  • Look at the PHPUnit "Test Report" for a commit and sort the root level by duration. Find the slowest suites and look for ways to improve them. Are they repeating expensive setups? Perhaps that setup can be skipped or re-used. Are they running hundreds of variations of the same integration test? Perhaps reduce that to just one case for that story, and apply the remaining cases to a lighter unit test instead (see the sketch below).
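
As a purely hypothetical sketch of both patterns (the class, method and fixture names below are invented, not taken from any actual suite), using the existing MediaWikiIntegrationTestCase and MediaWikiUnitTestCase base classes:

<?php
// Hypothetical: build the expensive DB fixture once per class via
// addDBDataOnce() instead of in setUp() for every test method, and keep a
// single representative DB-backed case for the end-to-end story.
class ExampleStoryIntegrationTest extends MediaWikiIntegrationTestCase {
    public function addDBDataOnce() {
        // Runs once per class, not once per test method.
        $this->insertPage( 'ExampleFixturePage', 'Some fixture wikitext' );
    }

    public function testStoryEndToEnd() {
        // One representative integration case covering the whole story.
        $this->assertTrue( true ); // placeholder assertion
    }
}

// The many input variations move to a true unit test with a data provider;
// it needs no database or service container and runs in milliseconds.
class ExampleFormatterUnitTest extends MediaWikiUnitTestCase {
    /** @dataProvider provideInputs */
    public function testNormalize( string $input, string $expected ) {
        $this->assertSame( $expected, strtolower( $input ) ); // stand-in for the real logic
    }

    public static function provideInputs() {
        yield 'simple' => [ 'Foo', 'foo' ];
        yield 'all caps' => [ 'BAR', 'bar' ];
    }
}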

Details

Project (branch): Lines +/-
mediawiki/core (master): +2 -28
mediawiki/extensions/MobileFrontend (master): +0 -3
mediawiki/core (master): +30 -4
integration/config (master): +14 -0
integration/config (master): +0 -4
mediawiki/core (master): +0 -5
integration/quibble (master): +40 -15
integration/quibble (master): +109 -0
mediawiki/core (master): +56 -16
mediawiki/core (master): +2 -12
mediawiki/core (master): +8 -3
integration/quibble (master): +34 -0
mediawiki/core (master): +2 -12
integration/config (master): +1 -1
integration/config (master): +32 -32
integration/config (master): +28 -28
integration/quibble (master): +27 -3
mediawiki/extensions/ProofreadPage (master): +11 K -621
mediawiki/extensions/GrowthExperiments (master): +13 K -549
mediawiki/extensions/Echo (master): +11 K -216
mediawiki/extensions/AbuseFilter (master): +12 K -469
mediawiki/extensions/FileImporter (master): +11 K -530
integration/quibble (master): +24 -2
mediawiki/extensions/Echo (master): +16 -33
mediawiki/extensions/Echo (master): +6 -18
mediawiki/core (master): +22 -140
mediawiki/core (master): +1 K -679
mediawiki/core (master): +21 -14
mediawiki/core (master): +6 -4
mediawiki/core (master): +1 -1
integration/config (master): +0 -19
integration/config (master): +12 -5
mediawiki/core (master): +20 -19
mediawiki/core (master): +17 -49
mediawiki/core (master): +1 -3
mediawiki/core (master): +13 -1
integration/config (master): +22 -22
integration/config (master): +54 -0
integration/quibble (master): +4 -0
mediawiki/core (master): +3 -10
mediawiki/core (master): +1 -1
mediawiki/extensions/Wikibase (master): +70 -21
mediawiki/core (master): +27 -37
mediawiki/core (master): +29 -5
mediawiki/core (master): +37 -1
mediawiki/extensions/Babel (master): +47 -52
mediawiki/core (master): +16 -1

Event Timeline

I understand the general request of making CI faster, but I think the tradeoff being asked for is out of balance. I believe there's a general consensus that our test coverage as a whole (unit tests, integration tests, browser tests) is insufficient, and so there's a general push to add more of it. I would much rather have more test coverage with slower CI than faster CI with less test coverage. Anecdotally, I've seen the selenium tests catch multiple (more than 3, less than 10) potential regressions during the code review process, which to me seems valuable to keep. Really, anything that reduces the amount of regressions we have on a weekly basis seems valuable.

My main gripe with selenium tests is how often they seem flaky, but that's a separate issue...

Change 745991 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] [DNM] Test if reduction in logging would reduce the tests run time

https://gerrit.wikimedia.org/r/745991

So I removed the debug logging (which I agree is important) to see if logging is taking a long time, and it seems it does:
Between https://gerrit.wikimedia.org/r/c/mediawiki/core/+/745991 and https://gerrit.wikimedia.org/r/c/mediawiki/core/+/493162/

  • wmf-quibble-core-vendor-mysql-php72-docker went from 17m 22s to 14m 32s, a 16% reduction
  • wmf-quibble-selenium-php72-docker went from 21m 00s to 15m 20s, a 27% reduction, i.e. one quarter of the total Selenium run time is just debug logging by MediaWiki

Maybe we can have some sort of buffering and flushing for logs? Maybe something is logging a lot of DEBUGs when it's not needed?

Beside that, I generally think we need some sort of o11y into what is happening there, like arclamp or perf monitoring. Something like that would help us see if we messed up something (npm cache, git cache, git shallow, etc. etc.) or we simply need to parallelize.

So I removed the debug logging (which I agree is important) to see if logging is taking a long time, and it seems it does:
Between https://gerrit.wikimedia.org/r/c/mediawiki/core/+/745991 and https://gerrit.wikimedia.org/r/c/mediawiki/core/+/493162/

  • wmf-quibble-core-vendor-mysql-php72-docker went from 17m 22s to 14m 32s, a 16% reduction
  • wmf-quibble-selenium-php72-docker went from 21m 00s to 15m 20s, a 27% reduction, i.e. one quarter of the total Selenium run time is just debug logging by MediaWiki

Maybe we can have some sort of buffering and flushing for logs? Maybe something is logging a lot of DEBUGs when it's not needed?

It is extremely rare that I need to look at CI debug logs to see if something is wrong (for most failures, the stack trace of the failure is enough 99.9% of the time), and they are even less useful when tests pass. Suggestion: set it to non-debug by default, but add an option similar to check experimental (like "check with debug") for when it's needed.
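
As a purely illustrative sketch of the "non-debug by default, opt back in when needed" idea (the QUIBBLE_DEBUG_LOGS environment variable name and log path are invented; the real work later went through DevelopmentSettings.php and the Quibble/core patches below), the CI LocalSettings.php could do something like:

// Hypothetical CI-only snippet: write nothing to debug logs unless a
// "check with debug" style build opts back in via an environment variable.
if ( getenv( 'QUIBBLE_DEBUG_LOGS' ) ) { // invented variable name
    $wgDebugLogFile = '/workspace/log/mw-debug.log'; // placeholder path
} else {
    $wgDebugLogFile = '';    // disable the catch-all debug log
    $wgDebugLogGroups = [];  // and any per-channel debug logs
}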

@Ladsgroup, please benchmark on your local machine. The CI stack is unreliable for comparing performance between builds / settings for a few reasons:

  • there can be 1 to 4 builds running concurrently
  • disk IO is capped on WMCS instances
  • the underlying hardware can be quite busy due to other instances

Last time I checked on my local machine, it did not show any difference.

My general comment about this task is that all the fine tunings are great but speed up will only be achieved by overhauling the current workflows. There are a few big changes we could make to speed it up:

  • have a pre-flight job that runs the linters for the repository that triggers the patch, and holds any other jobs until that one has completed
  • add a build step that would clone the repositories, apply patches and install dependencies. That is currently done by each of the Quibble jobs
  • change a lot of MediaWiki tests to be true unit tests
  • first run tests for the affected code path (e.g. a patch to includes/api/* would run tests/phpunit/includes/api/* tests first) to speed up the feedback in case of failure
  • stop running every single test when extensions depend on each other (we now have @group Standalone, which was introduced to prevent running Scribunto tests from any other repository)

Or to put it another way, we need to revisit how we do integration testing between extensions. Blindly running everything is the easy path, but it has the drawback of being rather slow.

@Ladsgroup, please benchmark on your local machine. The CI stack is unreliable for comparing performance between builds / settings for a few reasons:

  • there can be 1 to 4 builds running concurrently

Both runs were the only runs on the whole CI system, on a quiet Saturday.

  • disk IO is capped on WMCS instances

Isn't this a reason to reduce disk IO?

  • the underlying hardware can be quite busy due to other instances

True, but I ran these on a quiet Saturday. I can try rechecking them several times this evening, when it's quiet too, but honestly, this has been quite consistent.

Last time I checked on my local machine, it did not show any difference.

A local machine is not representative of production. For example, you probably have an SSD, while I don't think production has this type of storage; it may be that it moves data to NFS, and that's really slow. Generally, I don't think we should compare local and production. They are vastly different: e.g. DB calls on a local computer are slow because they read from disk, but in CI the database is mounted on tmpfs, so it's all in memory and DB calls are actually quite fast.

Note that with the removal of debug logging, we could possibly increase log retention, as it would reduce our log storage drastically (or maybe most of the storage is mp4 files and this wouldn't help there).

My general comment about this task is that all the fine tunings are great but speed up will only be achieved by overhauling the current workflows. There are a few big changes we could make to speed it up:

  • have a pre-flight job that runs the linters for the repository that triggers the patch, and holds any other jobs until that one has completed

Filed as T297561: Run linters before starting longer running jobs

  • add a build step that would clone the repositories, apply patches and install dependencies. That is currently done by each of the Quibble jobs

+1

  • change a lot of MediaWiki tests to be true unit tests

It's a big effort. I think starting T50217: Speed up MediaWiki PHPUnit build by running integration tests in parallel would make sense regardless.

  • first run tests for the affected code path (e.g. a patch to includes/api/* would run tests/phpunit/includes/api/* tests first) to speed up the feedback in case of failure

We don't use --stop-on-failure, though, so not sure that would help.

  • stop running every single test when extensions depend on each other (we now have @group Standalone, which was introduced to prevent running Scribunto tests from any other repository)

Or to put it another way, we need to revisit how we do integration testing between extensions. Blindly running everything is the easy path, but it has the drawback of being rather slow.

I looked at how we can profile this to find bottlenecks. There are two distinct pieces of work we need to do, and they will give us a very good catalogue of bottlenecks.

  • Run perf. This one is simple: just ssh into one of the VMs (I tried, but for whatever reason it doesn't let me in; can I be added to the integration project?) and then run something like this:
sudo perf record -p PID -g -F 99 -- sleep 1000
or
sudo perf record -a --cgroup DOCKER_CGROUP_ID -g -F 99 -- sleep 1000

and then build a flamegraph from it (after downloading FlameGraph: https://github.com/brendangregg/FlameGraph):

sudo perf script | ./stackcollapse-perf.pl > out.perf-folded
cat out.perf-folded |  ./flamegraph.pl > all.svg
grep -v cpu_idle out.perf-folded | ./flamegraph.pl > nonidle.svg
etc. etc.

Perf is the official performance tooling of Linux and is part of the kernel. This would work for finding system-level issues, e.g. if 50% of the time is spent writing files to NFS, or if npm isn't cached and most of the time is spent waiting for npmjs.com to respond, etc., but when it comes to what goes on inside PHP, everything would show up as [unknown], which is not useful. For that we can do the second thing:

  • Install Excimer. It's similar to what we do in production. We just need to add the php-excimer extension to the Dockerfile and then add a snippet to integration/config's LocalSettings.php to enable Excimer and collect data. It could then send the data to the beta cluster's arclamp Redis (in a dedicated channel, obviously) or just log it to a file (or I can set up a Redis instance somewhere super quickly; it's not rocket science). Then I'd go around collecting the data and building a flamegraph to see exactly which parts of the tests are slowest. This has been instrumental in finding performance issues in production (I can name at least ten major bugs found this way). I think we can easily utilize this to get more data.
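
For illustration, a minimal LocalSettings.php sketch using the documented ExcimerProfiler API (the sampling period and output path are placeholder choices, not taken from any actual patch):

// Sketch only: sample every PHP process in CI and append folded stacks
// in the same format that flamegraph.pl consumes.
if ( extension_loaded( 'excimer' ) ) {
    $ciProfiler = new ExcimerProfiler();
    $ciProfiler->setEventType( EXCIMER_REAL ); // wall-clock time; EXCIMER_CPU is the alternative
    $ciProfiler->setPeriod( 0.01 );            // sample every 10ms (placeholder value)
    $ciProfiler->start();
    register_shutdown_function( static function () use ( $ciProfiler ) {
        $ciProfiler->stop();
        // formatCollapsed() emits "func;func;func count" lines, ready for flamegraph.pl.
        file_put_contents(
            '/tmp/excimer.folded', // placeholder path; could instead be pushed to Redis
            $ciProfiler->getLog()->formatCollapsed(),
            FILE_APPEND
        );
    } );
}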

Can anyone please help me get this off the ground? Especially with access to integration.

Change 748312 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[integration/config@master] dockerfiles: Add php-excimer to quibble

https://gerrit.wikimedia.org/r/748312

Change 748314 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[integration/quibble@master] [DNM] Add excimer config

https://gerrit.wikimedia.org/r/748314

This would be needed if we want to run this on everything; for now, we can just make a patch to core and run recheck a couple of times.

Mentioned in SAL (#wikimedia-operations) [2021-12-21T15:36:13Z] <Amir1> running sudo perf record -ag -F 99 -- sleep 3600 on integration-agent-docker-1008 and 1009 (T225730)

Change 749266 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/extensions/AbuseFilter@master] build: Update eslint-config-wikimedia to 0.21.0

https://gerrit.wikimedia.org/r/749266

Change 749267 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/extensions/Echo@master] build: Update eslint-config-wikimedia to 0.21.0

https://gerrit.wikimedia.org/r/749267

Change 749268 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/extensions/FileImporter@master] build: Update eslint-config-wikimedia to 0.21.0

https://gerrit.wikimedia.org/r/749268

Change 749271 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/extensions/GrowthExperiments@master] build: Update eslint-config-wikimedia to 0.21.0

https://gerrit.wikimedia.org/r/749271

Change 749269 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/extensions/ProofreadPage@master] build: Update eslint-config-wikimedia to 0.21.0

https://gerrit.wikimedia.org/r/749269

I ran this for an hour. The scary part is that it seems 85% of the time is being spent on swapping:
https://people.wikimedia.org/~ladsgroup/ci_flamegraphs/all.svg
(It's interactive, click on things, search, etc.)

I think this is basically T281122: Wikibase selenium tests timeout, seemingly due to "memory compaction" events on CI VMs. 85% of the time is too much :( Maybe in the meantime let's reduce the runners per host to 3 and see how it goes?

While htop didn't show memory being full during the times I checked (which were quiet), given that the uptime for that VM is 294 days, I highly recommend at least rebooting these poor VMs (and possibly getting them on a more modern OS).

Anyway, with the swapping removed, here is the result:
https://people.wikimedia.org/~ladsgroup/ci_flamegraphs/nonswap.svg
TLDR: 35% is php, 5% is npm ci, 2% npm (install?), 11% node, 4.88% java, 4% mysql, 7% git, ffmpeg 3.65%, chromium 3.71%
I think running this for an hour back to back can cause perf overhead; let me run it like a snapshot over a day.

The kallsyms syscalls in the swapper are fishy there. There is a small chance that the perf tool just overwhelmed the system by being run for an hour (but the data is not much, 300MB); let me try again with "snapshot mode".

Change 749266 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@master] build: Update eslint-config-wikimedia to 0.21.0

https://gerrit.wikimedia.org/r/749266

Change 749268 merged by jenkins-bot:

[mediawiki/extensions/FileImporter@master] build: Update eslint-config-wikimedia to 0.21.0

https://gerrit.wikimedia.org/r/749268

Change 749267 merged by jenkins-bot:

[mediawiki/extensions/Echo@master] build: Update eslint-config-wikimedia to 0.21.0

https://gerrit.wikimedia.org/r/749267

Change 749271 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] build: Update eslint-config-wikimedia to 0.21.0

https://gerrit.wikimedia.org/r/749271

Change 749269 abandoned by Umherirrender:

[mediawiki/extensions/ProofreadPage@master] build: Update eslint-config-wikimedia to 0.21.0

Reason:

Update of LockFile version does not work on my setting for this repo due to use of explicit git commit

https://gerrit.wikimedia.org/r/749269

Change 757411 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/quibble@master] Run post-dependency install, pre-test steps in parallel

https://gerrit.wikimedia.org/r/757411

Change 757411 merged by jenkins-bot:

[integration/quibble@master] Run post-dependency install, pre-test steps in parallel

https://gerrit.wikimedia.org/r/757411

Change 758952 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/quibble@master] release: Quibble 1.4.0

https://gerrit.wikimedia.org/r/758952

Change 758952 merged by jenkins-bot:

[integration/quibble@master] release: Quibble 1.4.0

https://gerrit.wikimedia.org/r/758952

Change 763559 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] jjb: update Quibble jobs from 1.3.0 to 1.4.0

https://gerrit.wikimedia.org/r/763559

Mentioned in SAL (#wikimedia-releng) [2022-02-21T07:31:03Z] <hashar> Switching Quibble jobs from Quibble 1.3.0 to 1.4.0 # T300340 T291549 T225730

Change 763559 merged by jenkins-bot:

[integration/config@master] jjb: update Quibble jobs from 1.3.0 to 1.4.0

https://gerrit.wikimedia.org/r/763559

Change 767749 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] jjb: update Quibble jobs from 1.3.0 to 1.4.3

https://gerrit.wikimedia.org/r/767749

Change 767749 merged by jenkins-bot:

[integration/config@master] jjb: update Quibble jobs from 1.3.0 to 1.4.3

https://gerrit.wikimedia.org/r/767749

Change 768068 had a related patch set uploaded (by Krinkle; author: Krinkle):

[integration/config@master] Revert "zuul: Install MobileFrontend when testing Echo"

https://gerrit.wikimedia.org/r/768068

Change 768068 merged by jenkins-bot:

[integration/config@master] Revert "zuul: Install MobileFrontend when testing Echo"

https://gerrit.wikimedia.org/r/768068

Change 771429 had a related patch set uploaded (by Krinkle; author: Aaron Schulz):

[mediawiki/core@master] phpunit: Set $wgSQLMode from DevelopmentSettings instead of MediaWikiIntegrationTestCase

https://gerrit.wikimedia.org/r/771429

Change 771429 merged by jenkins-bot:

[mediawiki/core@master] phpunit: Set $wgSQLMode from DevelopmentSettings instead of MediaWikiIntegrationTestCase

https://gerrit.wikimedia.org/r/771429

FTR, this is currently being reverted as the probable cause of T304625: CI failing with IndexPager::buildQueryInfo error: 'wikidb.unittest_globaluser.gu_id' isn't in GROUP BY. (As I understand it, the intention was to enable the SQL mode for integration tests in a better way, but it was also enabled for browser tests because they also use DevelopmentSettings. And I guess there’s some code that breaks under strict SQL mode, and which is reached by the browser tests but not by the PHPUnit integration tests.)

Change 763333 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/quibble@master] Logging: Use MWLoggerDefaultSpi

https://gerrit.wikimedia.org/r/763333

Change 763333 abandoned by Kosta Harlan:

[integration/quibble@master] Logging: Use MWLoggerDefaultSpi

Reason:

moving to core

https://gerrit.wikimedia.org/r/763333

Change 774409 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/core@master] DevelopmentSettings: Use MWLoggerDefaultSpi for debug logging

https://gerrit.wikimedia.org/r/774409

Change 771678 had a related patch set uploaded (by Krinkle; author: Aaron Schulz):

[mediawiki/core@master] phpunit: Fix slow testBotPasswordThrottled by lowering limits

https://gerrit.wikimedia.org/r/771678

Change 771678 merged by jenkins-bot:

[mediawiki/core@master] phpunit: Fix slow testBotPasswordThrottled by lowering limits

https://gerrit.wikimedia.org/r/771678

Change 777006 had a related patch set uploaded (by Krinkle; author: Aaron Schulz):

[mediawiki/core@master] phpunit: Set $wgSQLMode from DevelopmentSettings instead of MediaWikiIntegrationTestCase (ii)

https://gerrit.wikimedia.org/r/777006

Change 777006 merged by jenkins-bot:

[mediawiki/core@master] phpunit: Set $wgSQLMode from DevelopmentSettings instead of MediaWikiIntegrationTestCase (ii)

https://gerrit.wikimedia.org/r/777006

Change 777482 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] debug: Fix $wgDebugRawPage to work with PSR-3 debug logging

https://gerrit.wikimedia.org/r/777482

Change 777482 merged by jenkins-bot:

[mediawiki/core@master] debug: Fix $wgDebugRawPage to work with PSR-3 debug logging

https://gerrit.wikimedia.org/r/777482

This is really getting frustrating for the wmf branches. E.g. gerrit 785944 spent an hour in CI, then errored out with Build timed out (after 60 minutes). Marking the build as failed.. gerrit 785941 took 92 minutes to merge. It's basically getting impossible to do an extension backport within the one-hour deploy window; not to mention multiple backports.

Can we just drop Selenium tests from gate-and-submit-wmf? I don't think they add any real value; by the time a patch gets there, it has typically passed Selenium tests in test, gate-and-submit and test-wmf.

I don't think we should run Selenium on anything other than the master branch tbh.

This is really getting frustrating for the wmf branches. E.g. gerrit 785944 spent an hour in CI, then errored out with Build timed out (after 60 minutes). Marking the build as failed.. gerrit 785941 took 92 minutes to merge. It's basically getting impossible to do an extension backport within the one-hour deploy window; not to mention multiple backports.

Can we just drop Selenium tests from gate-and-submit-wmf? I don't think they add any real value; by the time a patch gets there, it has typically passed Selenium tests in test, gate-and-submit and test-wmf.

Filed as T307180: Drop Selenium tests from gate-and-submit-wmf, +1 from me.

This is really getting frustrating for the wmf branches. E.g. gerrit 785944 spent an hour in CI, then errored out with Build timed out (after 60 minutes). Marking the build as failed.. gerrit 785941 took 92 minutes to merge. It's basically getting impossible to do an extension backport within the one-hour deploy window; not to mention multiple backports.

If I remember properly, those builds froze at the npm ci step until Jenkins timed out the build and killed it. We can probably lower the build timeout in Jenkins and find a way for Quibble to enforce a timeout on the npm ci operation. That would at least make them fail faster.

As for the issue, that seems to match a npmjs issue https://status.npmjs.org/incidents/ljzb0hdg8zr3 :

There was an intermittent issue with package install since 23rd April which is now resolved.
Posted Apr 26, 2022 - 14:30 UTC

In T304114 I was made aware that we're running Minerva Selenium tests on Extension:StopForumSpam which is not even run in production (Beta cluster only as far as I can tell) and never interacts with Minerva.

It also seems we run Minerva browser tests in Vector despite the fact it's impossible for a Vector patch to break Minerva as these are mutually exclusive experiences.

It seems we could save a chunk of time by restricting the extensions where we run our browser tests. The overlap between skins and extensions from the perspective of browser tests is very small.
Is there a way to be more deliberate about where we run which Selenium tests? As far as I can see, Minerva browser tests only need to be run with core, MobileFrontend, VisualEditor and Echo changes.

Change 745991 abandoned by Ladsgroup:

[mediawiki/core@master] [DNM] Test if reduction in logging would reduce the tests run time

Reason:

test done

https://gerrit.wikimedia.org/r/745991

Change 822663 had a related patch set uploaded (by Krinkle; author: Krinkle):

[integration/config@master] jjb: Remove 'compress junit' postbuildscript step

https://gerrit.wikimedia.org/r/822663

Change 822663 merged by jenkins-bot:

[integration/config@master] jjb: [quibble] Remove 'compress junit' postbuildscript step

https://gerrit.wikimedia.org/r/822663

Change 748312 merged by jenkins-bot:

[integration/config@master] dockerfiles: Add php-excimer to quibble

https://gerrit.wikimedia.org/r/748312

Change 853292 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/MobileFrontend@master] Remove selenium entries from package.json

https://gerrit.wikimedia.org/r/853292

Three years later: maybe we should target a more realistic number, like 15 minutes? If we can get to 15 minutes, we could then move on to 10 minutes, and then maybe end up back at this task's current goal of 5 minutes.

T287582: Move some Wikibase selenium tests to a standalone job is a potential low-hanging fruit. The Wikibase Selenium tests take roughly 5 minutes 30 seconds and only depend on MinervaNeue, MobileFrontend and UniversalLanguageSelector (and MediaWiki core, obviously). If we moved those to a standalone job triggered only for those few repositories, that would save 5 minutes 30 seconds for all other repositories. We would have to drop the selenium-test npm entry point from Wikibase to prevent it from being discovered when Wikibase is a dependency.

Another potentially large saving would be to mark slow tests with @group Standalone, which would only trigger when a patchset targets that repo, and thus prevent them from running when the patchset is for another repository.

@hashar Afaik for PHPUnit we only install (and thus test) dependency extensions, based on the dependency map in CI config. Is this not the case for the Selenium job? If there's a default list applied also, perhaps we can opt-out from that for the Selenium job.

Change 853292 merged by jenkins-bot:

[mediawiki/extensions/MobileFrontend@master] Remove selenium entries from package.json

https://gerrit.wikimedia.org/r/853292

Change 859146 had a related patch set uploaded (by Krinkle; author: Tim Starling):

[mediawiki/core@master] Reduce time cost of password tests

https://gerrit.wikimedia.org/r/859146

Change 859146 merged by jenkins-bot:

[mediawiki/core@master] password: Reduce time cost of password unit tests

https://gerrit.wikimedia.org/r/859146

@hashar Afaik for PHPUnit we only install (and thus test) dependency extensions, based on the dependency map in CI config. Is this not the case for the Selenium job? If there's a default list applied also, perhaps we can opt-out from that for the Selenium job.

The PHPUnit and Selenium kinds of jobs are alike: they both use Quibble as the test runner and both have extensions/skins dependencies injected from CI. The only difference is the kind of tests being run, which is filtered via Quibble parameters: --skip=selenium and --run=selenium respectively.

The registration or discovery of tests varies, though. A detailed description follows (you probably know most of this already, but it can be helpful for other readers):

PHPUnit

We run the MediaWiki core extensions suite (defined at tests/phpunit/suites/ExtensionsTestSuite.php) which:

  • discovers tests if they are put in a /tests/phpunit directory relative to each extension/skin;
  • adds tests registered via the UnitTestsList hook (see the sketch below).

Those tests depend on the centrally managed configuration from mediawiki/core, e.g. the PHPUnit version.
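
For illustration, the hook registration mentioned above looks roughly like this in an extension's setup code (legacy global-hook style shown for brevity; the directory is just the conventional location, not taken from a specific repo):

// Add this extension's PHPUnit tests to the suite assembled by
// ExtensionsTestSuite in MediaWiki core.
$wgHooks['UnitTestsList'][] = static function ( array &$paths ) {
    $paths[] = __DIR__ . '/tests/phpunit/';
};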

Selenium

There is no registration mechanism, only a discovery phase. Each extension/skin test suite is standalone and can use a different version of Webdriver.io. The convention is that developers define how to run the browser tests by adding a selenium-test npm script. In those jobs, quibble --run=selenium has all extensions/skins cloned and checked out; Quibble then crawls through each repo looking for a package.json, and if it has a selenium-test script, Quibble runs those tests by invoking npm run selenium-test. They are run serially.


The alternative would be to only run the Selenium tests for the triggering repository, with something such as: quibble --command 'cd $THING_NAME && npm run-script selenium-test'. But we would lose the integration testing between repositories :-(

For PHPUnit tests we have moved some tests to @group Standalone. The use case was to avoid running the Scribunto integration tests for every extension depending on it: given those tests are only going to be affected by a change to Scribunto, there was no point in running them when a patch triggers Wikibase, for example. It is another build stage defined in Quibble: quibble --run phpunit-standalone.
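
For reference, opting a test into that stage is just a matter of the group annotation, roughly like this (the class and method names are hypothetical):

/**
 * These tests only run in the separate phpunit-standalone stage
 * (quibble --run phpunit-standalone), i.e. when a patch targets this
 * repository itself, not when the repository is merely a dependency.
 *
 * @group Standalone
 */
class ExampleStandaloneIntegrationTest extends MediaWikiIntegrationTestCase {
    public function testExpensiveCrossExtensionScenario() {
        $this->assertTrue( true ); // placeholder
    }
}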

I guess we could do something similar for Selenium and split tests between:

  • integration testing (to be run by any repo)
  • standalone tests (to be run solely when a patch triggers for this repo)
  • add a selenium-standalone npm script convention which in the repo would invoke something such as wdio --spec tests/selenium/specs/standalone/**/*.js

The Wikibase Selenium tests […] only depend on MinervaNeue, MobileFrontend and UniversalLanguageSelector […]
In T225730#8370890, @Krinkle wrote:

@hashar Afaik for PHPUnit we only install (and thus test) dependency extensions, based on the dependency map in CI config. Is this not the case for the Selenium job? […]

The PHPUnit and Selenium kinds of jobs are alike: they both use Quibble as the test runner and both have extensions/skins dependencies injected from CI. […] The registration or discovery of tests varies, though. […] Selenium: There is no registration mechanism, only a discovery phase. […]

My reason for asking is that I thought you meant that Wikibase CI is slow because it is running selenium tests for extensions that it does not depend on. The lack of a test registration system for Selenium should not be an issue since we can only discover what we install in CI, and CI only installs what Wikibase depends on according to the repo dependency map.

For PHPUnit tests we have moved some tests to @group Standalone. […] I guess we could do something similar for Selenium and split tests, [and] add a selenium-standalone npm script convention.

Another option might be to filter out @standalone tests via wdio --mochaOpts.grep --invert. This is even more similar to PHPUnit and is what we already use for running the @daily selenium tests. This has the benefit of keeping a simple and consistent way to run selenium tests on any given repo as a contributor, without needing to know about this detail. It is then up to Quibble to invoke a variant npm run selenium-nostandalone on non-current repos to skip those, instead of e.g. having the main selenium entry point no longer run all the tests, which might lead to confusion.