
Reduce runtime of MW shared gate Jenkins jobs to 5 min
Open, Medium, Public

Description

Objective

For the "typical" time it takes for a commit to be approved and landed in master to be 5 minutes or less.

Status quo

As of 11 June 2019, the gate usually takes around 20 minutes.

The two slowest jobs typically take 13-17 minutes each. The gate overall is rarely under 15 minutes, because we run several of these jobs (which increases the chance of one being randomly slow), and while they can run in parallel, they don't always start immediately, given limited CI execution slots.

Below is a sample from a MediaWiki commit (master branch):

Gate pipeline build succeeded.
  • wmf-quibble-core-vendor-mysql-php72-docker SUCCESS in 12m 03s
  • wmf-quibble-core-vendor-mysql-hhvm-docker SUCCESS in 14m 12s
  • mediawiki-quibble-vendor-mysql-php72-docker SUCCESS in 7m 34s
  • mediawiki-quibble-vendor-mysql-php71-docker SUCCESS in 7m 12s
  • mediawiki-quibble-vendor-mysql-php70-docker SUCCESS in 6m 48s
  • mediawiki-quibble-vendor-mysql-hhvm-docker SUCCESS in 8m 32s
  • mediawiki-quibble-vendor-postgres-php72-docker SUCCESS in 10m 05s
  • mediawiki-quibble-vendor-sqlite-php72-docker SUCCESS in 7m 04s
  • mediawiki-quibble-composer-mysql-php70-docker SUCCESS in 8m 14s

(+ jobs that take less than 3 minutes: composer-test, npm-test, and phan.)

These can be grouped into two kinds of jobs:

  • wmf-quibble: These install MW with the gated extensions, and then run all PHPUnit, Selenium and QUnit tests.
  • mediawiki-quibble: These install only the MW-bundled extensions, and then run PHPUnit, Selenium and QUnit tests.

Stats from wmf-quibble-core-vendor-mysql-php72-docker:

  • 9-15 minutes (wmf-gated, extensions-only)
  • Sample:
    • PHPUnit (dbless): 1.91 minutes / 15,782 tests.
    • QUnit: 29 seconds / 1286 tests.
    • Selenium: 143 seconds / 43 tests.
    • PHPUnit (db): 3.85 minutes / 4377 tests.

Stats from mediawiki-quibble-vendor-mysql-php72-docker:

  • 7-10 minutes (plain mediawiki-core)
  • Sample:
    • PHPUnit (unit+dbless): 1.5 minutes / 23,050 tests.
    • QUnit: 4 seconds / 437 tests.
    • PHPUnit (db): 4 minutes / 7604 tests.

Updated status quo

As of 11 May 2021, the gate usually takes around 25 minutes.

The slowest job typically takes 20-25 minutes per run. The gate overall can never be faster than the slowest job, and can be worse: although we run the other jobs in parallel, they don't always start immediately, given limited CI execution slots.

Below are the timing results from a sample MediaWiki commit (master branch):

[Snipped: Jobs faster than 5 minutes]

  • 9m 43s: mediawiki-quibble-vendor-mysql-php74-docker/5873/console
  • 9m 47s: mediawiki-quibble-vendor-mysql-php73-docker/8799/console
  • 10m 03s: mediawiki-quibble-vendor-sqlite-php72-docker/10345/console
  • 10m 13s: mediawiki-quibble-composer-mysql-php72-docker/19129/console
  • 10m 28s: mediawiki-quibble-vendor-mysql-php72-docker/46482/console
  • 13m 11s: mediawiki-quibble-vendor-postgres-php72-docker/10259/console
  • 16m 44s: wmf-quibble-core-vendor-mysql-php72-docker/53990/console
  • 22m 26s: wmf-quibble-selenium-php72-docker/94038/console

Clearly the last two jobs are dominant in the timing:

  • wmf-quibble: This job installs MW with the gated extensions, and then runs all PHPUnit and QUnit tests.
  • wmf-quibble-selenium: This job installs MW with the gated extensions, and then runs all the Selenium tests.

Note that the mediawiki-quibble jobs each install just the MW bundled extensions, and then run PHPUnit, Selenium and QUnit tests.

Stats from wmf-quibble-core-vendor-mysql-php72-docker:

  • 13-18 minutes (wmf-gated, extensions-only)
  • Select times:
    • PHPUnit (unit tests): 9 seconds / 13,170 tests.
    • PHPUnit (DB-less integration tests): 3.31 minutes / 21,067 tests.
    • PHPUnit (DB-heavy): 7.91 minutes / 4,257 tests.
    • QUnit: 31 seconds / 1421 tests.

Stats from wmf-quibble-selenium-php72-docker:

  • 20-25 minutes

Scope of task

This task represents the goal of reaching 5 minutes or less. The work tracked here includes researching ways to get there, trying them out, and putting one or more ideas into practice. The task can be closed once we have reached the goal or if we have concluded it isn't feasible or useful.

Feel free to add/remove subtasks as we go along and consider different things.

Stuff done

Ideas to explore and related work

  • Look at the PHPUnit "Test Report" for a commit and sort the root-level suites by duration. Find the slowest ones and look at their test suites for ways to improve them. Are they repeating expensive setups? Perhaps that can be skipped or re-used. Are they running hundreds of variations of the same integration test? Perhaps reduce it to just one case for that story, and apply the remaining cases to a lighter unit test instead.
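As a hedged sketch of that last idea (the class and method names below are hypothetical, not taken from MediaWiki core), variations that currently run through the full integration stack can often be expressed as a data provider on a pure unit test, keeping a single representative end-to-end case in the integration suite:

<?php
// Hypothetical example: TitleSlugger / TitleSlugTest do not exist in core.
// A pure unit test: no database and no service container, so running many
// input variations stays cheap.
class TitleSlugTest extends MediaWikiUnitTestCase {

	public static function provideSlugs() {
		yield 'spaces become underscores' => [ 'Main Page', 'Main_Page' ];
		yield 'leading whitespace is trimmed' => [ ' Foo', 'Foo' ];
		yield 'unicode is preserved' => [ 'Café', 'Café' ];
	}

	/**
	 * @dataProvider provideSlugs
	 */
	public function testSlug( string $input, string $expected ) {
		$slugger = new TitleSlugger(); // hypothetical plain value class under test
		$this->assertSame( $expected, $slugger->slug( $input ) );
	}
}

One full-stack case for the same story can stay in a MediaWikiIntegrationTestCase subclass, so coverage is kept while the bulk of the variations move to the cheap suite.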

Details

Project | Branch | Lines +/-
mediawiki/core | master | +30 -4
integration/quibble | master | +40 -15
mediawiki/core | master | +0 -5
integration/quibble | master | +109 -0
mediawiki/core | master | +56 -16
mediawiki/core | master | +2 -12
mediawiki/core | master | +8 -3
integration/quibble | master | +34 -0
mediawiki/core | master | +2 -12
integration/config | master | +1 -1
integration/config | master | +2 -0
integration/config | master | +32 -32
integration/config | master | +28 -28
integration/quibble | master | +27 -3
mediawiki/extensions/ProofreadPage | master | +11 K -621
mediawiki/extensions/GrowthExperiments | master | +13 K -549
mediawiki/extensions/Echo | master | +11 K -216
mediawiki/extensions/AbuseFilter | master | +12 K -469
mediawiki/extensions/FileImporter | master | +11 K -530
integration/quibble | master | +24 -2
mediawiki/extensions/Echo | master | +16 -33
mediawiki/extensions/Echo | master | +6 -18
mediawiki/core | master | +22 -140
mediawiki/core | master | +1 K -679
mediawiki/core | master | +21 -14
mediawiki/core | master | +6 -4
mediawiki/core | master | +1 -1
integration/config | master | +0 -19
integration/config | master | +12 -5
mediawiki/core | master | +20 -19
mediawiki/core | master | +17 -49
mediawiki/core | master | +1 -3
mediawiki/core | master | +13 -1
integration/config | master | +22 -22
integration/config | master | +54 -0
integration/quibble | master | +4 -0
mediawiki/core | master | +3 -10
mediawiki/core | master | +1 -1
mediawiki/extensions/Wikibase | master | +70 -21
mediawiki/core | master | +27 -37
mediawiki/core | master | +29 -5
mediawiki/core | master | +37 -1
mediawiki/extensions/Babel | master | +47 -52
mediawiki/core | master | +16 -1

Related Objects

Status | Assigned
Open | None
Resolved | Ladsgroup
Resolved | aaron
Resolved | hashar
Resolved | aborrero
Resolved | hashar
Resolved | hashar
Resolved | hashar
Resolved | Mholloway
Declined | Reedy
Open | None
Resolved | Krinkle
Resolved | Jdforrester-WMF
Resolved | Jdforrester-WMF
Declined | Jdforrester-WMF
Resolved | Krinkle
Resolved | hashar
Declined | None
Resolved | Jdforrester-WMF
Open | None
Open | None
Resolved | None
Resolved | Daimona
Open | Legoktm
Open | None
Open | None
Resolved | None
Duplicate | None
Open | None
Open | None
Resolved | None
Resolved | None
Resolved | awight
Resolved | kostajh
Open | None
Resolved | cscott
Resolved | kostajh
Open | None
Resolved | kostajh
Resolved | hashar
Open | None
Open | None
Resolved | Lucas_Werkmeister_WMDE
Open | None
Open | None
Open | None
Resolved | kostajh
Resolved | Krinkle
Open | None
Resolved | kostajh
Resolved | kostajh
Open | None
Open | None
Open | None
Open | None
Open | None

Event Timeline

There are a very large number of changes, so older changes are hidden.

@kostajh Is the time spent mainly in API login or UI login, or are they both equally bad? Is there a way to speed that up, perhaps? We toyed around last year with the idea that it might be due to overly strong password hashing, which could be slow in VMs. Afaik we ruled that out, but maybe it's still an issue?

I haven't tried to measure that specifically; maybe the login flow is a red herring, and it's more specifically to do with many things being slower as a logged-in user due to additional code paths that are executed serially with PHP's built-in server.

Maybe there are deferred updates that require more processing time for authenticated users, and with PHP's built-in server these aren't really deferred (since we have a single thread to process requests) and are instead handled sequentially. That's something that would go away once we make some progress on T276428: Introduce non-voting jobs with quibble+apache
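To make that reasoning concrete, here is a rough, illustrative sketch of timing a single deferred update by hand (this is not necessarily what the patch below does, and doExpensiveNotificationWork() is a hypothetical placeholder):

<?php
// Runs inside MediaWiki, e.g. from an extension hook; illustrative only.
DeferredUpdates::addCallableUpdate( static function () {
	$start = microtime( true );
	doExpensiveNotificationWork(); // hypothetical stand-in for the real work
	wfDebugLog( 'DeferredUpdates',
		'update took ' . round( microtime( true ) - $start, 3 ) . 's' );
} );
// With PHP's built-in server there is a single worker, so the next request
// from the Selenium run cannot be served until this closure has finished,
// even though the update is nominally "deferred" to post-send.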

Change 698947 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/core@master] DeferredUpdates: Log execution time for updates

https://gerrit.wikimedia.org/r/698947

@kostajh Is the time spent mainly in API login or UI login, or are they both equally bad? Is there a way to speed that up, perhaps? We toyed around last year with the idea that it might be due to overly strong password hashing, which could be slow in VMs. Afaik we ruled that out, but maybe it's still an issue?

I haven't tried to measure that specifically; maybe the login flow is a red herring, and it's more specifically to do with many things being slower as a logged-in user due to additional code paths that are executed serially with PHP's built-in server.

Maybe there are deferred updates that require more processing time for authenticated users, and with PHP's built-in server these aren't really deferred (since we have a single thread to process requests) and are instead handled sequentially. That's something that would go away once we make some progress on T276428: Introduce non-voting jobs with quibble+apache

Hmm, no, it doesn't seem like deferred updates are causing any substantial delays in the single-threaded execution context. https://gerrit.wikimedia.org/r/698947 added some logging for execution time for the deferred updates, and for e.g. the Selenium job (https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php72-docker/98521/consoleFull; logs: https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php72-docker/98521/artifact/log/mw-debug-www.log.gz), it appears that the processing time across the entire log file is 10.44 seconds.

Looking at https://integration.wikimedia.org/ci/job/wmf-quibble-apache-selenium-php72-docker/1990/console, one issue is that it's fully 8 minutes before the Selenium tests even get started, mostly due to lengthy npm install times.

Change 702909 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/quibble@master] npm: Use cache flag for project directory and prefer offline

https://gerrit.wikimedia.org/r/702909

Change 736465 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/skins/Vector@master] Remove selenium-test from package.json

https://gerrit.wikimedia.org/r/736465

If we wanted to reduce the time of this job, personally I'd suggest revisiting how we run Selenium.

We currently run Selenium everywhere the tests exist. For example, the Minerva skin runs Wikibase integration tests. This has seldom caught important regressions. We have daily jobs that we could lean on for this purpose instead, at much less cost. We could even switch the daily jobs to twice daily if we needed to.

If we wanted to reduce the time of this job, personally I'd suggest revisiting how we run Selenium.

We currently run Selenium everywhere the tests exist. For example, the Minerva skin runs Wikibase integration tests. This has seldom caught important regressions. We have daily jobs that we could lean on for this purpose instead, at much less cost. We could even switch the daily jobs to twice daily if we needed to.

I'm not opposed to that, but I'm wondering who gets informed when these fail. Does anyone look at them regularly? I see TwoColConflict and WikibaseLexeme both have failures: https://integration.wikimedia.org/ci/view/selenium-daily/ Relying on the daily jobs seems conceptually similar to T225248: Consider moving browser based tests (Selenium and QUnit) to a non-voting pipeline, which was declined. I would prefer T225248: Consider moving browser based tests (Selenium and QUnit) to a non-voting pipeline to relying only on the daily jobs, as you'd still get feedback on a per-patch basis; you just wouldn't have to wait around for it.

If we move Selenium to daily jobs, I doubt anyone would care about them anymore. Even if they did, it would take a while. Here is an example: T277862 (to be fair, this is breaking in beta and the problem is on beta's side).

If we move Selenium to daily jobs, I doubt anyone would care about them anymore. Even if they did, it would take a while. Here is an example: T277862 (to be fair, this is breaking in beta and the problem is on beta's side).

Different teams can respond to Selenium test failures in different ways, and I think that's OK. If the Selenium tests are moved to a separate, non-voting pipeline that comments on the task with success/failure when it's done (allowing all the rest of the tests to report back much earlier), then different teams could choose how they want to triage those errors. And we could have scripts to help, for example by filing an "Unbreak now" task for the relevant project, which would give the team a heads-up that this is something they need to address.

Perhaps, in addition to the non-voting pipeline, the Selenium tests could be run (in a voting way) alongside other tests when a branch cut happens. That might result in some delays to the train if a team has been ignoring the non-voting Selenium job failures, but that would be on the relevant team to improve in their practice.

That sounds good to me; as long as we have policies to make sure Selenium tests stay relevant, I don't mind.

I understand the general request of making CI faster, but I think the trade-off being asked for is out of balance. I believe there's a general consensus that our test coverage as a whole (unit tests, integration tests, browser tests) is insufficient, and so there's a general push to add more of them. I would much rather have more test coverage with slower CI than faster CI with less test coverage. Anecdotally, I've seen the Selenium tests catch multiple (more than 3, fewer than 10) potential regressions during the code review process, which to me seems valuable to keep. Really, anything that reduces the number of regressions we have on a weekly basis seems valuable.

My main gripe with Selenium tests is how often they seem flaky, but that's a separate issue...

Change 745991 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] [DNM] Test if reduction in logging would reduce the tests run time

https://gerrit.wikimedia.org/r/745991

So I removed the debug logging (which I agree is important) to see if logging takes a long time, and it seems it does:
Between https://gerrit.wikimedia.org/r/c/mediawiki/core/+/745991 and https://gerrit.wikimedia.org/r/c/mediawiki/core/+/493162/

  • wmf-quibble-core-vendor-mysql-php72-docker went from 17m 22s to 14m 32s, a 16% reduction
  • wmf-quibble-selenium-php72-docker went from 21m 00s to 15m 20s, a 27% reduction, i.e. one quarter of the total run time of Selenium is just debug logging by MediaWiki

Maybe we can have some sort of buffering and flushing for logs? Maybe something is logging a lot of DEBUGs when it's not needed?

Besides that, I generally think we need some sort of observability into what is happening there, like Arc Lamp or perf monitoring. Something like that would help us see whether we messed something up (npm cache, git cache, git shallow, etc.) or whether we simply need to parallelize.
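On the buffering idea: MediaWiki's CI debug log is written through Monolog, so one possible shape (a sketch only; the handler choice, log path and wiring here are my assumptions, not the current integration/config setup) is to wrap the file handler in Monolog's BufferHandler so records are flushed in one batch at request shutdown instead of one write per line:

<?php
// Sketch, not actual CI configuration: batch debug records in memory and
// write them out once per request.
use Monolog\Handler\BufferHandler;
use Monolog\Handler\StreamHandler;
use Monolog\Logger;

$fileHandler = new StreamHandler( '/workspace/log/mw-debug-www.log', Logger::DEBUG );
$bufferedHandler = new BufferHandler(
	$fileHandler,
	0,             // no buffer size limit: keep everything until flush
	Logger::DEBUG, // still capture debug-level records
	true,          // bubble
	false          // do not flush early on overflow
);
$logger = new Logger( 'mediawiki', [ $bufferedHandler ] );
// BufferHandler flushes in close()/__destruct(), i.e. at request shutdown,
// so the log content is unchanged but the I/O is batched.

This would keep the debug detail while reducing per-request write calls; whether it actually moves the needle would still need measuring on the CI hosts.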

So I removed the debug logging (which I agree is important) to see if logging takes a long time, and it seems it does:
Between https://gerrit.wikimedia.org/r/c/mediawiki/core/+/745991 and https://gerrit.wikimedia.org/r/c/mediawiki/core/+/493162/

  • wmf-quibble-core-vendor-mysql-php72-docker went from 17m 22s to 14m 32s, a 16% reduction
  • wmf-quibble-selenium-php72-docker went from 21m 00s to 15m 20s, a 27% reduction, i.e. one quarter of the total run time of Selenium is just debug logging by MediaWiki

Maybe we can have some sort of buffering and flushing for logs? Maybe something is logging a lot of DEBUGs when it's not needed?

It is extremely rare that I need to look at CI debug logs to see if something is wrong (for most failures, the stack trace is enough 99.9% of the time), and they are even more useless when tests pass. Suggestion: set it to non-debug by default, but add an option similar to "check experimental" (like "check with debug") for when it's needed.

@Ladsgroup do benchmark on your local machine. The CI stack is unreliable for comparing performance between builds / settings, for a few reasons:

  • there can be 1 to 4 builds running concurrently
  • disk IO is capped on WMCS instances
  • the underlying hardware can be quite busy due to other instances

Last time I checked on my local machine, it did not make any difference.

My general comment about this task is that all the fine-tuning is great, but a real speed-up will only be achieved by overhauling the current workflows. There are a few big changes we could make to speed it up:

  • have a pre-flight job that runs the linters for the repository that triggers the patch and hold any other jobs until that one has completed
  • add a build step that would clone the repositories, apply patches and install dependencies. That is currently done by each of the Quibble jobs
  • change a lot of MediaWiki tests to be true unit tests
  • first run tests for the affected code path (aka a patch to include/api/* would run tests/phpunit/include/api/* tests first) to speed up the feedback in case of failure
  • stop running every single test when extensions depend on each other (we now have @group Standalone, which got introduced to prevent running Scribunto tests from any other repository)

To put it another way, we need to revisit how we do integration testing between extensions. Blindly running everything is the easy path, but it has the drawback of being rather slow.
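As a small illustration of the @group Standalone mechanism mentioned in the list above (the test class below is hypothetical; only the annotation is the point), a suite tagged this way is kept out of other repositories' shared gate runs and only executes in its own repository's jobs:

<?php
/**
 * Hypothetical heavy end-to-end suite.
 *
 * @group Standalone
 * @group Database
 */
class ScribuntoHeavySandboxTest extends MediaWikiIntegrationTestCase {
	public function testLongRunningLuaScenario() {
		// Expensive coverage that, per the comment above, other repositories'
		// gate jobs should not have to re-run on every patch.
		$this->assertTrue( true ); // placeholder assertion
	}
}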

@Ladsgroup do benchmark on your local machine. The CI stack is unreliable for comparing performance between builds / settings, for a few reasons:

  • there can be 1 to 4 builds running concurrently

Both runs were the only builds on the whole CI system, on a quiet Saturday.

  • disk IO is capped on WMCS instances

Isn't this a reason to reduce disk IO?

  • the underlying hardware can be quite busy due to other instances

True, but I ran these on a quiet Saturday. I can try rechecking them several times this evening when it's quiet too, but honestly, this has been quite consistent.

Last time I checked on my local machine, it did not make any difference.

A local machine is not representative of production. For example, you probably have an SSD, while I don't think production has that type of storage; it may be that data goes to NFS, which is really slow. Generally, I don't think we should compare local and production: they are vastly different. E.g. DB calls on a local computer are slow because they read from disk, but in CI the database is mounted on tmpfs, so it's all in memory and DB calls are actually quite fast.

Note that with the removal of debug logging, we could possibly increase log retention, as it would reduce our log storage drastically (or maybe most of the storage is mp4 files and this wouldn't help there).

My general comment about this task is that all the fine-tuning is great, but a real speed-up will only be achieved by overhauling the current workflows. There are a few big changes we could make to speed it up:

  • have a pre-flight job that runs the linters for the repository that triggers the patch and hold any other jobs until that one has completed

Filed as T297561: Run linters before starting longer running jobs

  • add a build step that would clone the repositories, apply patches and install dependencies. That is currently done by each of the Quibble jobs

+1

  • change a lot of MediaWiki tests to be true unit tests

It's a big effort. I think starting T50217: Speed up MediaWiki PHPUnit build by running integration tests in parallel would make sense regardless.

  • first run tests for the affected code path (aka a patch to include/api/* would run tests/phpunit/include/api/* tests first) to speed up the feedback in case of failure

We don't use --stop-on-failure, though, so not sure that would help.

  • stop running every single test when extensions depend on each other (we now have @group Standalone, which got introduced to prevent running Scribunto tests from any other repository)

To put it another way, we need to revisit how we do integration testing between extensions. Blindly running everything is the easy path, but it has the drawback of being rather slow.

I looked at how we can profile this to find bottlenecks. There are two distinct pieces of work we need to do, and they will give us a very good catalogue of bottlenecks.

  • Run perf. This one is simple: just SSH into one of the VMs (I tried, but for whatever reason it doesn't let me in; can I be added to the integration project?) and then run something like this:
sudo perf record -p PID -g -F 99 -- sleep 1000
or
sudo perf record --cgroup DOCKER_CGROUP_ID -g -F 99 -- sleep 1000

and then build a flame graph from it (after downloading FlameGraph: https://github.com/brendangregg/FlameGraph):

sudo perf script | ./stackcollapse-perf.pl > out.perf-folded
cat out.perf-folded |  ./flamegraph.pl > all.svg
grep -v cpu_idle out.perf-folded | ./flamegraph.pl > nonidle.svg
etc. etc.

perf is the official performance tooling of Linux and is part of the kernel. This would work for finding system-level issues, e.g. if 50% of the time is spent writing files to NFS, or if npm isn't cached and most of the time is spent waiting for npmjs.com to respond. But when it comes to what goes on inside PHP, everything would show up as [unknown], which is not useful. For that we can do the second thing:

  • Install Excimer. This is similar to production. We just need to add the php-excimer extension to the Docker file and then add a snippet to integration/config's LocalSettings.php to enable Excimer and collect data. Then it can send the data to the Beta Cluster's Arc Lamp Redis (in a dedicated channel, obviously) or just log it to a file (or I can set up a Redis instance somewhere super quickly; it's not rocket science). Then I'd go around collecting the data and building a flame graph to see exactly which parts of the tests are slowest. This has been instrumental in finding performance issues in production (I can name at least ten major bugs found by this). I think we can easily utilize this to get more data.

Can anyone please help me get this off the ground? Especially with access to integration.
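For reference, the LocalSettings.php snippet described above could look roughly like the following; this is a sketch under assumptions (the 10 ms sampling period and the output path are mine, and it is not necessarily what the Gerrit patches below end up doing):

<?php
// Sample wall-clock stack traces for the whole request with Excimer and
// append them in collapsed format, ready for flamegraph.pl.
if ( extension_loaded( 'excimer' ) ) {
	$ciProfiler = new ExcimerProfiler();
	$ciProfiler->setEventType( EXCIMER_REAL ); // wall time; EXCIMER_CPU is the alternative
	$ciProfiler->setPeriod( 0.010 );           // sample every 10 ms (assumed value)
	$ciProfiler->setMaxDepth( 250 );
	$ciProfiler->start();

	register_shutdown_function( static function () use ( $ciProfiler ) {
		$ciProfiler->stop();
		file_put_contents(
			'/workspace/log/excimer.collapsed', // assumed path inside the CI workspace
			$ciProfiler->getLog()->formatCollapsed(),
			FILE_APPEND | LOCK_EX
		);
	} );
}

The collapsed output can then be fed straight to flamegraph.pl, just like the perf output above.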

Change 748312 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[integration/config@master] dockerfiles: Add php-excimer to quibble

https://gerrit.wikimedia.org/r/748312

Change 748314 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[integration/quibble@master] [DNM] Add excimer config

https://gerrit.wikimedia.org/r/748314

Change 748314 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[integration/quibble@master] [DNM] Add excimer config

https://gerrit.wikimedia.org/r/748314

This would be needed if we want to run this on everything; for now, we can just make a patch to core and run recheck a couple of times.

Mentioned in SAL (#wikimedia-operations) [2021-12-21T15:36:13Z] <Amir1> running sudo perf record -ag -F 99 -- sleep 3600 on integration-agent-docker-1008 and 1009 (T225730)

Change 749266 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/extensions/AbuseFilter@master] build: Update eslint-config-wikimedia to 0.21.0

https://gerrit.wikimedia.org/r/749266

Change 749267 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/extensions/Echo@master] build: Update eslint-config-wikimedia to 0.21.0

https://gerrit.wikimedia.org/r/749267

Change 749268 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/extensions/FileImporter@master] build: Update eslint-config-wikimedia to 0.21.0

https://gerrit.wikimedia.org/r/749268

Change 749271 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/extensions/GrowthExperiments@master] build: Update eslint-config-wikimedia to 0.21.0

https://gerrit.wikimedia.org/r/749271

Change 749269 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/extensions/ProofreadPage@master] build: Update eslint-config-wikimedia to 0.21.0

https://gerrit.wikimedia.org/r/749269

I ran this for an hour. The scary part is that it seems 85% of the time is being spent on swapping:
https://people.wikimedia.org/~ladsgroup/ci_flamegraphs/all.svg
(It's interactive, click on things, search, etc.)

I think this is basically T281122: Wikibase selenium tests timeout, seemingly due to "memory compaction" events on CI VMs. 85% of the time is too much :( Maybe in the meantime let's reduce the runners per host to 3 and see how it goes?

While htop didn't show memory being full during the times I checked (which were quiet), given that the uptime for that VM is 294 days, I highly recommend at least rebooting this poor VM (and possibly getting it onto a more modern OS).

Anyway, with the swapping removed, here is the result:
https://people.wikimedia.org/~ladsgroup/ci_flamegraphs/nonswap.svg
TLDR: 35% is php, 5% is npm ci, 2% npm (install?), 11% node, 4.88% java, 4% mysql, 7% git, ffmpeg 3.65%, chromium 3.71%
I think running this for an hour back to back can cause perf overhead; let me make it run like a snapshot for a day.

The kallsyms syscalls in the swapper are fishy. There is a small chance that the perf tool just overwhelmed the system by being run for an hour (but the data is not much, 300 MB); let me try again with "snapshot mode".

Change 749266 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@master] build: Update eslint-config-wikimedia to 0.21.0

https://gerrit.wikimedia.org/r/749266

Change 749268 merged by jenkins-bot:

[mediawiki/extensions/FileImporter@master] build: Update eslint-config-wikimedia to 0.21.0

https://gerrit.wikimedia.org/r/749268

Change 749267 merged by jenkins-bot:

[mediawiki/extensions/Echo@master] build: Update eslint-config-wikimedia to 0.21.0

https://gerrit.wikimedia.org/r/749267

Change 749271 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] build: Update eslint-config-wikimedia to 0.21.0

https://gerrit.wikimedia.org/r/749271

Change 749269 abandoned by Umherirrender:

[mediawiki/extensions/ProofreadPage@master] build: Update eslint-config-wikimedia to 0.21.0

Reason:

Updating the lock file version does not work in my setup for this repo, due to the use of an explicit git commit

https://gerrit.wikimedia.org/r/749269

Change 757411 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/quibble@master] Run post-dependency install, pre-test steps in parallel

https://gerrit.wikimedia.org/r/757411

Change 757411 merged by jenkins-bot:

[integration/quibble@master] Run post-dependency install, pre-test steps in parallel

https://gerrit.wikimedia.org/r/757411

Change 758952 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/quibble@master] release: Quibble 1.4.0

https://gerrit.wikimedia.org/r/758952

Change 758952 merged by jenkins-bot:

[integration/quibble@master] release: Quibble 1.4.0

https://gerrit.wikimedia.org/r/758952

Change 763559 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] jjb: update Quibble jobs from 1.3.0 to 1.4.0

https://gerrit.wikimedia.org/r/763559

Mentioned in SAL (#wikimedia-releng) [2022-02-21T07:31:03Z] <hashar> Switching Quibble jobs from Quibble 1.3.0 to 1.4.0 # T300340 T291549 T225730

Change 763559 merged by jenkins-bot:

[integration/config@master] jjb: update Quibble jobs from 1.3.0 to 1.4.0

https://gerrit.wikimedia.org/r/763559

Change 767749 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] jjb: update Quibble jobs from 1.3.0 to 1.4.3

https://gerrit.wikimedia.org/r/767749

Change 767749 merged by jenkins-bot:

[integration/config@master] jjb: update Quibble jobs from 1.3.0 to 1.4.3

https://gerrit.wikimedia.org/r/767749

Change 768068 had a related patch set uploaded (by Krinkle; author: Krinkle):

[integration/config@master] Revert "zuul: Install MobileFrontend when testing Echo"

https://gerrit.wikimedia.org/r/768068

Change 768068 merged by jenkins-bot:

[integration/config@master] Revert "zuul: Install MobileFrontend when testing Echo"

https://gerrit.wikimedia.org/r/768068

Change 771429 had a related patch set uploaded (by Krinkle; author: Aaron Schulz):

[mediawiki/core@master] phpunit: Set $wgSQLMode from DevelopmentSettings instead of MediaWikiIntegrationTestCase

https://gerrit.wikimedia.org/r/771429

Change 771429 merged by jenkins-bot:

[mediawiki/core@master] phpunit: Set $wgSQLMode from DevelopmentSettings instead of MediaWikiIntegrationTestCase

https://gerrit.wikimedia.org/r/771429

FTR, this is currently being reverted as the probable cause of T304625: CI failing with IndexPager::buildQueryInfo error: 'wikidb.unittest_globaluser.gu_id' isn't in GROUP BY. (As I understand it, the intention was to enable the SQL mode for integration tests in a better way, but it was also enabled for browser tests because they also use DevelopmentSettings. And I guess there’s some code that breaks under strict SQL mode, and which is reached by the browser tests but not by the PHPUnit integration tests.)

Change 763333 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/quibble@master] Logging: Use MWLoggerDefaultSpi

https://gerrit.wikimedia.org/r/763333

Change 763333 abandoned by Kosta Harlan:

[integration/quibble@master] Logging: Use MWLoggerDefaultSpi

Reason:

moving to core

https://gerrit.wikimedia.org/r/763333

Change 774409 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/core@master] DevelopmentSettings: Use MWLoggerDefaultSpi for debug logging

https://gerrit.wikimedia.org/r/774409

Change 771678 had a related patch set uploaded (by Krinkle; author: Aaron Schulz):

[mediawiki/core@master] phpunit: Fix slow testBotPasswordThrottled by lowering limits

https://gerrit.wikimedia.org/r/771678

Change 771678 merged by jenkins-bot:

[mediawiki/core@master] phpunit: Fix slow testBotPasswordThrottled by lowering limits

https://gerrit.wikimedia.org/r/771678

Change 777006 had a related patch set uploaded (by Krinkle; author: Aaron Schulz):

[mediawiki/core@master] phpunit: Set $wgSQLMode from DevelopmentSettings instead of MediaWikiIntegrationTestCase (ii)

https://gerrit.wikimedia.org/r/777006

Change 777006 merged by jenkins-bot:

[mediawiki/core@master] phpunit: Set $wgSQLMode from DevelopmentSettings instead of MediaWikiIntegrationTestCase (ii)

https://gerrit.wikimedia.org/r/777006

Change 777482 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] debug: Fix $wgDebugRawPage to work with PSR-3 debug logging

https://gerrit.wikimedia.org/r/777482

Change 777482 merged by jenkins-bot:

[mediawiki/core@master] debug: Fix $wgDebugRawPage to work with PSR-3 debug logging

https://gerrit.wikimedia.org/r/777482

This is really getting frustrating for the wmf branches. E.g. gerrit 785944 spent an hour in CI, then errored out with "Build timed out (after 60 minutes). Marking the build as failed." gerrit 785941 took 92 minutes to merge. It's becoming basically impossible to do an extension backport within the one-hour deploy window, not to mention multiple backports.

Can we just drop Selenium tests from gate-and-submit-wmf? I don't think they add any real value; by the time a patch gets there, it has typically passed Selenium tests in test, gate-and-submit, and test-wmf.

I don't think we should run Selenium on anything other than the master branch, to be honest.

This is really getting frustrating for the wmf branches. E.g. gerrit 785944 spent an hour in CI, then errored out with "Build timed out (after 60 minutes). Marking the build as failed." gerrit 785941 took 92 minutes to merge. It's becoming basically impossible to do an extension backport within the one-hour deploy window, not to mention multiple backports.

Can we just drop Selenium tests from gate-and-submit-wmf? I don't think they add any real value; by the time a patch gets there, it has typically passed Selenium tests in test, gate-and-submit, and test-wmf.

Filed as T307180: Drop Selenium tests from gate-and-submit-wmf, +1 from me.

This is really getting frustrating for the wmf branches. E.g. gerrit 785944 spent an hour in CI, then errored out with "Build timed out (after 60 minutes). Marking the build as failed." gerrit 785941 took 92 minutes to merge. It's becoming basically impossible to do an extension backport within the one-hour deploy window, not to mention multiple backports.

If I remember properly, those builds froze at the npm ci step until Jenkins timed out the build and killed it. We can probably lower the build timeout in Jenkins and find a way for Quibble to enforce a timeout on the npm ci operation. That would at least make them fail faster.

As for the issue, that seems to match an npmjs incident: https://status.npmjs.org/incidents/ljzb0hdg8zr3

There was an intermittent issue with package install since 23rd April which is now resolved.
Posted Apr 26, 2022 - 14:30 UTC

In T304114 I was made aware that we're running Minerva Selenium tests on Extension:StopForumSpam, which is not even run in production (Beta Cluster only, as far as I can tell) and never interacts with Minerva.

It also seems we run Minerva browser tests on Vector patches, despite the fact that it's impossible for a Vector patch to break Minerva, as these are mutually exclusive experiences.

It seems we could save a chunk of time by restricting the extensions where we run our browser tests. The overlap between skins and extensions from the perspective of browser tests is very small.
Is there a way to be more deliberate about where we run which Selenium tests? As far as I can see, Minerva browser tests only need to be run with core, MobileFrontend, VisualEditor and Echo changes.