
Investigate how to run webdriverio/mediawiki on a slow machine/instance to easier find flakey tests
Closed, Resolved · Public · 5 Estimated Story Points · Spike

Description

Let's try to run the WebdriverIO tests and MediaWiki (or one of them) on a slow machine/instance. The idea is that if we slow down the CPU so the tests run slower, tests that are flaky or have timing issues will start to fail. This could be a way to stress test our tests for flakiness.

Let's spend a day trying it out. It could be as simple as setting a CPU limit on mediawiki-quickstart, or we could use a dedicated machine.

As a first step, do not spend more than 8 hours on this.

AC:

  • Try out different ideas and document what you did and the results in this task
  • If this seems useful, create a follow-up task to implement it in our workflow

Event Timeline

Restricted Application changed the subtype of this task from "Task" to "Spike". · Oct 17 2025, 6:57 AM
Restricted Application added a subscriber: Aklapper.
zeljkofilipin renamed this task from "Investigate how to run webdriver.io/mediawiki on a slow machines/instances to easier find flakey tests" to "Investigate how to run webdriverio/mediawiki on a slow machines/instances to easier find flakey tests". · Nov 20 2025, 3:05 PM
Peter renamed this task from "Investigate how to run webdriverio/mediawiki on a slow machines/instances to easier find flakey tests" to "Investigate how to run webdriverio/mediawiki on a slow machine/instance to easier find flakey tests". · Nov 20 2025, 5:12 PM
SDunlap set the point value for this task to 5.

We discussed this in yesterday's meeting; here is a suggestion for how this can be done:

  • If you have a slow computer, you can use it as a POC to see whether we can get tests to fail. Run the core tests, but also some tests from an extension (pick one that seems flaky; you can find extensions here).
  • As the next step (or the first step, if you don't have a slow computer at hand), I would try to use Docker to slow down MediaWiki and/or the container that runs the wdio tests. This could potentially be done in Quickstart with some hacking.
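A minimal sketch of the Docker approach, assuming the Quickstart services run as ordinary Docker containers: `docker update` can change the CPU and memory limits of a running container. The TypeScript below only builds the argument lists (they can be passed to Node's `child_process.execFileSync("docker", args)` or pasted into a shell). The container names are hypothetical placeholders, so check `docker ps` for the real ones. On macOS, the Docker Desktop VM's own CPU/RAM setting also caps what any container can use.

```typescript
// Sketch: build `docker update` commands that throttle running containers.
// The container names used below are hypothetical placeholders.

interface ResourceLimit {
  cpus: number;     // CPUs the container may use (fractional values allowed)
  memoryGb: number; // hard memory limit in GB
}

function dockerUpdateArgs(container: string, limit: ResourceLimit): string[] {
  if (limit.cpus <= 0 || limit.memoryGb <= 0) {
    throw new Error("CPU and memory limits must be positive");
  }
  const memory = `${limit.memoryGb}g`;
  return [
    "update",
    "--cpus", String(limit.cpus),
    "--memory", memory,
    // Same value for --memory-swap: the container gets no swap on top of
    // the memory limit, so the cap is not softened by swapping.
    "--memory-swap", memory,
    container,
  ];
}

// Throttle MediaWiki and the container that runs the wdio tests.
for (const name of ["mediawiki-web", "wdio-runner"]) {
  console.log(["docker", ...dockerUpdateArgs(name, { cpus: 2, memoryGb: 2 })].join(" "));
}
```

Only the limits change; the containers keep running, so a test suite can be rerun against the same MediaWiki instance under each setting.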

If we can get tests to fail by making things slower, then we know this could be a way forward. If we don't get them to fail, then we don't need to spend more time on this idea.

I used the following config legend to test Core on a local machine:

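| Config ID | CPU Cores | RAM (GB) |
| --------- | --------- | -------- |
| C1 | 2 | 2 |
| C2 | 2 | 4 |
| C3 | 2 | 6 |
| C4 | 2 | 8 |
| C5 | 4 | 2 |
| C6 | 4 | 4 |
| C7 | 4 | 6 |
| C8 | 4 | 8 |
| C9 | 6 | 2 |
| C10 | 6 | 4 |
| C11 | 6 | 6 |
| C12 | 6 | 8 |
| C13 | 8 | 2 |
| C14 | 8 | 4 |
| C15 | 8 | 6 |
| C16 | 8 | 8 |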
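The legend is the full grid of 2, 4, 6 and 8 CPU cores crossed with 2, 4, 6 and 8 GB of RAM, numbered with the core count as the outer loop. If the grid ever needs to change, it can be regenerated rather than typed by hand. A small sketch (the function and type names are illustrative):

```typescript
// Sketch: regenerate the config legend as a CPU-cores x RAM grid,
// numbered C1..C16 with the CPU core count as the outer loop.

interface ResourceConfig {
  id: string;
  cpuCores: number;
  ramGb: number;
}

function buildLegend(cores: number[], ramGb: number[]): ResourceConfig[] {
  const legend: ResourceConfig[] = [];
  for (const cpuCores of cores) {
    for (const ram of ramGb) {
      legend.push({ id: `C${legend.length + 1}`, cpuCores, ramGb: ram });
    }
  }
  return legend;
}

const legend = buildLegend([2, 4, 6, 8], [2, 4, 6, 8]);
for (const c of legend) {
  console.log(`${c.id}\t${c.cpuCores} cores\t${c.ramGb} GB`);
}
```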

I got these results:

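| Test ID | Repo | Phase | Env Details | Config ID | CPU Cores | RAM (GB) | Max Instances | Total Tests | Passed | Failed | Duration | Duration (Max Instance 4) |
| ------- | ---- | ----- | ----------- | --------- | --------- | -------- | ------------- | ----------- | ------ | ------ | -------- | ------------------------- |
| T001 | Core | Quickstart | MacBook M1 Max | C1 | 2 | 2 | 1 | 10 | 9 | 1 | 13:13 min | 8:54 min |
| T002 | Core | Quickstart | MacBook M1 Max | C2 | 2 | 4 | 1 | 10 | 9 | 1 | 13:35 min | 7:57 min |
| T003 | Core | Quickstart | MacBook M1 Max | C3 | 2 | 6 | 1 | 10 | 9 | 1 | 13:09 min | 7:29 min |
| T004 | Core | Quickstart | MacBook M1 Max | C4 | 2 | 8 | 1 | 10 | 9 | 1 | 13:11 min | 8:28 min |
| T005 | Core | Quickstart | MacBook M1 Max | C5 | 4 | 2 | 1 | 10 | 10 | 0 | 8:07 min | 4:13 min |
| T006 | Core | Quickstart | MacBook M1 Max | C6 | 4 | 4 | 1 | 10 | 10 | 0 | 7:44 min | 4:58 min |
| T007 | Core | Quickstart | MacBook M1 Max | C7 | 4 | 6 | 1 | 10 | 10 | 0 | 7:43 min | 3:55 min |
| T008 | Core | Quickstart | MacBook M1 Max | C8 | 4 | 8 | 1 | 10 | 10 | 0 | 7:34 min | 3:54 min |
| T009 | Core | Quickstart | MacBook M1 Max | C9 | 6 | 2 | 1 | 10 | 10 | 0 | 8:04 min | 3:52 min |
| T010 | Core | Quickstart | MacBook M1 Max | C10 | 6 | 4 | 1 | 10 | 10 | 0 | 7:45 min | 3:36 min |
| T011 | Core | Quickstart | MacBook M1 Max | C11 | 6 | 6 | 1 | 10 | 10 | 0 | 7:28 min | 3:35 min |
| T012 | Core | Quickstart | MacBook M1 Max | C12 | 6 | 8 | 1 | 10 | 10 | 0 | 7:22 min | 3:20 min |
| T013 | Core | Quickstart | MacBook M1 Max | C13 | 8 | 2 | 1 | 10 | 10 | 0 | 7:53 min | 3:53 min |
| T014 | Core | Quickstart | MacBook M1 Max | C14 | 8 | 4 | 1 | 10 | 10 | 0 | 7:49 min | 3:36 min |
| T015 | Core | Quickstart | MacBook M1 Max | C15 | 8 | 6 | 1 | 10 | 10 | 0 | 7:37 min | 3:33 min |
| T016 | Core | Quickstart | MacBook M1 Max | C16 | 8 | 8 | 1 | 10 | 10 | 0 | 7:09 min | 3:18 min |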

Below are the failures that occurred when running the Core tests:

| Test ID | Failed Test Name | Error Message | Error Type | CPU Cores | RAM (GB) | Config ID | Max Instance | Reproducible | Retry Result |
| ------- | ---------------- | ------------- | ---------- | --------- | -------- | --------- | ------------ | ------------ | ------------ |
| T001 | should be able to block a user | element (".cdx-menu-item--enabled") still not clickable after 10000ms | Timeout | 2 | 2 | C1 | 1 & 4 | Yes | Fail |
| T002 | should be able to block a user | element (".cdx-menu-item--enabled") still not clickable after 10000ms | Timeout | 2 | 4 | C2 | 1 & 4 | Yes | Fail |
| T003 | should be able to block a user | element (".cdx-menu-item--enabled") still not clickable after 10000ms | Timeout | 2 | 6 | C3 | 1 & 4 | Yes | Fail |
| T004 | should be able to block a user | element (".cdx-menu-item--enabled") still not clickable after 10000ms | Timeout | 2 | 8 | C4 | 1 & 4 | Yes | Fail |

TwoColConflict seems to be the flakiest extension according to this data. Running its tests with the config legend above should show whether throttling reproduces the same common failures.

I created a script that automatically runs the tests under different RAM and CPU configurations using Quickstart, and can also vary the number of max instances. The script is not perfect, but the initial runs already show a few things.
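The script is not shown here. As a rough, hypothetical sketch of the same idea, the runs form a matrix of resource configs and max-instances values. The environment variable name `SELENIUM_MAX_INSTANCES` and the `npx wdio run tests/selenium/wdio.conf.js` command line are assumptions, not necessarily what the script uses; adjust them to the repository's wdio config.

```typescript
// Hypothetical sketch of a run plan: every resource config x every
// maxInstances value. The env var name and the wdio command are assumptions.

interface ResourceConfig {
  id: string;
  cpuCores: number;
  ramGb: number;
}

interface PlannedRun {
  config: ResourceConfig;
  maxInstances: number;
  env: Record<string, string>;
  command: string[];
}

const MAX_INSTANCES_ENV = "SELENIUM_MAX_INSTANCES"; // assumed name

function planRuns(
  configs: ResourceConfig[],
  maxInstances: number[],
  wdioConfigPath: string,
): PlannedRun[] {
  const runs: PlannedRun[] = [];
  // Outer loop over configs: resources change once per config, and every
  // maxInstances value is run before moving on to the next config.
  for (const config of configs) {
    for (const n of maxInstances) {
      runs.push({
        config,
        maxInstances: n,
        env: { [MAX_INSTANCES_ENV]: String(n) },
        command: ["npx", "wdio", "run", wdioConfigPath],
      });
    }
  }
  return runs;
}

const plan = planRuns(
  [
    { id: "C2", cpuCores: 2, ramGb: 4 },
    { id: "C6", cpuCores: 4, ramGb: 4 },
  ],
  [1, 3, 4],
  "tests/selenium/wdio.conf.js",
);
for (const run of plan) {
  console.log(`${run.config.id} maxInstances=${run.maxInstances}: ${run.command.join(" ")}`);
}
```

Keeping the plan as plain data makes it easy to log exactly which combination produced a given report row before anything is executed.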

For MediaWiki Core, the Selenium tests seem to stay fairly stable with resources throttled down: even with multiple max instances and limited resources, only a few errors occurred. Here is a sample report for Core, generated by the script instead of running the tests manually:

Selenium Test Results - Resource Configuration Analysis

Generated: 2025-12-06
Environment: MacBook M1
Test Path: Core tests
Max Instances Tested: 1,3,4

Test Summary

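| Test ID | Repo | Phase | Env Details | Config ID | CPU Cores | RAM (GB) | Total Tests | Passed | Failed | Max Instance: 1 | Max Instance: 3 | Max Instance: 4 |
| ------- | ---- | ----- | ----------- | --------- | --------- | -------- | ----------- | ------ | ------ | --------------- | --------------- | --------------- |
| T001 | Core | Quickstart | MacBook M1 | C2 | 2 | 4 | 10 | 10 | 0 | 03:53 min | 03:17 min | 03:23 min |
| T002 | Core | Quickstart | MacBook M1 | C3 | 2 | 6 | 10 | 10 | 0 | 03:38 min | 03:24 min | 03:17 min |
| T003 | Core | Quickstart | MacBook M1 | C4 | 2 | 8 | 10 | 10 | 0 | 03:40 min | 03:14 min | 02:42 min |
| T004 | Core | Quickstart | MacBook M1 | C6 | 4 | 4 | 10 | 10 | 0 | 02:28 min | 01:48 min | 02:23 min |
| T005 | Core | Quickstart | MacBook M1 | C7 | 4 | 6 | 10 | 10 | 0 | 02:25 min | 02:10 min | 01:44 min |
| T006 | Core | Quickstart | MacBook M1 | C8 | 4 | 8 | 10 | 10 | 0 | 02:28 min | 02:02 min | 01:45 min |
| T007 | Core | Quickstart | MacBook M1 | C10 | 6 | 4 | 10 | 10 | 0 | 02:22 min | 01:38 min | 01:58 min |
| T008 | Core | Quickstart | MacBook M1 | C11 | 6 | 6 | 10 | 10 | 0 | 02:56 min | 01:34 min | 01:30 min |
| T009 | Core | Quickstart | MacBook M1 | C12 | 6 | 8 | 10 | 10 | 0 | 02:41 min | 02:06 min | 01:16 min |
| T010 | Core | Quickstart | MacBook M1 | C14 | 8 | 4 | 10 | 10 | 0 | 02:18 min | 01:52 min | 01:30 min |
| T011 | Core | Quickstart | MacBook M1 | C15 | 8 | 6 | 10 | 10 | 0 | 02:24 min | 01:19 min | 02:22 min |
| T012 | Core | Quickstart | MacBook M1 | C16 | 8 | 8 | 10 | 10 | 0 | 02:24 min | 01:15 min | 01:41 min |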

Failure Details

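| Test ID | Failed Test Name | Error Message | Error Type | CPU Cores | RAM (GB) | Config ID | Max Instance | Reproducible | Retry Result |
| ------- | ---------------- | ------------- | ---------- | --------- | -------- | --------- | ------------ | ------------ | ------------ |
| T011 | Page. | Could not login: Unknown reason | Error | 8 | 6 | C15 | 4 | Yes | Fail |
| T011 | User. | Could not login: Unknown reason | Error | 8 | 6 | C15 | 4 | Yes | Fail |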

Analysis

  • Total Configurations Tested: 12

For extensions, I tested TwoColConflict, which is one of the repositories that seem to have the most flaky tests. With max instances set to 1, the TwoColConflict Selenium tests seem stable, with few to no errors regardless of the available resources. Example test run:

Selenium Test Results - Resource Configuration Analysis

Generated: 2025-12-06 15:41:15
Environment: MacBook M1
Test Path: extensions/TwoColConflict
Max Instances Tested: 1

Test Summary

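| Test ID | Repo | Phase | Env Details | Config ID | CPU Cores | RAM (GB) | Total Tests | Passed | Failed | Max Instance: 1 |
| ------- | ---- | ----- | ----------- | --------- | --------- | -------- | ----------- | ------ | ------ | --------------- |
| T001 | TwoColConflict | Quickstart | MacBook M1 | C2 | 2 | 4 | 7 | 7 | 0 | 03:36 min |
| T002 | TwoColConflict | Quickstart | MacBook M1 | C3 | 2 | 6 | 7 | 7 | 0 | 03:48 min |
| T003 | TwoColConflict | Quickstart | MacBook M1 | C4 | 2 | 8 | 7 | 7 | 0 | 03:42 min |
| T004 | TwoColConflict | Quickstart | MacBook M1 | C6 | 4 | 4 | 7 | 7 | 0 | 03:06 min |
| T005 | TwoColConflict | Quickstart | MacBook M1 | C7 | 4 | 6 | 7 | 7 | 0 | 03:04 min |
| T006 | TwoColConflict | Quickstart | MacBook M1 | C8 | 4 | 8 | 7 | 7 | 0 | 03:03 min |
| T007 | TwoColConflict | Quickstart | MacBook M1 | C10 | 6 | 4 | 7 | 7 | 0 | 03:03 min |
| T008 | TwoColConflict | Quickstart | MacBook M1 | C11 | 6 | 6 | 7 | 7 | 0 | 03:08 min |
| T009 | TwoColConflict | Quickstart | MacBook M1 | C12 | 6 | 8 | 7 | 7 | 0 | 03:02 min |
| T010 | TwoColConflict | Quickstart | MacBook M1 | C14 | 8 | 4 | 7 | 7 | 0 | 03:02 min |
| T011 | TwoColConflict | Quickstart | MacBook M1 | C15 | 8 | 6 | 7 | 7 | 0 | 03:02 min |
| T012 | TwoColConflict | Quickstart | MacBook M1 | C16 | 8 | 8 | 7 | 7 | 0 | 03:00 min |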

Analysis

  • Total Configurations Tested: 12
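The runs above keep max instances at 1. Comparing against higher values is easier if each repository's wdio config takes that number from the environment instead of hard-coding it (`maxInstances` is the WebdriverIO config option for parallel workers). A minimal sketch, assuming a hypothetical `SELENIUM_MAX_INSTANCES` variable; the existing configs may already use a different mechanism:

```typescript
// Sketch: resolve maxInstances for a wdio config from environment variables,
// defaulting to 1. The variable name is a hypothetical choice.

function resolveMaxInstances(
  env: Record<string, string | undefined>,
  name = "SELENIUM_MAX_INSTANCES",
): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") {
    return 1; // serial run: the setting under which the extension suites were stable
  }
  const trimmed = raw.trim();
  // Accept only positive integers such as "1", "3", "4".
  if (!/^[1-9]\d*$/.test(trimmed)) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return Number(trimmed);
}

// Possible use inside wdio.conf.js / wdio.conf.ts:
//   export const config = {
//     maxInstances: resolveMaxInstances(process.env),
//     // ...rest of the config
//   };
console.log(resolveMaxInstances({}));                               // 1
console.log(resolveMaxInstances({ SELENIUM_MAX_INSTANCES: "4" })); // 4
```

Rejecting malformed values instead of silently falling back to 1 avoids a run that looks parallel in the report but was actually serial.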

When max instances is increased beyond 1 for TwoColConflict, the tests become flaky and we start seeing numerous errors. I also tested CampaignEvents, and the results were the same: tests pass with one max instance but fail with more than one.
We should investigate why Core Selenium tests remain stable when running multiple instances while extension tests fail under the same conditions. The cause may be a configuration difference, a difference in test structure, or how the tests are written. If we resolve this, I believe we can significantly improve the robustness of the tests, which would let us enable multiple instances across all repositories and reduce wait times. The script-generated report is a bit lengthy, but here are the TwoColConflict Selenium test results as an example, with no failures on 1 instance but many failures on more than 1:

Selenium Test Results - Resource Configuration Analysis

Generated: 2025-12-06 19:04:09
Environment: MacBook M1
Test Path: extensions/TwoColConflict
Max Instances Tested: 1,2,3,4

Test Summary

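| Test ID | Repo | Phase | Env Details | Config ID | CPU Cores | RAM (GB) | Total Tests | Passed | Max Instance: 1 | Max Instance: 2 | Max Instance: 3 | Max Instance: 4 |
| ------- | ---- | ----- | ----------- | --------- | --------- | -------- | ----------- | ------ | --------------- | --------------- | --------------- | --------------- |
| T001 | TwoColConflict | Quickstart | MacBook M1 | C2 | 2 | 4 | 7 | 7 | 03:44 min | 05:47 min | 04:10 min | 04:55 min |
| T002 | TwoColConflict | Quickstart | MacBook M1 | C3 | 2 | 6 | 7 | 7 | 03:41 min | 04:57 min | 04:55 min | 04:42 min |
| T003 | TwoColConflict | Quickstart | MacBook M1 | C4 | 2 | 8 | 7 | 7 | 03:42 min | 05:15 min | 04:47 min | 05:00 min |
| T004 | TwoColConflict | Quickstart | MacBook M1 | C6 | 4 | 4 | 7 | 7 | 03:05 min | 03:55 min | 02:37 min | 02:46 min |
| T005 | TwoColConflict | Quickstart | MacBook M1 | C7 | 4 | 6 | 7 | 7 | 03:04 min | 04:32 min | 03:51 min | 03:47 min |
| T006 | TwoColConflict | Quickstart | MacBook M1 | C8 | 4 | 8 | 7 | 7 | 03:04 min | 03:41 min | 04:03 min | 03:04 min |
| T007 | TwoColConflict | Quickstart | MacBook M1 | C10 | 6 | 4 | 7 | 7 | 03:03 min | 03:49 min | 04:18 min | 03:37 min |
| T008 | TwoColConflict | Quickstart | MacBook M1 | C11 | 6 | 6 | 7 | 7 | 03:04 min | 03:56 min | 03:28 min | 02:49 min |
| T009 | TwoColConflict | Quickstart | MacBook M1 | C12 | 6 | 8 | 7 | 7 | 03:02 min | 03:36 min | 03:37 min | 03:11 min |
| T010 | TwoColConflict | Quickstart | MacBook M1 | C14 | 8 | 4 | 7 | 7 | 03:02 min | 03:48 min | 02:40 min | 03:28 min |
| T011 | TwoColConflict | Quickstart | MacBook M1 | C15 | 8 | 6 | 7 | 7 | 03:01 min | 04:37 min | 04:09 min | 03:18 min |
| T012 | TwoColConflict | Quickstart | MacBook M1 | C16 | 8 | 8 | 7 | 7 | 03:00 min | 03:52 min | 03:16 min | 04:10 min |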

Failure Details

| Test ID | Failed Test Name | Error Message | Error Type | CPU Cores | RAM (GB) | Config ID | Max Instance | Reproducible | Retry Result |
| ------- | ---------------- | ------------- | ---------- | --------- | -------- | --------- | ------------ | ------------ | ------------ |
| T001 | TwoColConflict collapse button. | Cannot submit login form | Error | 2 | 4 | C2 | 2 | Yes | Fail |
| T001 | TwoColConflict GuidedTour. | Cannot submit login form | Error | 2 | 4 | C2 | 4 | No | Fail |
| T001 | TwoColConflict.on talk page conflicts. | invalidjson: No valid JSON response | Error | 2 | 4 | C2 | 4 | No | Fail |
| T002 | TwoColConflict collapse button. | Cannot submit login form | Error | 2 | 6 | C3 | 2 | Yes | Pass |
| T002 | TwoColConflict.on talk page conflicts.stores correct merge when swapped and edited | Node is either not clickable or not an HTMLElement | Error | 2 | 6 | C3 | 2 | Yes | Fail |
| T002 | TwoColConflict.on talk page conflicts. | invalidjson: No valid JSON response | Error | 2 | 6 | C3 | 4 | Yes | Fail |
| T002 | TwoColConflict GuidedTour.on subsequent view. | Failed to wait for ext.TwoColConflict.SplitJs to be ready after 5000 | Error | 2 | 6 | C3 | 4 | Yes | Fail |
| T002 | TwoColConflict GuidedTour. | invalidjson: No valid JSON response | Error | 2 | 6 | C3 | 4 | No | Fail |
| T002 | TwoColConflict save and preview.should save a resolved conflict successfully when another user edits a different section in the meantime | Failed to wait for mediawiki.base to be ready after 5000 ms. | Error | 2 | 6 | C3 | 4 | No | Pass |
| T003 | TwoColConflict EditUi. | Evaluation failed: http | Error | 2 | 8 | C4 | 2 | Yes | Fail |
| T003 | TwoColConflict without JavaScript.handles order selection on the talk page version correctly | invalidjson: No valid JSON response | Error | 2 | 8 | C4 | 4 | No | Fail |
| T003 | TwoColConflict save and preview.should be possible to edit and preview the left ( | invalidjson: No valid JSON response | Error | 2 | 8 | C4 | 4 | Yes | Fail |
| T003 | TwoColConflict GuidedTour.on subsequent view. | invalidjson: No valid JSON response | Error | 2 | 8 | C4 | 4 | No | Fail |
| T004 | TwoColConflict EditUi. | Cannot submit login form | Error | 4 | 4 | C6 | 2 | Yes | Pass |
| T004 | TwoColConflict. | Can't call setValue on element with selector "#wpName1" because elem | Error | 4 | 4 | C6 | 3 | Yes | Fail |
| T004 | TwoColConflict GuidedTour. | Cannot submit login form | Error | 4 | 4 | C6 | 4 | Yes | Fail |
| T004 | TwoColConflict GuidedTour. | Evaluation failed: http | Error | 4 | 4 | C6 | 4 | No | Fail |
| T004 | TwoColConflict.labels change according to selected column | invalidjson: No valid JSON response | Error | 4 | 4 | C6 | 4 | No | Pass |
| T004 | TwoColConflict save and preview.should be possible to edit and preview the left ( | invalidjson: No valid JSON response | Error | 4 | 4 | C6 | 4 | Yes | Pass |
| T004 | TwoColConflict save and preview.should save a resolved conflict successfully when another user edits a different section in the meantime | Failed to wait for mediawiki.base to be ready after 5000 ms. | Error | 4 | 4 | C6 | 4 | No | Fail |
| T005 | TwoColConflict EditUi. | Cannot submit login form | Error | 4 | 6 | C7 | 2 | Yes | Fail |
| T005 | TwoColConflict without JavaScript.handles order selection on the talk page version correctly | invalidjson: No valid JSON response | Error | 4 | 6 | C7 | 2 | Yes | Fail |
| T005 | TwoColConflict.shows the talk page screen on conflicts that also add new lines | invalidjson: No valid JSON response | Error | 4 | 6 | C7 | 2 | Yes | Fail |
| T005 | TwoColConflict.on talk page conflicts. | invalidjson: No valid JSON response | Error | 4 | 6 | C7 | 2 | No | Fail |
| T005 | TwoColConflict without JavaScript.handles order selection on the talk page version correctly | invalidjson: No valid JSON response | Error | 4 | 6 | C7 | 4 | No | Fail |
| T005 | TwoColConflict.editor should not decode html entities | invalidjson: No valid JSON response | Error | 4 | 6 | C7 | 4 | Yes | Fail |
| T006 | TwoColConflict EditUi. | Evaluation failed: http | Error | 4 | 8 | C8 | 2 | Yes | Pass |
| T006 | TwoColConflict save and preview.should save a resolved conflict successfully when another user edits a different section in the meantime | invalidjson: No valid JSON response | Error | 4 | 8 | C8 | 4 | Yes | Pass |
| T006 | TwoColConflict save and preview.should be possible to edit and preview the left ( | invalidjson: No valid JSON response | Error | 4 | 8 | C8 | 4 | No | Fail |
| T007 | TwoColConflict EditUi. | Evaluation failed: http | Error | 6 | 4 | C10 | 2 | Yes | Pass |
| T007 | TwoColConflict without JavaScript.handles order selection on the talk page version correctly | invalidjson: No valid JSON response | Error | 6 | 4 | C10 | 2 | Yes | Fail |
| T007 | TwoColConflict save and preview.should trigger a new conflict when another user edits in the same lines in the meantime | invalidjson: No valid JSON response | Error | 6 | 4 | C10 | 4 | Yes | Fail |
| T007 | TwoColConflict without JavaScript.handles order selection on the talk page version correctly | invalidjson: No valid JSON response | Error | 6 | 4 | C10 | 4 | Yes | Fail |
| T007 | TwoColConflict.editor should not decode html entities | invalidjson: No valid JSON response | Error | 6 | 4 | C10 | 4 | Yes | Fail |
| T008 | TwoColConflict EditUi. | Cannot submit login form | Error | 6 | 6 | C11 | 2 | Yes | Pass |
| T008 | TwoColConflict GuidedTour.on subsequent view. | invalidjson: No valid JSON response | Error | 6 | 6 | C11 | 4 | Yes | Pass |
| T009 | TwoColConflict.on talk page conflicts. | invalidjson: No valid JSON response | Error | 6 | 8 | C12 | 4 | Yes | Fail |
| T009 | TwoColConflict GuidedTour.on subsequent view. | invalidjson: No valid JSON response | Error | 6 | 8 | C12 | 4 | Yes | Fail |
| T009 | TwoColConflict save and preview.should be possible to edit and preview the left ( | internal_api_error_DBQueryError: [e6b410a045fd6ffdc57907ca] Exceptio | Error | 6 | 8 | C12 | 4 | Yes | Fail |
| T009 | TwoColConflict GuidedTour.on subsequent view. | invalidjson: No valid JSON response | Error | 6 | 8 | C12 | 4 | No | Fail |
| T010 | TwoColConflict collapse button. | Cannot submit login form | Error | 8 | 4 | C14 | 2 | Yes | Fail |
| T011 | TwoColConflict.editor should not decode html entities | invalidjson: No valid JSON response | Error | 8 | 6 | C15 | 4 | Yes | Fail |
| T011 | TwoColConflict save and preview.should save a resolved conflict successfully when another user edits a different section in the meantime | invalidjson: No valid JSON response | Error | 8 | 6 | C15 | 4 | Yes | Pass |
| T011 | TwoColConflict.editor should not decode html entities | invalidjson: No valid JSON response | Error | 8 | 6 | C15 | 4 | No | Fail |
| T012 | TwoColConflict collapse button. | Evaluation failed: http | Error | 8 | 8 | C16 | 2 | Yes | Fail |
| T012 | TwoColConflict save and preview. | Cannot submit login form | Error | 8 | 8 | C16 | 2 | No | Fail |
| T012 | TwoColConflict.on talk page conflicts. | invalidjson: No valid JSON response | Error | 8 | 8 | C16 | 4 | No | Fail |
| T012 | TwoColConflict.editor should not decode html entities | invalidjson: No valid JSON response | Error | 8 | 8 | C16 | 4 | Yes | Fail |

Analysis

  • Total Configurations Tested: 12
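None of the TwoColConflict failure rows were recorded at max instance 1, which is the pattern the comparison hinges on. For longer reports, grouping failure rows by max instance (and by whether the retry passed) makes that easy to check. A small sketch; the `FailureRow` shape mirrors the report columns but is an assumption, not the script's actual data model, and the sample rows are the first four rows of the TwoColConflict failure details:

```typescript
// Sketch: group failure rows from the report by max instance value and count
// how many of them passed on retry. FailureRow mirrors the report columns;
// it is an assumed shape, not the script's actual data model.

interface FailureRow {
  testId: string;
  testName: string;
  error: string;
  maxInstance: number;
  retryResult: "Pass" | "Fail";
}

interface Tally {
  total: number;
  passedOnRetry: number;
}

function tallyByMaxInstance(rows: FailureRow[]): Record<number, Tally> {
  const tally: Record<number, Tally> = {};
  for (const row of rows) {
    const entry = tally[row.maxInstance] ?? { total: 0, passedOnRetry: 0 };
    entry.total += 1;
    if (row.retryResult === "Pass") {
      entry.passedOnRetry += 1;
    }
    tally[row.maxInstance] = entry;
  }
  return tally;
}

// First four rows of the TwoColConflict failure details.
const sampleRows: FailureRow[] = [
  { testId: "T001", testName: "TwoColConflict collapse button.", error: "Cannot submit login form", maxInstance: 2, retryResult: "Fail" },
  { testId: "T001", testName: "TwoColConflict GuidedTour.", error: "Cannot submit login form", maxInstance: 4, retryResult: "Fail" },
  { testId: "T001", testName: "TwoColConflict.on talk page conflicts.", error: "invalidjson: No valid JSON response", maxInstance: 4, retryResult: "Fail" },
  { testId: "T002", testName: "TwoColConflict collapse button.", error: "Cannot submit login form", maxInstance: 2, retryResult: "Pass" },
];

const byInstance = tallyByMaxInstance(sampleRows);
for (const key of Object.keys(byInstance)) {
  const t = byInstance[Number(key)];
  console.log(`max instance ${key}: ${t.total} failures, ${t.passedOnRetry} passed on retry`);
}
// max instance 2: 2 failures, 1 passed on retry
// max instance 4: 2 failures, 0 passed on retry
```

Feeding all rows of a report through the same function gives a per-max-instance summary that can sit next to the "Analysis" section.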