CodeMirror enabled -> opens with the CodeMirror view is flaky
Open, Needs Triage, Public

Description

The following test is flaky:

CodeMirror enabled VisualEditor 2017 wikitext editor -> opens with the CodeMirror view displayed and focus set on the VE surface

This has flaked 11 times recently, making it appear on the Flaky test report.

The location of the test is here:

https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php81-selenium/44937/

The purpose of this ticket:

  • Investigate the test failure and flakiness, see if it can be reproduced locally.
  • Depending on the cause of the flakiness, implement the resolution.

Event Timeline

Change #1231015 had a related patch set uploaded (by Vaughn Walters; author: Vaughn Walters):

[mediawiki/extensions/CodeMirror@master] DNM selenium: Testing for flake in CI pipeline

https://gerrit.wikimedia.org/r/1231015

vaughnwalters subscribed.

I am unable to reproduce this flakiness locally or in CI.

Local: Passed 100/100

In CI: Passed 200/200

I am not yet sure what to do with this one: it flaked fairly recently (8 days ago), but I can't reproduce the failure at all now.

Here is the full log of one of the 100 runs in CI; if that disappears, the full output is in P87896.

Further update on this. For the previous 200 CI runs, I ran this test in isolation. I then ran it a further 100 times together with the rest of the tests in highlighting-wikitext2017.js, the file containing the flaky test, and it still passed 100/100 times.

I then ran the entire test suite 100 times in CI to see whether something was overlapping between tests (tests not isolated enough) and causing the flake, and the flake started to appear. I think the cause is that four instances run in parallel, and some of the tests are for wikitext2017 while others are for wikitext2010. Moving this back to in progress.
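
The suspected interference can be illustrated with a toy model. This is only a sketch of the hypothesis, not the real test code: `sharedPrefs` and `editorTest` are hypothetical names, and it assumes the parallel workers share a single user account whose editor preference they overwrite, which this ticket does not confirm.

```javascript
// Toy model: one shared preference store stands in for a single wiki user account.
const sharedPrefs = { editor: null };

// Hypothetical stand-in for a test that needs a particular editor.
async function editorTest( required ) {
	sharedPrefs.editor = required; // "set the user preference"
	await Promise.resolve(); // yield, as a real page load would
	return sharedPrefs.editor === required; // did the requested editor open?
}

// Both tests start before either one checks the preference.
function runParallel() {
	return Promise.all( [ editorTest( '2017' ), editorTest( '2010' ) ] );
}

// Each test finishes before the next one touches the preference.
async function runSerial() {
	return [ await editorTest( '2017' ), await editorTest( '2010' ) ];
}
```

In this model `runParallel()` resolves to `[ false, true ]`, because the second test overwrites the preference before the first one checks it, while `runSerial()` resolves to `[ true, true ]`. Real browser timing is not deterministic like this, which would fit a failure that only shows up occasionally.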

The takeaway is that when tests run in parallel, we need to watch for user preferences that are changed before or during a test, because those changes can interfere with other tests. Here, some tests required the 2010 editor and some required the 2017 editor; when those preferences were set and the tests all ran in parallel, the result was flaky tests. We could either (1) run the tests sequentially (turn max instances down to 1), or (2) run all the tests that require the 2010 editor in parallel, and only afterwards run all the tests that require the 2017 editor in parallel. That way we could still parallelize tests without the preference changes interfering with each other. I think option 2 is the more robust choice, because partial parallelization still saves some time while reducing flake.
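
Option 2 could be sketched roughly as follows. This is not the actual runner: apart from highlighting-wikitext2017.js, the spec file names are made up, and `groupByEditor`, `runGrouped` and the `runSpec` callback are hypothetical helpers used only to show the ordering.

```javascript
// Hypothetical spec list; only highlighting-wikitext2017.js is named in this ticket.
const specs = [
	{ file: 'highlighting-wikitext2017.js', editor: '2017' },
	{ file: 'example-a-wikitext2010.js', editor: '2010' },
	{ file: 'example-b-wikitext2017.js', editor: '2017' },
	{ file: 'example-c-wikitext2010.js', editor: '2010' }
];

// Bucket spec files by the editor preference they need.
function groupByEditor( specList ) {
	const groups = new Map();
	for ( const spec of specList ) {
		if ( !groups.has( spec.editor ) ) {
			groups.set( spec.editor, [] );
		}
		groups.get( spec.editor ).push( spec.file );
	}
	return groups;
}

// Run one group at a time; files inside a group run in parallel.
// runSpec is a caller-supplied async function (an assumption of this sketch).
async function runGrouped( specList, runSpec ) {
	for ( const [ editor, files ] of groupByEditor( specList ) ) {
		await Promise.all( files.map( ( file ) => runSpec( file, editor ) ) );
	}
}
```

With this ordering the editor preference only changes between groups, never while tests from another group are still running.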

Also, I wonder how much the parallelization of these tests adds to CI load, and whether that would slow CI down enough to create test flake. When I ran the tests fully in serial, they passed 100/100. But when I ran them in groups (2017 tests in parallel, and 2010 tests in parallel), there was still some flake because of timing issues. I am wondering whether this is related to CI load.

My tl;dr after spending time debugging this in the pipeline: run suites in serial and drop maxInstances to 2. If there is further flake, we should just run all of these in serial (drop maxInstances to 1 and remove the "suite" in wdio.conf.js). If there is a job that HAS to be optimized for time (because we run it in the gate, for example), then we can spend more time and effort on figuring out why parallelization causes flake.
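
For illustration, the two settings discussed here might look something like the fragment below. It is a sketch in the shape of a wdio.conf.js, not the merged change: the suite names and spec paths are hypothetical, and `buildConfig` is a made-up helper that only exists to show both variants side by side. `maxInstances` and `suites` are standard WebdriverIO options; how the suites end up being run one after another (for example one `--suite` invocation at a time) is an assumption.

```javascript
// Hypothetical helper returning the relevant part of a WebdriverIO config.
function buildConfig( { fullySerial = false } = {} ) {
	const config = {
		// Upper bound on how many spec files run at the same time.
		maxInstances: fullySerial ? 1 : 2
	};
	if ( !fullySerial ) {
		// Named groups of spec files (paths are illustrative only).
		config.suites = {
			wikitext2010: [ './specs/*wikitext2010*.js' ],
			wikitext2017: [ './specs/*wikitext2017*.js' ]
		};
	}
	return config;
}

// Proposed first step: suites kept, at most two instances.
const proposed = buildConfig();
// Fallback if flake persists: one instance, no suites.
const fallback = buildConfig( { fullySerial: true } );
```

The fallback trades run time for isolation, which matches the fully serial runs that passed 100/100.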

Change #1231015 merged by jenkins-bot:

[mediawiki/extensions/CodeMirror@master] selenium: Decreasing flake by serializing tests suites and dropping maxInstances to 2

https://gerrit.wikimedia.org/r/1231015