Page MenuHomePhabricator

RFC: Let's stop running QUnit unit tests as integration tests
Open, NormalPublic

Description

Summary

Currently the JavaScript unit tests for all extension and skin repos are run together on every single commit that enters the integration pipeline. We basically use QUnit for integration testing. Yet Selenium (despite having its own problems) is meant for integration testing and has been adopted in many repos. To make QUnit usable for integration testing we load a MediaWiki instance and launch a web browser which runs all the unit tests in Special:JavaScript/qunit

The proposal here is to stop this practice and move towards a setup where JavaScript unit tests are forced to be written in such a way that they:

  1. are run in isolation without side effects from unit tests in other extensions.
  2. they can be run from the command line quicker, without any setup steps (MediaWiki, LocalSettings edits)

Given discussion on this ticket, we can retain the Special:JavaScriptTest/qunit/plain mode for true integration tests (which will hopefully run a much smaller set of tests).

Problem statement

I argue that our current usage of QUnit in this way is problematic for several reasons.

  1. Most of the tests run in QUnit are not integration tests. In fact, this is a nuisance to most developers. Currently a badly written unit test that touches global variables can break another unrelated extension. This happens frequently and can grind all JavaScript development to a halt for a day in certain circumstances. E..g T214804 (but countless more examples) ci-test-error (T218172)
  2. Running QUnit tests in this way is not catching real integration errors. Most integration errors are being uncovered these days using Selenium
  3. The fact we have to launch a browser and MediaWiki instance makes these jobs slow, particularly when run locally on an install with multiple extensions needed to function. In MobileFrontend headless QUnit tests run in 10s, compared to 2 minutes on Jenkins (when all extensions are also run)
  4. The way we run QUnit is incompatible with most Node.js tooling. It is very difficult to add code coverage for JS when we run tests in this way.
  5. The current infrastructure limits what we can test. To test a function it has to be made publicly accessible which usually involves exposing it unnecessarily on the mw global object, which adds unnecessary noise.

Implementation proposal

To do this we would follow the path taken in MobileFrontend and Popups. MobileFrontend is currently running its unit tests in two modes - a local npm run test:unit and the current Special:JavaScript/qunit/plain method. We use webpack to build the tests for the latter environment and would live to standardise (T203137) and remove it.

  • Code would be rewritten to use module.exports and requires. This requires either the adoption of webpack in complicated extensions or making use of the ResourceLoader improvements available off the back of T133462. I am focusing on the latter approach and have a proof of concept describing a backwards compatible migration. See: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/487168/
  • The majority of tests in MediaWiki extensions/core would gradually be rewritten using a new node library mw-node-qunit which can be run inside npm test which will be provided by the reading web team and provide the minimal mediawiki JS environment needed for testing. It will force the user to write tests in a sane way, by forcing them to stub calls to mw.msg and mw.config amongst other things that can impact the global state and have side effects.
  • It should be possible for the tests to be run inside real browsers via Karma as well as in Node.js.
  • The current test runner is kept for running true integration tests in QUnit where Selenium is not an option.

Q and A / talking points

Would Special:JavaScript/qunit disappear altogether?
Maybe. There might be value in using qunit for a small set of unit tests. I would argue that Selenium despite it's problems would be a better approach for ensuring compatibility between different extensions.

Wouldn't we lose protection against browser specific bugs?
I am not aware of any situations we have caught a regression by running our unit tests in different browsers e.g. Firefox/Chrome/Internet Explorer. While a nice idea, I don't see any evidence that we would catch any issues. It seems hypothetical but not practical. The majority of issues in the mobile website are caught via Selenium (I can collect data if we really need this). Selenium seems a much better way to catch these problems. Unit tests tend to prevent errors entering the codebase and protecting known errors from reappearing in the codebase. Unit tests for example did not help us prevent a recent regression in mobile T215536 while Selenium did.

[1] If necessary, we can retain the use of QUnit for integration testing, but only for a small and limited set of tests.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptDec 21 2018, 6:09 PM
Jdlrobson updated the task description. (Show Details)Dec 21 2018, 6:09 PM
Jdlrobson updated the task description. (Show Details)
Jdlrobson updated the task description. (Show Details)
Jdlrobson updated the task description. (Show Details)Jan 27 2019, 8:24 PM

Change 487168 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/core@master] WIP/POC: Headless Qunit tests (oh my!)

https://gerrit.wikimedia.org/r/487168

kostajh added a subscriber: kostajh.
Jdlrobson updated the task description. (Show Details)Feb 7 2019, 11:22 PM
Jdlrobson triaged this task as Normal priority.Feb 7 2019, 11:37 PM
Jdlrobson updated the task description. (Show Details)

I am not aware of any situations we have caught a regression by running our unit tests in different browsers e.g. Firefox/Chrome/Internet Explorer.

The VisualEditor code history is littered with these. (As is OOjs's more aggressive use of SauceLabs.) Not that that changes the necessity in general, but…

I think this is a good direction, but I'm skeptical that we can, without a lot more effort, get rid of browser QUnit tests. A number of them depend on the MediaWiki environment (or literally on being in a browser) and it can be difficult or cumbersome to mock the MW environment and/or the browser environment (including the DOM) faithfully enough to make these tests work.

That said, I definitely agree that we should run tests in node where we can and only run them in the browser where we have to, and I think your work in this direction is really exciting.

The VisualEditor code history is littered with these. (As is OOjs's more aggressive use of SauceLabs.) Not that that changes the necessity in general, but…

Are you talking about QUnit or Selenium here? I wasn't aware that we had CI infrastructure for running QUnit tests in other web browsers and am not aware of any QUnit/SauceLab integration. Can you point to some example regressions they have caught?

Jdlrobson updated the task description. (Show Details)Feb 14 2019, 8:29 PM

The VisualEditor code history is littered with these. (As is OOjs's more aggressive use of SauceLabs.) Not that that changes the necessity in general, but…

Are you talking about QUnit or Selenium here? I wasn't aware that we had CI infrastructure for running QUnit tests in other web browsers and am not aware of any QUnit/SauceLab integration. Can you point to some example regressions they have caught?

QUnit, via node.

For VisualEditor, it's configured at https://gerrit.wikimedia.org/r/plugins/gitiles/VisualEditor/VisualEditor/+/master/Gruntfile.js#386 (though I believe we're only running Chrome not Firefox right now).

For OOjs, it's configured at https://gerrit.wikimedia.org/r/plugins/gitiles/oojs/core/+/master/Gruntfile.js#94

Ed, Timo, and Roan are the experts of our epic struggles with browser compatibility.

daniel added a subscriber: daniel.Feb 19 2019, 1:51 PM

I think this is a good direction, but I'm skeptical that we can, without a lot more effort, get rid of browser QUnit tests. A number of them depend on the MediaWiki environment (or literally on being in a browser) and it can be difficult or cumbersome to mock the MW environment and/or the browser environment (including the DOM) faithfully enough to make these tests work.

Without looking at the code, this seems to indicate that the code itself is poorly isolated, and could benefit from refactoring. In other words: this may not be a reason to stick with the old test system, but rather a reason to refactor the code to make it more testable (by reducing dependencies, which also makes it more reusable, more stable, and easier to understand).

daniel moved this task from Inbox to Under discussion on the TechCom-RFC board.Feb 19 2019, 1:52 PM

I think this is a good direction, but I'm skeptical that we can, without a lot more effort, get rid of browser QUnit tests. A number of them depend on the MediaWiki environment (or literally on being in a browser) and it can be difficult or cumbersome to mock the MW environment and/or the browser environment (including the DOM) faithfully enough to make these tests work.

Without looking at the code, this seems to indicate that the code itself is poorly isolated, and could benefit from refactoring. In other words: this may not be a reason to stick with the old test system, but rather a reason to refactor the code to make it more testable (by reducing dependencies, which also makes it more reusable, more stable, and easier to understand).

Indeed. In fact while migrating one file to the new pattern in the example I made the test better by identifying these implied (but not specified) dependencies on MediaWiki and made them explicit. I think not only would this effort make these existing tests better but lead to sensible refactoring and the introduction of new tests for things that previously were not testable.

FWIW the stubbing in https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/487168/9/tests/qunit/suites/resources/mediawiki/mediawiki.Title.test.js of mw.util, mw.config minimises the chances of false positives of these tests due to differently configured MediaWiki instance.

I think this is valuable work, and something I'm keen to lead on if this direction is accepted.

Ed, Timo, and Roan are the experts of our epic struggles with browser compatibility.

Once all tests are headless there is nothing stopping us from running in a browser the same way as we currently do IMO. We'd likely need to compile the JS first before sending to the browser (maybe as HTML file with inline script), but that seems doable.

Jdlrobson moved this task from Next up to Blocked on the User-Jdlrobson board.Feb 28 2019, 8:59 PM

TechCom is hosting a discussion of this on March 20 2:45pm PST (21:45 UTC, 21:45 CET) in #wikimedia-office

Tgr added a subscriber: Tgr.Mar 11 2019, 7:50 AM

Selenium tests are slow, fragile, hard to maintain and hard to understand. Using it for integration tests would just make testing harder.

Rewriting most of our tests for an isolated, node-based test runner would definitely be a big improvement (ideally you want a "test pyramid" with lots of unit tests, less integration tests and few end-to-end tests; in MediaWiki frontend what we call unit tests are mostly integration tests, not properly isolated from each other or the environment, and that should be fixed). IMO it would be better to focus on that instead of removing things that still have some use (integration tests are often needed to test complex interplay with MediaWiki configuration or DOM, or between multiple classes, or for regression tests for browser bugs).

Jdlrobson updated the task description. (Show Details)Mar 14 2019, 3:19 PM

There is additional relevant background presented in https://phabricator.wikimedia.org/phame/post/view/96/fast_and_isolated_js_unit_tests/.

Anecdotally, I don't think any contributors to Popups have longed for Special:JavaScriptTests. I too am personally unaware of any bugs identified that would have occurred in Special:JavaScriptTests but not headless Node.js.

Integration testing is possible in headless Node.js QUnit tests. We test production sources for jQuery and OOUI in MobileFrontend using jsdom. For me, this RFC is more about providing a fast, isolated, and consistent environment for testing but we can use it however we like.

Selenium tests are slow, fragile, hard to maintain and hard to understand. Using it for integration tests would just make testing harder.

+1.

Once all tests are headless there is nothing stopping us from running in a browser the same way as we currently do IMO. We'd likely need to compile the JS first before sending to the browser (maybe as HTML file with inline script), but that seems doable.

We would want to be able to run tests in the browser where they are easy to debug. As James mentioned earlier we sometimes need to run the tests in different browsers to test different behaviour. If the proposal involes adding build step then I feel that would also make development slower.

I also think we need to clarify that we really have no strict definition for the difference between a unit test and an integration test. If I test the construction of a complex object, like a VisualEditor data-model document, is that a unit test because it is one method, or an integration test because it triggers thousands of method calls in hundreds of classes?

The main difference between Selenium and QUnit is Selenium's ability to generate real user inputs, not to sequence together multiple functional tests into an integration test.

This ability comes at a performance and complexity cost and as a result we would avoid it unless we feel we need to test real user inputs, or a more realistic complete environment.

For me the real question here is how we can mark some of our existing tests as being self-contained unit tests, such that other developers can be confident they don't need to be run when code external to it is changed. I think that question is separate from the question of which testing framework is used, or how it is run.

To summarise what I'm hearing:

  • Unlike we'll get rid of all browser QUnit tests.
  • Selenium is slow
  • Easier to debug certain problems in browser

Proposal:
I suspect we can have the best of both worlds and maintain two places for unit tests with different benefits.
"Let's stop using QUnit as a mechanism for integration tests" => "Let's reduce the amount of QUnit integration tests"

Right now, I'm put off writing tests that run in Special:JavaScript/qunit as they are:

  • slow
  • not isolated (by design) so there is a risk they can and do impact other teams
  • require paperwork to be registered
  • allow me to write bad tests which assume certain environment (e.g. English language, certain extensions setup)
  • don't give me code coverage reports

I think having two ways of running unit tests would be excellent:

  1. headless using Node.js: designed for speed and developer productivity
  2. Via Special:JavaScript/Qunit - Testing integrating with core/other extensions would be useful.

I am already aware of teams interested in this style of testing for their own extensions, so would like to help standardise an approach and conventions to testing in this fashion that makes life easier.

In the IRC meeting tomorrow, I hope to get some clarity on whether this is a sensible direction, what problems there are with this approach (if any) and how we can make this reusable to other extensions (would it make sense to have a single npm module for testing that includes dependencies for Selenium, grunt etc)

If approved I am keen to own this, by setting up the infrastructure, auditing the existing tests in core and separating out things that would be better classed as unit tests.

Just wanted to chime in before today's conversation. As a new developer at WMF with a background in front-end engineering, I found the Special:JavascriptTest approach to unit testing to be pretty cumbersome. Running tests in the browser was slow and resource-intensive, and inevitably some tests from other extensions (which I was not concerned with in local development) would come up as failures and worsen the signal-to-noise ratio. Finally, the traditional approach in ResourceLoader of concatenating all the JS dependencies and relying on objects in the global scope made it hard to test components in isolation, figure out what to mock, and reason about the code generally.

I'm currently working on a patch to refactor some code in the WikibaseMediaInfo extension to use require and module.exports statements since ResourceLoader now supports this syntax. This solves multiple problems in my view: code becomes easier to test in a headless Node environment (and can still be run in CI), and the ability to use a proper module system on the front-end makes JS development easier. I think that this is the approach we will use for front-end unit tests going forward in the Multimedia team.

There are some unsolved problems here though – one of the biggest being the possibility of a divergence in what versions of front-end libraries are used in testing vs production. For example, I need to pull in OOJS and OOUI to test some components, but there is no way to enforce parity between the versions declared as devDependencies in package.json and what gets delivered in the browser by ResourceLoader.

Jdlrobson renamed this task from RFC: Let's stop using QUnit as a mechanism for integration tests to RFC: Let's stop running QUnit unit tests as integration tests.Mar 27 2019, 7:48 PM
Jdlrobson updated the task description. (Show Details)
Jdlrobson updated the task description. (Show Details)Mar 27 2019, 7:51 PM

Change 487168 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/core@master] WIP/POC: Headless Qunit tests (oh my!)

https://gerrit.wikimedia.org/r/487168

Here is my perspective as one of the developers of the AdvancedSearch extension, which heavily uses OOUI: What is an "integration test"?

  1. Running QUnit tests in this way is not catching real integration errors. Most integration errors are being uncovered these days using Selenium

Our QUnit tests don't rely on MediaWiki being present (no API calls, no calls to mw.*), but they need a DOM being present because they mostly test features we add to existing OOUI classes by creating subclasses. In the past, those tests caught integration errors ("integrating" OOUI) for us and remain an important asset in our testing strategy.

Running our tests from the command line is a goal we have in mind, Change 487168 is a good example, that we might follow. But still, we need to test our "integration" with OOUI, and if possible, be notified of breaking changes early and in an automated fashion. If moving the integration tests out of QUnit means moving them to Selenium, then I'm not in favor if this RFC as is, because that would make the development experience much worse for us - Selenium tests run slower and are more brittle, not to speak of the additional work of converting all our tests. The "Test pyramid" @Tgr mentioned would be turned on its head.

Hi @gabriel-wmde – just wanted to say that writing headless QUnit tests that can incorporate OOUI and access a (virtual) DOM are definitely possible. In the MediaInfo extension we recently took the first step towards writing these kinds of tests. I'm hoping to add a lot more headless unit tests in the near future to do the kinds of things you are talking about – ensure that UI components have the right classes, add/remove necessary elements from the DOM when a method is called, etc.

These tests are still much faster than Selenium-based ones (running in the command line is actually much faster/nicer than relying on Special:JavascriptTest in my opinion). Even though a virtual DOM is being used, I'd still consider these tests to be "unit" tests – the unit we care about is a particular UI element which includes DOM + JS methods.

You can see an example of those tests here. The OOUI library was recently updated to support being used via require in a Node environment, and you can see where it is being loaded into scope for the test suite. The jsdom library is also used here.

Happy to help answer questions if you are interested in trying to build a similar approach in AdvancedSearch.

As mentioned towards the end of our IRC discussion (Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2019/wikimedia-office.2019-03-20-21.45.log.html) you needn't use Node + virtual DOM, as running Chrome headless these days is just as quick. In fact this is how OOUI itself runs tests.

If moving the integration tests out of QUnit means moving them to Selenium, then I'm not in favor if this RFC as is,

The proposal is to keep the existing setup, but migrate tests which are not actually needed as integration tests to a quicker isolated build. I'm not proposing you move all those tests these to Selenium, but I'm suggesting it would be useful to audit those tests and work out which ones really are useful as integration tests, resulting in a much slimmer set of tests that are run across the MediaWiki universe.

I looked into using Karma to run tests with the existing MobileFrontend setup for people that want to have test support in real browsers. We need to use Karma 3.1.4 (4 won't work with our current node 6). POC is here: POC: Show how Karma can be used to run node-qunit tests. I don't think it's something that my team would use but it shows that if running unit tests in real browsers is important to you it can be done.

Using webpack and a build step it's really trivial to get these tests running inside the browser via Karma (MobileFrontend already supports Special:JavaScript/qunit for development purposes) - you just need to pay more attention to fleshing out the MediaWiki+OOjs+JQuery environment and required stubs (something that would be made much easier if all our code was packaged on npm). The build step can be run inside npm test (so doesn't suffer from the problems of T199004)

I suspect you can avoid using webpack and the webpack bundle by using the requirejs plugin for Karma but I didn't explore that.

From my perspective, breaking out tests from Special:JavaScript/qunit/plain that are actually unit tests (not intentional unit tests being used for integration) would lead to faster, more modular and reusable code as well as provide more isolated tests. It also compliment the new packageFiles ResourceLoader construct as it encourages packaging up existing MediaWiki modules in such a way they can be published on NPM.

I'd love to see some kind of standard usage of this in core and encouragement to this style of development.

For what it's worth one of our projects uses requirejs and karma (analytics-dashiki), so if anyone wants to try that out I can show them how to make it work.

If moving the integration tests out of QUnit means moving them to Selenium, then I'm not in favor if this RFC as is,

The proposal is to keep the existing setup, but migrate tests which are not actually needed as integration tests to a quicker isolated build.

How would this be organized in the source? Would there be separate locations for unit tests vs integration tests? This idea has been around a while for phpunit tests, and has gained fresh attention recently: T87781: Split mediawiki tests into unit and integration tests

mobrovac added a subscriber: mobrovac.

Moving to backlog until @Jdlrobson is back.

If moving the integration tests out of QUnit means moving them to Selenium, then I'm not in favor if this RFC as is,

The proposal is to keep the existing setup, but migrate tests which are not actually needed as integration tests to a quicker isolated build.

How would this be organized in the source? Would there be separate locations for unit tests vs integration tests? This idea has been around a while for phpunit tests, and has gained fresh attention recently: T87781: Split mediawiki tests into unit and integration tests

I'm back.I was imagining there would be three folders - node-qunit (tests run against Node.js) and browser, qunit (the status quo). The tests in node-qunit would be run using Node.js and would be expected to work on a vanilla MediaWiki install so cannot make any assumptions about language or installed extensions (stubbing the contracts of these where necessary). The qunit folder would be reserved for tests where it is useful to test integration of those contracts. Does that help?

@Jdlrobson thanks for the clarification, Jon!

We put this ticket in the freezer until you got back. Now, how do you want to procede with it? You can just put it back in "under discussion" (and maybe solicit input on wikitech-l), but you could also request an IRC meeting if you think that would be helpful (move it to the "request IRC meeting" column), or propose for it to go straight to Last Call (we don't have a column for that on the RFC board - best just comment and move it to the inbox, then we'll look at it in the next meeting).