
RFC: Reconsider how we run QUnit unit tests
Open, MediumPublic

Description

Summary

Currently the JavaScript unit tests for all extension and skin repos are run together on every single commit that enters the integration pipeline. We essentially use QUnit for integration testing, even though Selenium (despite having its own problems) is the tool meant for integration testing and has been adopted in many repos. To make QUnit usable for integration testing, we load a MediaWiki instance and launch a web browser which runs all the unit tests in Special:JavaScript/qunit.

The proposal here is to stop this practice and move towards a setup where JavaScript unit tests are forced to be written in such a way that they:

  1. are run in isolation, without side effects from unit tests in other extensions.
  2. can be run more quickly from the command line, without any setup steps (MediaWiki, LocalSettings edits).

Given discussion on this ticket, we can retain the Special:JavaScriptTest/qunit/plain mode for true integration tests (which will hopefully run a much smaller set of tests).

Problem statement

I argue that our current usage of QUnit in this way is problematic for several reasons.

  1. Most of the tests run in QUnit are not integration tests. In practice, this is a nuisance to most developers: a badly written unit test that touches global variables can currently break another, unrelated extension. This happens frequently and in certain circumstances can grind all JavaScript development to a halt for a day, e.g. T214804 (and countless more examples; see also ci-test-error, T218172).
  2. Running QUnit tests in this way is not catching real integration errors. Most integration errors are being uncovered these days using Selenium
  3. The fact that we have to launch a browser and a MediaWiki instance makes these jobs slow, particularly when run locally on an install where multiple extensions are needed to function. In MobileFrontend, headless QUnit tests run in 10s, compared to 2 minutes on Jenkins (when all extensions are also run).
  4. The way we run QUnit is incompatible with most Node.js tooling. It is very difficult to add code coverage for JS when we run tests in this way.
  5. The current infrastructure limits what we can test. To test a function, it has to be made publicly accessible, which usually involves exposing it unnecessarily on the mw global object, adding unnecessary noise.
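As an illustration of the last point: with CommonJS modules (which ResourceLoader packageFiles support), a private helper can be exported and required directly, instead of being hung off the mw global just so a test can reach it. This is a hypothetical sketch, not actual MediaWiki core code:

```javascript
// Before: the only way for a test to reach this helper was to expose it,
// e.g. mw.myExtension.parseCount = parseCount; polluting the global.
// After: export it, and a Node-based test can require() it directly.

// parseCount: strip thousands separators and parse, e.g. "1,234" -> 1234
function parseCount( text ) {
	const n = parseInt( String( text ).replace( /,/g, '' ), 10 );
	return isNaN( n ) ? 0 : n;
}

module.exports = { parseCount };
```

A headless test can then require this file without any MediaWiki install or browser present.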

Implementation proposal

To do this we would follow the path taken in MobileFrontend and Popups. MobileFrontend currently runs its unit tests in two modes: a local npm run test:unit, and the current Special:JavaScript/qunit/plain method.

  • All core code would be rewritten to use module.exports and requires using ResourceLoader packageFiles. I have a proof of concept describing a backwards compatible migration. See: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/487168/
  • The majority of tests in MediaWiki core and extensions would gradually be rewritten using a new Node library, mw-node-qunit, run via npm test. The library would be provided by the Reading Web team and would supply the minimal MediaWiki JS environment needed for testing. It forces the user to write tests in a sane way by requiring them to stub calls to mw.msg, mw.config, and other things that can touch global state and cause side effects.
  • Code coverage reports for core JavaScript would be published as a result of this project
  • The current test runner is kept for running true integration tests in QUnit
  • If needed, support would be added for running tests inside Special:JavaScript/qunit locally for debugging purposes.
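To give a flavour of the "forced stubbing" idea, here is a minimal sketch (an assumed design, not the actual mw-node-qunit interface) of how a headless environment can make hidden dependencies on MediaWiki globals surface immediately instead of silently passing:

```javascript
// Anything a test does not explicitly stub throws, so an accidental
// dependency on global MediaWiki state fails loudly in that test only.
function makeMwStub( overrides ) {
	const base = {
		config: {
			get( key ) {
				throw new Error( 'Unstubbed call: mw.config.get( "' + key + '" )' );
			}
		},
		msg( key ) {
			throw new Error( 'Unstubbed call: mw.msg( "' + key + '" )' );
		}
	};
	return Object.assign( base, overrides );
}

// A test provides exactly the state it depends on, nothing more:
const mw = makeMwStub( {
	config: { get: ( key ) => ( { wgNamespaceIds: { talk: 1 } }[ key ] ) },
	msg: ( key ) => '<' + key + '>'
} );
```

Because each test builds its own `mw`, no state leaks between tests in different extensions.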

Q and A / talking points

Would Special:JavaScript/qunit disappear altogether?
Maybe. There might be value in keeping QUnit for a small set of unit tests. I would argue that Selenium, despite its problems, would be a better approach for ensuring compatibility between different extensions.

Wouldn't we lose protection against browser specific bugs?
I am not aware of any situation where we have caught a regression by running our unit tests in different browsers, e.g. Firefox/Chrome/Internet Explorer. While a nice idea, I don't see any evidence that we would catch any issues; it seems hypothetical rather than practical. The majority of issues in the mobile website are caught via Selenium (I can collect data if we really need it), which seems a much better way to catch these problems. Unit tests tend to prevent errors entering the codebase and protect known errors from reappearing in it. Unit tests, for example, did not help us prevent a recent regression in mobile (T215536), while Selenium did.

[1] If necessary, we can retain the use of QUnit for integration testing, but only for a small and limited set of tests.

Details

Related Gerrit Patches:
mediawiki/core (master): Headless QUnit tests (oh my!)
mediawiki/core (master): Improve mediawiki.Title unit tests

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptDec 21 2018, 6:09 PM
Jdlrobson updated the task description. (Show Details)Dec 21 2018, 6:09 PM
Jdlrobson updated the task description. (Show Details)
Jdlrobson updated the task description. (Show Details)
Jdlrobson updated the task description. (Show Details)Jan 27 2019, 8:24 PM

Change 487168 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/core@master] WIP/POC: Headless Qunit tests (oh my!)

https://gerrit.wikimedia.org/r/487168

kostajh added a subscriber: kostajh.
Jdlrobson updated the task description. (Show Details)Feb 7 2019, 11:22 PM
Jdlrobson triaged this task as Medium priority.Feb 7 2019, 11:37 PM
Jdlrobson updated the task description. (Show Details)

I am not aware of any situations we have caught a regression by running our unit tests in different browsers e.g. Firefox/Chrome/Internet Explorer.

The VisualEditor code history is littered with these. (As is OOjs's more aggressive use of SauceLabs.) Not that that changes the necessity in general, but…

I think this is a good direction, but I'm skeptical that we can, without a lot more effort, get rid of browser QUnit tests. A number of them depend on the MediaWiki environment (or literally on being in a browser) and it can be difficult or cumbersome to mock the MW environment and/or the browser environment (including the DOM) faithfully enough to make these tests work.

That said, I definitely agree that we should run tests in node where we can and only run them in the browser where we have to, and I think your work in this direction is really exciting.

The VisualEditor code history is littered with these. (As is OOjs's more aggressive use of SauceLabs.) Not that that changes the necessity in general, but…

Are you talking about QUnit or Selenium here? I wasn't aware that we had CI infrastructure for running QUnit tests in other web browsers and am not aware of any QUnit/SauceLab integration. Can you point to some example regressions they have caught?

Jdlrobson updated the task description. (Show Details)Feb 14 2019, 8:29 PM

The VisualEditor code history is littered with these. (As is OOjs's more aggressive use of SauceLabs.) Not that that changes the necessity in general, but…

Are you talking about QUnit or Selenium here? I wasn't aware that we had CI infrastructure for running QUnit tests in other web browsers and am not aware of any QUnit/SauceLab integration. Can you point to some example regressions they have caught?

QUnit, via node.

For VisualEditor, it's configured at https://gerrit.wikimedia.org/r/plugins/gitiles/VisualEditor/VisualEditor/+/master/Gruntfile.js#386 (though I believe we're only running Chrome not Firefox right now).

For OOjs, it's configured at https://gerrit.wikimedia.org/r/plugins/gitiles/oojs/core/+/master/Gruntfile.js#94

Ed, Timo, and Roan are the experts of our epic struggles with browser compatibility.

daniel added a subscriber: daniel.Feb 19 2019, 1:51 PM

I think this is a good direction, but I'm skeptical that we can, without a lot more effort, get rid of browser QUnit tests. A number of them depend on the MediaWiki environment (or literally on being in a browser) and it can be difficult or cumbersome to mock the MW environment and/or the browser environment (including the DOM) faithfully enough to make these tests work.

Without looking at the code, this seems to indicate that the code itself is poorly isolated, and could benefit from refactoring. In other words: this may not be a reason to stick with the old test system, but rather a reason to refactor the code to make it more testable (by reducing dependencies, which also makes it more reusable, more stable, and easier to understand).

daniel moved this task from Inbox to Under discussion on the TechCom-RFC board.Feb 19 2019, 1:52 PM

I think this is a good direction, but I'm skeptical that we can, without a lot more effort, get rid of browser QUnit tests. A number of them depend on the MediaWiki environment (or literally on being in a browser) and it can be difficult or cumbersome to mock the MW environment and/or the browser environment (including the DOM) faithfully enough to make these tests work.

Without looking at the code, this seems to indicate that the code itself is poorly isolated, and could benefit from refactoring. In other words: this may not be a reason to stick with the old test system, but rather a reason to refactor the code to make it more testable (by reducing dependencies, which also makes it more reusable, more stable, and easier to understand).

Indeed. In fact while migrating one file to the new pattern in the example I made the test better by identifying these implied (but not specified) dependencies on MediaWiki and made them explicit. I think not only would this effort make these existing tests better but lead to sensible refactoring and the introduction of new tests for things that previously were not testable.

FWIW, the stubbing of mw.util and mw.config in https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/487168/9/tests/qunit/suites/resources/mediawiki/mediawiki.Title.test.js minimises the chance of false positives in these tests due to a differently configured MediaWiki instance.

I think this is valuable work, and something I'm keen to lead on if this direction is accepted.

Ed, Timo, and Roan are the experts of our epic struggles with browser compatibility.

Once all tests are headless there is nothing stopping us from running in a browser the same way as we currently do IMO. We'd likely need to compile the JS first before sending to the browser (maybe as HTML file with inline script), but that seems doable.

Jdlrobson moved this task from Next up to Blocked on the User-Jdlrobson board.Feb 28 2019, 8:59 PM

TechCom is hosting a discussion of this on March 20 2:45pm PST (21:45 UTC, 21:45 CET) in #wikimedia-office

Tgr added a subscriber: Tgr.Mar 11 2019, 7:50 AM

Selenium tests are slow, fragile, hard to maintain, and hard to understand. Using them for integration tests would just make testing harder.

Rewriting most of our tests for an isolated, Node-based test runner would definitely be a big improvement (ideally you want a "test pyramid" with lots of unit tests, fewer integration tests, and few end-to-end tests; in the MediaWiki frontend, what we call unit tests are mostly integration tests, not properly isolated from each other or the environment, and that should be fixed). IMO it would be better to focus on that instead of removing things that still have some use (integration tests are often needed to test complex interplay with MediaWiki configuration or the DOM, or between multiple classes, or as regression tests for browser bugs).

Jdlrobson updated the task description. (Show Details)Mar 14 2019, 3:19 PM

There is additional relevant background presented in https://phabricator.wikimedia.org/phame/post/view/96/fast_and_isolated_js_unit_tests/.

Anecdotally, I don't think any contributors to Popups have longed for Special:JavaScriptTests. I too am personally unaware of any bugs identified that would have occurred in Special:JavaScriptTests but not headless Node.js.

Integration testing is possible in headless Node.js QUnit tests. We test production sources for jQuery and OOUI in MobileFrontend using jsdom. For me, this RFC is more about providing a fast, isolated, and consistent environment for testing but we can use it however we like.

Selenium tests are slow, fragile, hard to maintain, and hard to understand. Using them for integration tests would just make testing harder.

+1.

Once all tests are headless there is nothing stopping us from running in a browser the same way as we currently do IMO. We'd likely need to compile the JS first before sending to the browser (maybe as HTML file with inline script), but that seems doable.

We would want to be able to run tests in the browser, where they are easy to debug. As James mentioned earlier, we sometimes need to run the tests in different browsers to test different behaviour. If the proposal involves adding a build step, then I feel that would also make development slower.

I also think we need to clarify that we really have no strict definition for the difference between a unit test and an integration test. If I test the construction of a complex object, like a VisualEditor data-model document, is that a unit test because it is one method, or an integration test because it triggers thousands of method calls in hundreds of classes?

The main difference between Selenium and QUnit is Selenium's ability to generate real user inputs, not to sequence together multiple functional tests into an integration test.

This ability comes at a performance and complexity cost and as a result we would avoid it unless we feel we need to test real user inputs, or a more realistic complete environment.

For me the real question here is how we can mark some of our existing tests as being self-contained unit tests, such that other developers can be confident they don't need to be run when code external to it is changed. I think that question is separate from the question of which testing framework is used, or how it is run.

To summarise what I'm hearing:

  • Unlikely that we'll get rid of all browser QUnit tests.
  • Selenium is slow.
  • It is easier to debug certain problems in the browser.

Proposal:
I suspect we can have the best of both worlds and maintain two places for unit tests with different benefits.
"Let's stop using QUnit as a mechanism for integration tests" => "Let's reduce the amount of QUnit integration tests"

Right now, I'm put off writing tests that run in Special:JavaScript/qunit as they are:

  • slow
  • not isolated (by design), so there is a risk they can and do impact other teams
  • require paperwork to be registered
  • allow me to write bad tests which assume a certain environment (e.g. English language, a particular extension setup)
  • don't give me code coverage reports

I think having two ways of running unit tests would be excellent:

  1. headless using Node.js: designed for speed and developer productivity
  2. via Special:JavaScript/qunit: useful for testing integration with core and other extensions.

I am already aware of teams interested in this style of testing for their own extensions, so would like to help standardise an approach and conventions to testing in this fashion that makes life easier.

In the IRC meeting tomorrow, I hope to get some clarity on whether this is a sensible direction, what problems there are with this approach (if any) and how we can make this reusable to other extensions (would it make sense to have a single npm module for testing that includes dependencies for Selenium, grunt etc)

If approved I am keen to own this, by setting up the infrastructure, auditing the existing tests in core and separating out things that would be better classed as unit tests.

Just wanted to chime in before today's conversation. As a new developer at WMF with a background in front-end engineering, I found the Special:JavascriptTest approach to unit testing to be pretty cumbersome. Running tests in the browser was slow and resource-intensive, and inevitably some tests from other extensions (which I was not concerned with in local development) would come up as failures and worsen the signal-to-noise ratio. Finally, the traditional approach in ResourceLoader of concatenating all the JS dependencies and relying on objects in the global scope made it hard to test components in isolation, figure out what to mock, and reason about the code generally.

I'm currently working on a patch to refactor some code in the WikibaseMediaInfo extension to use require and module.exports statements since ResourceLoader now supports this syntax. This solves multiple problems in my view: code becomes easier to test in a headless Node environment (and can still be run in CI), and the ability to use a proper module system on the front-end makes JS development easier. I think that this is the approach we will use for front-end unit tests going forward in the Multimedia team.

There are some unsolved problems here though – one of the biggest being the possibility of a divergence in what versions of front-end libraries are used in testing vs production. For example, I need to pull in OOJS and OOUI to test some components, but there is no way to enforce parity between the versions declared as devDependencies in package.json and what gets delivered in the browser by ResourceLoader.
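One way to mitigate that divergence would be a CI check that compares the two declared versions. This is a hypothetical sketch (no such tool exists; the key names mirror a standard package.json, but the shipped-version source is an assumption):

```javascript
// Hypothetical parity check: compare a version declared in package.json
// devDependencies against the version MediaWiki core actually ships, so
// CI can flag test/production drift.
function checkParity( pkg, shippedVersion, dep ) {
	const declared = ( pkg.devDependencies || {} )[ dep ] || '';
	// Strip a leading semver range operator such as ^ or ~ before comparing.
	return declared.replace( /^[\^~]/, '' ) === shippedVersion;
}

// e.g. flag when the test environment drifts from production:
checkParity( { devDependencies: { oojs: '^3.0.0' } }, '3.0.0', 'oojs' ); // -> true
```

A real implementation would need an agreed machine-readable source for the versions ResourceLoader delivers, which is exactly the unsolved part.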

Jdlrobson renamed this task from RFC: Let's stop using QUnit as a mechanism for integration tests to RFC: Let's stop running QUnit unit tests as integration tests.Mar 27 2019, 7:48 PM
Jdlrobson updated the task description. (Show Details)
Jdlrobson updated the task description. (Show Details)Mar 27 2019, 7:51 PM

Change 487168 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/core@master] WIP/POC: Headless Qunit tests (oh my!)

https://gerrit.wikimedia.org/r/487168

Here is my perspective as one of the developers of the AdvancedSearch extension, which heavily uses OOUI: What is an "integration test"?

  1. Running QUnit tests in this way is not catching real integration errors. Most integration errors are being uncovered these days using Selenium

Our QUnit tests don't rely on MediaWiki being present (no API calls, no calls to mw.*), but they need a DOM to be present, because they mostly test features we add to existing OOUI classes by creating subclasses. In the past, those tests caught integration errors ("integrating" OOUI) for us and remain an important asset in our testing strategy.

Running our tests from the command line is a goal we have in mind, and change 487168 is a good example that we might follow. But we still need to test our "integration" with OOUI and, if possible, be notified of breaking changes early and in an automated fashion. If moving the integration tests out of QUnit means moving them to Selenium, then I'm not in favor of this RFC as is, because that would make the development experience much worse for us: Selenium tests run slower and are more brittle, not to speak of the additional work of converting all our tests. The "test pyramid" @Tgr mentioned would be turned on its head.

Hi @gabriel-wmde – just wanted to say that writing headless QUnit tests that can incorporate OOUI and access a (virtual) DOM are definitely possible. In the MediaInfo extension we recently took the first step towards writing these kinds of tests. I'm hoping to add a lot more headless unit tests in the near future to do the kinds of things you are talking about – ensure that UI components have the right classes, add/remove necessary elements from the DOM when a method is called, etc.

These tests are still much faster than Selenium-based ones (running in the command line is actually much faster/nicer than relying on Special:JavascriptTest in my opinion). Even though a virtual DOM is being used, I'd still consider these tests to be "unit" tests – the unit we care about is a particular UI element which includes DOM + JS methods.

You can see an example of those tests here. The OOUI library was recently updated to support being used via require in a Node environment, and you can see where it is being loaded into scope for the test suite. The jsdom library is also used here.

Happy to help answer questions if you are interested in trying to build a similar approach in AdvancedSearch.

As mentioned towards the end of our IRC discussion (Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2019/wikimedia-office.2019-03-20-21.45.log.html) you needn't use Node + virtual DOM, as running Chrome headless these days is just as quick. In fact this is how OOUI itself runs tests.

If moving the integration tests out of QUnit means moving them to Selenium, then I'm not in favor of this RFC as is,

The proposal is to keep the existing setup but migrate tests which are not actually needed as integration tests to a quicker, isolated build. I'm not proposing you move all those tests to Selenium; rather, I'm suggesting it would be useful to audit them and work out which ones really are useful as integration tests, resulting in a much slimmer set of tests run across the MediaWiki universe.

I looked into using Karma to run tests with the existing MobileFrontend setup for people that want to have test support in real browsers. We need to use Karma 3.1.4 (4 won't work with our current node 6). POC is here: POC: Show how Karma can be used to run node-qunit tests. I don't think it's something that my team would use but it shows that if running unit tests in real browsers is important to you it can be done.

Using webpack and a build step it's really trivial to get these tests running inside the browser via Karma (MobileFrontend already supports Special:JavaScript/qunit for development purposes) - you just need to pay more attention to fleshing out the MediaWiki+OOjs+JQuery environment and required stubs (something that would be made much easier if all our code was packaged on npm). The build step can be run inside npm test (so doesn't suffer from the problems of T199004)

I suspect you can avoid using webpack and the webpack bundle by using the requirejs plugin for Karma but I didn't explore that.

From my perspective, breaking out tests from Special:JavaScript/qunit/plain that are actually unit tests (rather than intentional unit tests being used for integration) would lead to faster, more modular, and more reusable code, as well as more isolated tests. It also complements the new packageFiles ResourceLoader construct, as it encourages packaging existing MediaWiki modules in such a way that they can be published on npm.

I'd love to see some kind of standard usage of this in core and encouragement to this style of development.

For what it's worth one of our projects uses requirejs and karma (analytics-dashiki), so if anyone wants to try that out I can show them how to make it work.

If moving the integration tests out of QUnit means moving them to Selenium, then I'm not in favor of this RFC as is,

The proposal is to keep the existing setup, but migrate tests which are not actually needed as integration tests to a quicker isolated build.

How would this be organized in the source? Would there be separate locations for unit tests vs integration tests? This idea has been around a while for phpunit tests, and has gained fresh attention recently: T87781: Split mediawiki tests into unit and integration tests

mobrovac added a subscriber: mobrovac.

Moving to backlog until @Jdlrobson is back.

If moving the integration tests out of QUnit means moving them to Selenium, then I'm not in favor of this RFC as is,

The proposal is to keep the existing setup, but migrate tests which are not actually needed as integration tests to a quicker isolated build.

How would this be organized in the source? Would there be separate locations for unit tests vs integration tests? This idea has been around a while for phpunit tests, and has gained fresh attention recently: T87781: Split mediawiki tests into unit and integration tests

I'm back. I was imagining there would be three folders: node-qunit (tests run against Node.js), browser, and qunit (the status quo). The tests in node-qunit would be run using Node.js and would be expected to work on a vanilla MediaWiki install, so they cannot make any assumptions about language or installed extensions (stubbing the contracts of these where necessary). The qunit folder would be reserved for tests where it is useful to test the integration of those contracts. Does that help?

@Jdlrobson thanks for the clarification, Jon!

We put this ticket in the freezer until you got back. Now, how do you want to proceed with it? You can just put it back in "under discussion" (and maybe solicit input on wikitech-l), but you could also request an IRC meeting if you think that would be helpful (move it to the "request IRC meeting" column), or propose for it to go straight to Last Call (we don't have a column for that on the RFC board; best to just comment and move it to the inbox, then we'll look at it in the next meeting).

Jdlrobson updated the task description. (Show Details)Sep 25 2019, 6:46 PM
Jdlrobson moved this task from Backlog to Last Call on the TechCom-RFC board.

Hey @daniel sorry for the delay getting back to you.

I'd love to port all mediawiki core's unit tests to headless unit tests beginning with https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/487168/
I think this is an important change, but for that to happen I need "permission" and someone to help me review these changes. I am happy to write all the patches.
I will not remove Special:JavaScript/qunit and will reserve this for any tests which are actually integration tests as I encounter them. Please review the revised implementation steps to understand what I plan to do.

An output of this work would be code coverage reports telling us the JS code coverage in core, plus quicker builds in Zuul, given that MediaWiki core's unit tests will no longer be run by extensions.

MobileFrontend has succeeded in doing the above:
https://doc.wikimedia.org/MobileFrontend/master/js/lcov-report/

Is anyone strongly opposed to me doing this (and able to explain why) and seeking assistance (here/via wikitech-l) with code review?

Krinkle moved this task from Last Call to Inbox on the TechCom-RFC board.EditedSep 25 2019, 7:46 PM
Krinkle added a subscriber: Krinkle.

Last Call is a stage initiated by TechCom (process). I will bring it up in today's meeting.

Krinkle renamed this task from RFC: Let's stop running QUnit unit tests as integration tests to RFC: Reconsider how we run QUnit unit tests.Sep 25 2019, 8:36 PM
daniel moved this task from Inbox to Under discussion on the TechCom-RFC board.Sep 26 2019, 8:13 AM

Brief discussion at the TechCom meeting indicates that there are unresolved issues, in particular with respect to the fact that QUnit tests are being used in different ways by different extensions and core. @Krinkle can probably provide more detail. It's unfortunate that @Milimetric is currently unavailable, since he seemed to have thoughts on this as well.

Moving to under discussion for now.

I'm back and getting up to speed. I'd need to hear what the unresolved issues are, so we can collaborate on a way forward. The patch that Jon proposed for core seemed like an improvement to me, for example. And it wouldn't affect how other extensions use qunit tests, but it seems to improve testing in general. So yeah, let's brainstorm together.

Any updates here?

, in particular wrt the fact that qunit tests are being used in different ways be different extensions and core

The extensions I am aware of that are doing different things to core are using headless unit tests. Headless QUnit is being used in Popups, WikibaseMediaInfo and MobileFrontend. This change would provide us with a standard library for doing that based on QUnit with the option of also running them in Special:JavaScriptTest. In MobileFrontend we have the best of both worlds (we can debug via command line or via browser).

One note in general, as an issue I stumbled upon recently. We should consider the cost of maintaining such tests.

If your frontend is an isolated codebase without lots of dependencies, I think it's fine to use QUnit in CLI mode, but if you're depending on jQuery UI and lots of other browser-based libraries, you're going to end up with so many hacks to make it work that the whole thing becomes close to unmaintainable.

As an example, these are WikibaseMediaInfo's hacks to make it work: helpers.js and hooks.js. As part of RL Module Terminators Trailblazing I tried to change a small amount of code in WikibaseMediaInfo. The code works fine locally, the refactor doesn't change anything major, the tests don't need to change, and everything is straightforward, but still most of the QUnit tests fail because the hacks needed to make them work in CLI mode have to change, and two days of work on those hacks didn't get me anywhere.

If your frontend is an isolated codebase without lots of dependencies, I think it's fine to use QUnit in CLI mode, but if you're depending on jQuery UI and lots of other browser-based libraries, you're going to end up with so many hacks to make it work that the whole thing becomes close to unmaintainable.

As I highlighted before, some things are actually integration tests, and if you are pulling in lots of browser-based libraries you may be doing it wrong. Tests depending on jQuery UI should be integration tests. If they are not integration tests, they may benefit from some refactoring; for example, a function that calls $.dialog may be better refactored into a function that takes a callback which shows the dialog

e.g.

function before() {
  $.dialog({});
}

becomes:

function after( showDialog ) {
  showDialog();
}
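To illustrate why the injected version is easier to test headlessly, a spy can stand in for the dialog. This is hypothetical test code, with a hand-rolled spy in place of a library like Sinon:

```javascript
// The injected variant from above: the function no longer reaches for
// jQuery UI itself, it just calls whatever dialog-opener it is given.
function after( showDialog ) {
	showDialog();
}

// A hand-rolled spy (a library such as Sinon would normally provide this):
function makeSpy() {
	const spy = function () {
		spy.callCount += 1;
	};
	spy.callCount = 0;
	return spy;
}

// The unit test needs no jQuery UI, no DOM, and no browser:
const showDialog = makeSpy();
after( showDialog );
// showDialog.callCount is now 1
```

The jQuery UI call itself then only needs to be exercised in the (much smaller) set of true integration tests.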

Change 558197 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/core@master] Improve mediawiki.Title unit tests

https://gerrit.wikimedia.org/r/558197

Any updates here? I'd really like some guidance and to understand why we wouldn't do this. Keeping the patch rebased is proving troublesome.

Since the rebases were causing me problems, I've started to pull the test modifications out into https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/558197 Improve mediawiki.Title unit tests (which currently fails in Jenkins, but I'll have that fixed shortly). I think these changes should be uncontroversial, are a massive improvement to the current tests, and should be considered separately and swiftly if more time is needed to think through a move from a MediaWiki-based test system to a community-supported Node test system.

daniel moved this task from Under discussion to Inbox on the TechCom-RFC board.Dec 16 2019, 9:56 PM

Any updates here? I'd really like some guidance and to understand why we wouldn't do this. Keeping the patch rebased is proving troublesome.

Moving this to the inbox for review by TechCom. I take it you want this to go on Last Call?
Once it's approved, you can tag CPT for review.

Krinkle claimed this task.Wed, Jan 8, 9:25 PM

Hey, hereby the results of my analysis and research. As both our teams and departments had other goals this past year, this took the back seat for a while, but I've now finished it. It's been a long time coming. The bulk of the research took place in November/December.

Objectives and benefits to gain

Looking at the original summary, proposal, and current implementation in the MobileFrontend and Popups repositories, I understand that the key points are:

  1. Easy and quick CLI access with ability to run current project's tests only (ideally even more granular).
  2. Generate code coverage reports.
  3. Ability to use ES6 in tests.
  4. Adopt CommonJS module.exports in source code.
Analysis

The main caveat I've seen so far is that by creating an alternate way of running a subset of QUnit tests, we solve the above only for that subset of tests. I set out last month to better quantify what might be causing some of the limitations we see today, and what it would take to achieve these objectives more generally.

CLI access. Our primary QUnit stack has actually always had a headless CLI runner, as used by Jenkins. It currently uses Karma from npm with Headless Chromium. Interestingly, Headless Chromium launches as fast as a Node.js process (details at P10229).

I've cleaned up the documentation on mediawiki.org to make this easier to use locally with just three steps.

In terms of speed, I was unable to find any significant speed difference between Node.js and Chromium given similar tasks. If anything, I kept finding that Chromium is slightly faster. For example, the OOjs suite (containing 57 tests) completes with Karma and Headless Chrome in 5 seconds and with node-qunit in 7 seconds (details at P10229). The differences are marginal, and mainly relate to the one-time cost of ~300ms to start the process, which is lost in the noise after you load code and run tests.

Looking at MediaWiki core in its entirety (441 tests), these currently take 5 seconds to complete (test report in console) (Jenkins job), with the overall job (incl. shutting down of background process) taking 8 seconds.

For me locally, with MultimediaViewer and a few other extensions installed, MW runs 691 QUnit tests taking about 25s from the CLI (Jenkins workers run mostly from RAM, and are outpacing my MacBook Pro...)

Single test suite / Filtering. This isn't a very well-known feature, but the current test runner offers a filter that is also accessible by query parameter, e.g. ?module=mediawiki.util or ?filter=mmv.ui. This is also exposed in the HTML interface, but @Catrope pointed out to me that it has the downside of only appearing after first running the full unfiltered test suite, which is annoying when you have lots of extensions installed. We need to do better there. For example, we could update the UI so that the landing page asks by default which extensions or test suites you want to run. Same for the CLI: we could add a small parameter, populated from the extension configuration, that lets you run just one of them, e.g. npm run qunit --suite MobileFrontend,Minerva or something like that.

With the new QUnitTestModule property in extension.json, this will be even easier, as it allows core to know which test suites belong to a given skin or extension (so that you could automatically run tests for "the current extension" only, from an extension or skin directory; without having to know or specify some kind of prefix).
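
For illustration, a registration using that property might look roughly like this (a hedged sketch: the shape mirrors ResourceLoader module definitions, and the file and module names here are made up):

```json
{
	"QUnitTestModule": {
		"localBasePath": "tests/qunit",
		"remoteExtPath": "Example/tests/qunit",
		"scripts": [ "ext.example.test.js" ],
		"dependencies": [ "ext.example" ]
	}
}
```

Because the definition lives in the extension's own extension.json, core can map each test suite back to the extension or skin that owns it.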

Code coverage. While our Gruntfile is overdue for some housekeeping, we are essentially using a standard karma-qunit setup, which means other Karma features are available also. One of these is the karma-coverage plugin to generate a code coverage report via Istanbul-js/NYC. (The same library as used in Jon's proof of concept).

Istanbul-js works by instrumenting all source files with extra statements that increment counters in a large object, exported as the variable __coverage__; this is then converted to a coverage report in HTML and Clover formats.
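
To make the mechanism concrete, here is a hand-written sketch of the kind of transformation Istanbul applies (heavily simplified; the real instrumenter generates the counter keys and file map automatically, and tracks branches and functions as well as statements):

```javascript
// Simplified illustration of Istanbul-style instrumentation: every
// statement is prefixed with an increment on a shared coverage object.
// The key 'example.js' and counter ids are made up for this sketch.
var __coverage__ = { 'example.js': { s: { '0': 0, '1': 0, '2': 0 } } };

function add( a, b ) {
	__coverage__[ 'example.js' ].s[ '0' ]++;
	return a + b;
}

function unused() {
	// This counter stays at 0, which the report renders as an uncovered line.
	__coverage__[ 'example.js' ].s[ '1' ]++;
	return 'never called';
}

__coverage__[ 'example.js' ].s[ '2' ]++;
var result = add( 2, 3 );
```

After the tests finish, karma-coverage reads __coverage__ out of the browser and hands it to the report generator.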

We already have experience using this plugin in most of our non-MediaWiki repos. For example, VisualEditor, UnicodeJS, OOUI etc.

I tried this out by running nyc on a subdirectory and then enabling karma-coverage (which looks for that object and creates the report after the tests finish running). It seems to work.

ES6 and module.exports. I think these are somewhat orthogonal to the current RFC. Adoption of module.exports has already begun in MediaWiki and isn't blocked by this RFC. We can and absolutely should continue to do so. Using ES6 in test code is also already possible today. The versions of Node, Chromium and Firefox we use locally and in CI all support similar levels of ES6. It is up to individual teams to decide whether they want to configure ESLint differently for the test directory, and whether they find it beneficial to allow a different language level there. Whether we run tests in V8 via Node or via Headless Chrome doesn't change that.

Hidden costs: I gave node-qunit a try in the Example extension. This led me to find numerous costs of the current proposal that I think we'd need to acknowledge or address:

  1. Using Node.js would need a DOM polyfill like "jsdom" - a browser simulator with no common heritage from Firefox, Chromium or WebKit. This means our frontend would effectively have to support another browser with its own unique quirks and gaps. It would narrow the subset of features we can use, and adds an additional source of bugs that may block development.
  2. Running in a standalone process means no access to dependencies. A common unit testing practice is to construct the class under test manually and inject any/all dependencies explicitly in the test suite. In PHPUnit, the mock builder can create mock versions of classes. SinonJS offers the same ability. However, this doesn't work without access to the source. As such, one is required to maintain a copy of all dependencies. This isn't obvious in the proof of concept commit as it fetches wikimedia/mw-node-qunit from GitHub. I am unsure how this would scale to wider MediaWiki development, especially beyond MW core and across extensions. It is already showing inconsistencies where some methods are stubs and others verbatim copies.
  3. Testing our code without letting a supported run-time load the code (ResourceLoader) and without letting a supported JS environment execute it (a web browser), decreases the value of our tests. When we test our PHP code, we execute it in PHP 7.2 and use Composer as autoloader. Involving a supported engine and loader does not make it an integration test.
  4. There is a cost to having to support Node.js (and its require), and potentially Browserify/Webpack. For code that has no future as server-side utility, this would be an on-going cost without benefit.
  5. Having two frontend unit test pipelines adds long-term maintenance (who looks after it?), requires more docs and education for new contributors and staff (plus inevitable confusion over which is which, how they're different), and reduces value of experience with either pipeline due to not applying to the other (how to add tests, how to set up, how to debug, etc.)
  6. Debugging complexity. T212521#5192891 proposes debugging in the browser while normally running in Node, introducing a layer of obfuscation (Webpack) to simulate Node inside the browser. This seems unsustainable to me, especially on top of a Node stack that is already simulating the browser DOM. However, I believe that in the latest iteration of the proof of concept, the tests would run fine in the browser without Webpack. Though, that still leaves a major context switch from CLI/Node.js/jsdom to GUI/Chrome/Blink/RL. I expect that developers will either start out by setting breakpoints for the Node CLI and never leave that context, or switch over, bookmark the URL, and also run tests that way when not debugging.
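As a concrete illustration of the dependency-injection practice described in point 2, here is a hedged sketch (the class, method, and stub names are made up, loosely modelled on mw.Api): the class under test receives its dependency explicitly, so the test can pass in a hand-rolled stub instead of needing the real source loaded.

```javascript
// Class under test: takes its API client as a constructor argument
// rather than reaching for a global, so tests can inject a fake.
function PageWatcher( api ) {
	this.api = api;
}
PageWatcher.prototype.watch = function ( title ) {
	return this.api.postWithToken( 'watch', { action: 'watch', titles: title } );
};

// In the test, inject a stub that records calls instead of hitting the network.
var calls = [];
var stubApi = {
	postWithToken: function ( type, params ) {
		calls.push( { type: type, params: params } );
		return Promise.resolve( { watch: [ { watched: true } ] } );
	}
};

var watcher = new PageWatcher( stubApi );
watcher.watch( 'Main Page' );
```

This style works fine when the runtime can load the real class; the cost described above arises when the standalone Node process cannot load the dependency's source at all, so the stub becomes a copy that must be kept in sync by hand.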
In conclusion

I think it would be beneficial to all stakeholders involved if we solve these problems for both unit-level tests and integration-level tests (end-to-end browser tests are something else, and should remain small in number; [1], [2], [3], [4])

There is work to be done to make QUnit easier to run from the command-line. We need to do that work regardless of how we decide on this RFC given its limited scope. The above implies what that work entails, but I'll also file a subtask about that in coming weeks where we can divide and track that work. For example, we could disable minification by default in test context.

The main reason some tests are slow today (especially on localhost) is not due to how or where they execute. It is due to what they do. For example, there is one MW QUnit test in particular, "mediawiki.jqueryMsg.test" (regretfully written by yours truly), that often takes 5-10 seconds. It is technically a unit test, but it is slow because it lazily fetches its test fixtures from api.php and load.php (the AJAX requests are not mocked). These and other poorly written tests are long overdue for a refactoring with modern test disciplines applied, in line with the direction that Jon's proof of concept sets forth.
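
As an illustration of the kind of fix such a refactoring would apply: if the code under test accepts its fixtures as plain data, the test never needs api.php or load.php at all (a made-up example; this is not the actual jqueryMsg API):

```javascript
// Hypothetical message formatter that takes its message texts as a plain
// object, so tests can supply fixtures inline instead of fetching them.
function formatMessage( messages, key, params ) {
	var text = messages[ key ];
	// Replace $1, $2, ... placeholders with the given parameters.
	return text.replace( /\$(\d+)/g, function ( match, n ) {
		return params[ Number( n ) - 1 ];
	} );
}

// The fixture is inline data: no network round-trip, no MediaWiki install.
var fixtures = { 'welcome-user': 'Welcome, $1! You have $2 new messages.' };
var output = formatMessage( fixtures, 'welcome-user', [ 'Alice', '3' ] );
```

A test written this way runs in milliseconds regardless of whether it executes in Node or in a headless browser.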

I think that this upcoming cleansing of tests and these general infra improvements will likely resolve the problem statement, thus no longer requiring a secondary test pipeline to work around it.

Krinkle moved this task from Inbox to Under discussion on the TechCom-RFC board. Tue, Jan 21, 12:05 AM