
Establish a standard way of overriding service wiring for browser tests
Open, Needs Triage, Public

Description

Some services need to be mocked for browser tests (e.g. because they give nondeterministic results or rely on user-generated content). There should be a standard way of doing that, at least for Wikimedia Jenkins tests.

Event Timeline

Tgr created this task. May 25 2020, 11:06 AM
Restricted Application added a subscriber: Aklapper. May 25 2020, 11:06 AM

Just throwing this idea out here: the default service wiring is set via $wgServiceWiringFiles in DefaultSettings.php; a LocalSettings.php file could override the value of that variable, pointing it at a service wiring file that looks something like this:

<?php
use MediaWiki\MediaWikiServices;

// Load the default wiring, then override individual services.
$wiring = require "$IP/includes/ServiceWiring.php";

$wiring['service'] = function ( MediaWikiServices $services ) {
    // Return the mocked/stubbed service instance here.
};

return $wiring;

Then just set $wgServiceWiringFiles = [ 'path/to/test/wiring.php' ]; in LocalSettings.php.

This has the advantage of not requiring changes to core, and the wiring file can just be stored in the test repo.

Tgr added a comment. May 25 2020, 12:56 PM

Yeah, that came up as an option (this task is the continuation of discussions from here and here), but especially for extensions, having that code in the extension repo is probably the lesser evil (easier to ensure it stays current, for example).

There's also a MediaWikiServices hook (with the added benefit of being able to decorate the existing service instead of completely replacing it), although I'm not sure whether T153256 blocks that; either way it raises the same question: where do you store the overrides?
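
For reference, a minimal sketch of what the hook approach could look like in a test-only LocalSettings.php; the service name and stub class below are placeholders, not actual MediaWiki services:

use MediaWiki\MediaWikiServices;

$wgHooks['MediaWikiServices'][] = static function ( MediaWikiServices $services ) {
    // Replace a service with a deterministic stub before anything has instantiated it.
    $services->redefineService(
        'SomeService', // placeholder for the service being mocked
        static function ( MediaWikiServices $services ) {
            return new SomeServiceFixtureStub(); // placeholder stub returning fixture data
        }
    );
    // addServiceManipulator() is the variant to use when the goal is to decorate/wrap
    // the existing service rather than replace it outright.
};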

Krinkle added a subscriber: Krinkle. Edited May 25 2020, 4:53 PM

Some services need to be mocked for browser tests (e.g. because they give nondeterministic results or rely on user-generated content).

Can you provide a real example of a non-deterministic test? Why is it non-deterministic?

And what user-generated content does it refer to? Our tests run against a clean install where the only content is content prepared by our own tests.

In general I'm skeptical of this direction, as imho it suggests deeper problems are being avoided, while reducing the value of the test itself and making it significantly more difficult to run the tests.

Tgr added a comment. May 25 2020, 5:31 PM

GrowthExperiments has a recommendation API where result ordering is intentionally randomized. In a browser test we'd want to test result navigation but don't care about testing the randomization (if we want to test that at all, it's easy to do in a unit test), and not knowing the order of results beforehand is unhelpful.

Preparing user-generated content via simulated edits is a chore and a major overhead for test times. Probably introduces all kinds of fringe errors too, since you are now depending on the job queue and search index updates and who knows what else. Having an alternative service implementation which returns a fixture is a much saner approach (arguably that's not really an E2E test anymore, so browser testing or integration testing would probably be a more accurate task title).

There is also the issue of testing UI which gets some of its data from services which we really don't want to set up in the test environment (AQS, for example).

For API testing, maybe we could add some utility methods to the api-testing npm package. In a before() hook or within the test case, you could do something like utils.setupServiceWiring( 'filename' ) and that would append to LocalSettings.php; on teardown we'd remove this appended code. The Selenium tests could do something similar.
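
To make the idea concrete, here is the kind of block such a hypothetical utils.setupServiceWiring() helper might append to LocalSettings.php (and strip again on teardown); the path and markers are purely illustrative:

// BEGIN test service wiring (appended by the hypothetical utils.setupServiceWiring())
// Point MediaWiki at a wiring file that overrides selected services with fixtures,
// following the override pattern shown earlier in this task.
$wgServiceWiringFiles = [ "$IP/tests/fixtures/TestServiceWiring.php" ];
// END test service wiring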

awight added a subscriber: awight. May 29 2020, 1:17 PM

GrowthExperiments has a recommendation API where result ordering is intentionally randomized. In a browser test we'd want to test result navigation but don't care about testing the randomization (if we want to test that at all, it's easy to do in a unit test), and not knowing the order of results beforehand is unhelpful.

If the results can be random, then that means the feature fundamentally does not intend to ensure that a specific page is recommended at any point in time. Why does the test care which page is recommended?

Preparing user-generated content via simulated edits is a chore and a major overhead for test times. Probably introduces all kinds of fringe errors too, since you are now depending on the job queue and search index updates and who knows what else. […]

I don't know about this service or api-testing in particular, but for end-to-end testing in general we start with a fresh wiki each CI run. The only pages that exist are the ones your test creates, and maybe the pages other tests have created or will create. We generally create these via a quick API call. I have proposed in the past to do this via a CLI invocation instead (e.g. maintenance/prepareFixture.php stuff.json, where stuff.json might describe users, content, revisions, and whatever else it wants to ensure). These could then also be torn down easily, and thus we could get rid of all the Math.random() calls we currently scatter around.

Both this idea and the idea of injecting settings or other PHP code would break support for running tests without server access to the MW install, which I think is fine, but which did meet some opposition/doubt in the past.

Regarding jobs running, there is a shortcut in wdio-mediawiki to run pending jobs. There are tests for other features that needed this for the same reason.

awight added a comment. Jun 4 2020, 8:45 AM

Another use case that I've run into a few times recently is the need to test both configurations of a feature flag. As soon as we enable the new feature in browser tests, using configuration defaults in extension.json or DevelopmentSettings.php, we lose the ability to test the status quo behavior where the feature is disabled. Currently, we work around this limitation by setting cookies which backdoor-enable the feature configuration.

I'd like to also mention that we should keep test parallelism in mind. Any solution which simply sets something in LocalSettings or the database will affect all tests, so we need isolation at the right granularity: each suite should be able to run with its own configuration, without forcing single-threading.
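
Purely as an illustration of that granularity point (every name below is made up, this is not a concrete proposal): a test-only LocalSettings.php could key overrides on a request header that each suite sends, so suites running in parallel against the same wiki see different configuration:

// Hypothetical per-suite switch in a test-only LocalSettings.php.
// Each suite would send an X-Test-Suite header; header and flag names are invented here.
$testSuite = $_SERVER['HTTP_X_TEST_SUITE'] ?? null;
if ( $testSuite === 'new-feature-enabled' ) {
    $wgSomeFeatureFlag = true;
} elseif ( $testSuite === 'status-quo' ) {
    $wgSomeFeatureFlag = false;
}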

And what user-generated content does it refer to? Our tests run against a clean install where the only content is content prepared by our own tests.

Take the GrowthExperiments Suggested Edits feature for example. If I want an end-to-end test that loads the homepage, clicks on a card, then follows the guidance workflow, currently I would need to either 1) set up Elasticsearch in CI (a non-starter, not possible with Quibble currently) or 2) use a remote wiki API to provide the data for the suggested edits feature, which introduces a whole set of other problems we don't want in our tests.

In our extension, we have StaticConfigurationLoader and StaticTaskSuggester services; if we could easily specify that we want the default services overridden with those static services when running any test (or particular tests) in the Selenium or api-testing frameworks, then we'd have a nice, deterministic setup that can be used for our browser or API tests.
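
Combining that wish with the wiring/hook ideas above, a rough sketch of what it could look like; the service name, namespace, and constructor arguments are guesses for illustration, not the actual GrowthExperiments wiring:

use GrowthExperiments\NewcomerTasks\TaskSuggester\StaticTaskSuggester; // assumed namespace
use MediaWiki\MediaWikiServices;

$wgHooks['MediaWikiServices'][] = static function ( MediaWikiServices $services ) {
    $services->redefineService(
        'GrowthExperimentsTaskSuggester', // assumed service name
        static function ( MediaWikiServices $services ) {
            // Feed the static implementation a hand-written fixture task set;
            // the constructor signature is assumed, not checked against the extension.
            return new StaticTaskSuggester( [ /* fixture tasks */ ] );
        }
    );
};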

Tgr renamed this task from "Establish a standard way of overriding service wiring for end-to-end tests" to "Establish a standard way of overriding service wiring for browser tests". Jun 4 2020, 10:43 AM
Tgr added a comment. Jun 4 2020, 11:02 AM

If the results can be random, then that means the feature fundamentally does not intend to ensure that a specific page is recommended at any point in time. Why does the test care which page is recommended?

The API randomizes result ordering (in some situations). The frontend code does not care about sorting choices made by the API; it receives a set of objects and needs to display them in exactly that order. A browser test that is intended to test the frontend behavior (clicking on navigation arrows and such) can do that most easily by replacing the API result set with a fixture.

I don't know about this service or api-testing in particular, but for end-to-end testing in general we start with a fresh wiki each CI run. The only pages that exist are the ones your test creates, and maybe the pages other tests have created or will create.

That makes already long-running and fragile browser tests even longer-running and more fragile. Sometimes unreasonably so (consider something like category page pagination - would you want to test that by setting up hundreds of wiki pages?).

Both this idea and the idea of injecting settings or other PHP code would break support for running tests without server access to the MW install, which I think is fine, but which did meet some opposition/doubt in the past.

I also think it is fine. The only reason to run tests without server access is to run smoke tests in production / staging. That has its value, and it makes sense for some CI tests to double as smoke tests, but I don't think it makes sense to require all of them to do so.

Regarding jobs running, there is a shortcut in wdio-mediawiki to run pending jobs. There are tests for other features that needed this for the same reason.

We can always add enough epicycles to make our current system work, sure. But if we need to do that all the time, at some point we should stop and reconsider if it's still the right system to work with.

I think being able to browser test larger code components (in the Growth use case, the frontend code and backend view layer involved in presenting a special page) without the test depending on all the other components which would normally be involved (in the Growth use case, MediaWiki's content lookup logic, search engine, user preferences, and probably a number of other things) is a reasonable expectation.