This task is to discuss and determine the best approach for in-service (implemented within the service itself) testing for AQS 2.0. A secondary (but perhaps more important in the long run) goal is to establish testing patterns for future similar services. This is likely to spawn related implementation tasks.
Background
AQS 2.0 provides public-facing access to certain WMF metrics data via what is effectively a federated API implemented in six Go services. It replaces the original AQS, which was implemented via a fork of RESTBase and which provided access to the same data via an API of virtually identical contract.
There is a lot of "new" involved with AQS 2.0. This includes:
- Go is not in widespread use at WMF at this time (Kask is the primary production example of current usage)
- deployment of services to our production k8s environment is still somewhat formative
- while AQS 2.0 may not be the pinnacle of a microservices approach, it is at least more of a step in that direction than its predecessor
- we're also building it on our relatively untried Go service scaffolding and related servicelib
- we're trying out WMF's GitLab installation
- we're collecting common utility functions in a separate Go package housed in its own repository
- we are using Docker-based "testing" environments to stand in for production datastores during development and local testing
- in addition to tests implemented within the service, we have an external testing framework
A result of all this is that we are encountering a larger-than-average number of questions and surprises as the project proceeds. The most recent of these regards in-service testing, especially as it applies to our CI systems.
Context (testing specific)
Here is a bit more context specifically around testing (individual points are broken out for ease of reference in case someone wants to mention them in a followup comment):
- we have both local/in-service testing, implemented in Go within the service itself, as well as a separate testing framework that performs integration testing by invoking the services externally. This task is really only about the first of those: the in-service tests. The external tests are covered in tasks like T317790: <Spike> AQS 2.0 Testing Plan.
- our current testing approach, as outlined about a year and a half ago (yikes!) in T288160: Development and test environments for AQS 2.0 Cassandra services, is to spin up Docker/Cassandra containers with canned data. The original task has a preference for hosted dev/staging environments, possibly ones that we could spin up in CI. But it anticipates that may not be available, and suggests local Docker Compose environments as an alternative. That's what we have done so far.
- while we've been calling these "unit tests", it was acknowledged early on that they were more akin to integration tests.
- because we have chosen to push as much common functionality as possible to the common "aqsassist" package, much of the unit testing responsibility sits there, rather than in the individual services. (And aqsassist does have actual unit tests.)
- it is correct from a theoretical perspective that these are not strictly unit tests. However, there is existing art for this sort of testing in the current production AQS code that we're replacing, and it seems to have served that codebase well. In fact, that's where we got the list of in-service tests to implement:
  - T317720: AQS 2.0: Unique Devices: Implement Unit Tests
  - T299735: AQS 2.0: Pageviews: Implement Integration Tests
  - T317722: AQS 2.0: Mediarequests: Implement Unit Tests
  - T317725: AQS 2.0: Geo Analytics: Implement Unit Tests
  - T316849: Audit tests for Druid-based endpoints
  (These tasks cover services in various stages of completion, so not all of this is implemented yet, and some services don't even exist.)
- the current tests have been very helpful to developers during implementation
- because the goal of AQS 2.0 is compatibility, tests that execute the handlers at a relatively high level during CI would be useful in preventing regressions
Bigger picture
Assuming that AQS 2.0 is successful, it may establish patterns for similar Go-based services that use similar development, testing, and deployment practices. So it is worth exploring our options to find the best long-term solution, as it may benefit future work.
Issue
Now, on to the problem we're currently facing. Per discussion elsewhere (please correct me if I am misrepresenting this), our current challenge is:
- the current tests implemented within the AQS services require local Docker testing containers, which do not fit into our image building pipelines
- the current tests are more akin to integration tests than unit tests, in that they depend on an external data source
We could quickly satisfy the first point and make the pipeline succeed by "cheating": having "make test" simply return true, or implementing a small number of unit tests that don't require testing containers but provide little code coverage. Both options are irresponsible, because the pipeline would then do little to protect us against regressions. This situation is therefore a deployment blocker.
Solutioning Thoughts
With all that in mind, here are some thoughts on possible solutions:
- the current tests have proven very valuable during development and code review, and tests of similar scope have served the existing production service well. Eliminating them would therefore be inadvisable. But we could also add tests that are true unit tests.
- could/should we arrange the service code differently to be more unit testable?
- could/should we have multiple levels or suites of tests within the service? For comparison, MediaWiki has both "unit" and "integration" tests within its codebase, all executed via the PHPUnit framework.
- is the current inability to spin up testing containers a fundamental limitation of our CI system, or simply its current status? Is it possible this could change?
- could the current standalone testing containers be refactored such that our CI system could spin them up?
- is there a possibility of having hosted testing containers/environments?
- is there a possibility of testing against production data sources? The author of this task has a high level of discomfort with this, but ssh'ing into prod is recommended for the druid-based endpoints in the current production AQS. Doing this via CI seems at least slightly less evil. Then again, maybe doing it via CI might actually be *more* evil, given that it opens the possibility of anyone on the planet executing random queries against production datastores via a malicious push.
- does using an external data source compromise our testing in ways we are inherently uncomfortable with? Would mocking a data source be better or worse?
- does the anticipated move from gerrit to GitLab change anything for any of this?
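If we do want multiple suites within the service, Go's build constraints offer one conventional mechanism for separating them without a framework. The file, package, and test names below are hypothetical:

```go
//go:build integration

// pageviews_integration_test.go (hypothetical): because of the build
// tag above, this file is compiled only when tests are invoked as
//   go test -tags=integration ./...
// so a plain "go test ./..." in CI runs only the pure unit tests.
package pageviews

import "testing"

func TestAgainstLocalCassandra(t *testing.T) {
	// ...talks to the Docker Compose container...
}
```

This would let the image-building pipeline run `go test ./...` safely while developers (or a future container-capable CI) run the tagged suite locally.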
Regarding mocking a data source vs using an external data source: mocking is probably more theoretically correct, and it is easy to say we should just do that. However, there are arguments against it:
- it makes our testing less representative of production, in that our mocking system will never accurately behave like the "real thing"
- it creates a lot more code to maintain - mocking two separate databases isn't trivial
- it raises the bar for the skillsets required. Mocking a system requires the coder to sufficiently understand both the thing being mocked, and the technology being used to do the mocking. An external system could be created by an individual familiar with the datastore (and containers for many datastore types, including the ones we use, are publicly available), thereby requiring the service developer to understand the datastore only from the outside-in perspective of creating queries and processing the response, not from the inside-out perspective of processing queries and creating responses.
- data in an external environment can be ingested into the testing container independently of modifications to the codebase. A mocked system with no external dependencies requires a change to the codebase to update testing data. (However, this could also be considered an advantage of mocking, as using an external environment means the data being tested against can change without warning)
- Mocked data accompanies the codebase on its entire journey, increasing the size of repos (and possibly even deployment images, depending on how we approach things).
- We already have containers. Mocking would be an additional development effort. (This is lower priority, as we would rather do things correctly even if it means more work. But it isn't completely ignorable either.)
Task Responsibility
The author is assigning this task to themselves, as driver. However, the purpose of this task is discussion and consensus, not for one person to make decisions. Please jump in with anything I've forgotten, correct any mistakes or misrepresentations I've made, etc. There's no way I typed that many words and got it all right...
Also, I'm subscribing a bunch of people who I think would be interested, but please add anyone I forgot.
Acceptance Criteria
This task is complete when a testing solution has been agreed on and any additional tasks necessary to realize that solution have been created.
Edit: fixed a bunch of typos and poor wording choices.