
<Spike> AQS 2.0 Testing Plan
Closed, ResolvedPublicSpike

Description

Background/Goal

Understand existing testing practices and prepare a comprehensive testing plan for AQS 2.0.

User stories
  • As an API Platform team member, I need to know what the testing plan is, so that my work can ladder up to the testing plan's process needs.
  • As a QAT for AQS 2.0, I need to create the testing requirements and process, so AQS 2.0 can be successfully prepared for deployment.
Considerations
  • What capabilities are critical vs nice-to-have?
  • What capabilities have the biggest impact?
  • What capabilities have the most dependencies?
  • How can we scope this work in a way that delivers incremental benefits?
Requirements
  • A brief description of what the use case is
  • Review existing testing methods
  • Analyze AQS 2.0 as a product
  • Design test strategy
  • Define test objectives
  • Define test criteria
  • Resource estimations for testing
  • Plan test environment
  • Bullet internal/external impacts of testing strategy
  • List of what this blocks (if anything) and why. Include examples.
  • Documents and links to existing artifacts, tickets, etc.
  • Open questions/additional areas to explore

Upon completion of the above:

  • Tech Lead & Engineering Manager review
    • Bullet list of what the related infrastructure capabilities are
    • Bullet list of potential/expected development/engineering impacts (both positive and negative)
    • Bullet list of potential/expected design impacts (both positive and negative)
    • Describe WHAT phases or chunks of work could be done and by WHO
    • List any dependencies we have on any tools, teams, etc.
    • Meeting set to review scope with Product Manager
  • Once scope completed and agreed to, next steps defined (ex: create Epic w/ subtasks)
Acceptance criteria
  • It is clear to see how the target capability impacts end-users
  • It is clear the new AQS 2.0 is compatible with the existing production AQS, so that clients/callers are unaffected when AQS 2.0 is released
  • It is clear how the target capability impacts WMF staff
  • Impact can be delivered incrementally, without having to wait months or until the end of a project to see impact
  • Non-technical audiences can understand why this work matters and how it impacts the community

Event Timeline

@Atieno no problem.

@BPirkle if it's OK with you, I can work with you on this. I will coordinate with you via Slack. Thanks

Sounds great, ping me when you're ready to chat.

EChukwukere-WMF changed the task status from Open to In Progress. Sep 26 2022, 9:33 PM

Test Plan Description

This is to test the functionality and end-to-end workflows of the AQS API endpoints. This will involve functional and potentially integration testing that determines the functionality, reliability, and possibly the performance of the AQS 2.0 endpoints.

A test framework is developed to drive requests against the AQS 2.0 endpoints. With this framework, test cases will be written in the following categories:

  • Functional Tests:

    These tests assess specific functions of the codebase through endpoint requests (a minimal sketch appears after this list of categories).

    Expected output from functional tests:
    • Status code checks
    • Response body type
    • Schema tests
  • End to End Tests:

    Should we need them, this category of tests will assess the communication between API endpoints. They can otherwise be referred to as integration tests. They ensure the APIs are well connected and don't cause defects in other endpoint modules.

    Expected output from integration or e2e tests:
    • Appropriate status code
    • Response body type checks
    • Schema tests
  • Negative tests:

    These tests assess responses when invalid parameters are supplied to the request; the appropriate status codes should be returned.

    Expected output from negative tests:
    • Appropriate status code
    • Meaningful error messages
  • Performance Tests (if necessary; this would come further down the roadmap):

    Performance is validated by artificially simulating API calls and checking metrics that signal how well the APIs respond under load.

    Expected output from performance tests:
    • Response times
    • Throughput
    • Server conditions such as latency and connect time
    • JMeter can be used as the performance testing tool (this is subject to change)
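
For illustration, here is a minimal sketch (in Python, using the requests library) of the functional and negative checks described in the categories above. The endpoint path, parameters, and error-field name are assumptions for illustration, not the agreed AQS 2.0 contract.

```
# Sketch of functional and negative endpoint checks; URL and fields are assumptions.
import requests

BASE_URL = "https://wikimedia.org/api/rest_v1/metrics"  # assumed base URL for the suite


def test_functional_pageviews():
    # Functional test: valid parameters should return 200, JSON, and the expected keys.
    resp = requests.get(
        f"{BASE_URL}/pageviews/per-article/en.wikipedia/all-access/all-agents/"
        "Earth/daily/20230101/20230102"
    )
    assert resp.status_code == 200                              # status code check
    assert "application/json" in resp.headers["Content-Type"]   # response body type
    body = resp.json()
    assert "items" in body and isinstance(body["items"], list)  # schema-level check


def test_negative_invalid_granularity():
    # Negative test: an invalid parameter should yield a 4xx and a meaningful message.
    resp = requests.get(
        f"{BASE_URL}/pageviews/per-article/en.wikipedia/all-access/all-agents/"
        "Earth/not-a-granularity/20230101/20230102"
    )
    assert resp.status_code == 400
    assert resp.json().get("detail")  # assumes the error payload carries a "detail" field
```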

Test Environment:

The test environment can be split across a test/QA environment and production. This can be subject to discussion if needed; otherwise we can have a single environment where the tests are carried out.
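
As a small sketch of how the suite could target either environment, the base URL could be selected with an environment variable; the QA host below is a placeholder, not a real deployment:

```
# Sketch: pick the environment for a test run via an environment variable.
import os

AQS_ENVIRONMENTS = {
    "prod": "https://wikimedia.org/api/rest_v1/metrics",
    "qa": "https://aqs-qa.example.invalid/metrics",  # hypothetical QA/test host
}

# e.g. run with: AQS_TEST_ENV=qa pytest
BASE_URL = AQS_ENVIRONMENTS[os.environ.get("AQS_TEST_ENV", "prod")]
```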

Framework Design:

Behaviour Driven Development (BDD) as well as Test Driven Development (TDD), using Gherkin statements along with the Python scripting language, will be the framework and language of choice for testing. Both technologies are compatible with a rich set of libraries specifically designed for API testing. The framework has been completed in T319314.

Reference for BDD framework : Gherkin Reference - Cucumber Documentation (https://cucumber.io/docs/cucumber/)
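
As a minimal sketch of how the Gherkin + Python pieces fit together (assuming a behave-style runner; the endpoint routes and step wording are placeholders, not the framework built in T319314):

```
# features/steps/pageviews_steps.py
# Step definitions for a Gherkin scenario such as:
#   Scenario: Pageviews endpoint returns data for a valid article
#     Given the per-article pageviews endpoint
#     When I request daily pageviews for "Earth" on en.wikipedia
#     Then the response status code is 200
import requests
from behave import given, when, then


@given("the per-article pageviews endpoint")
def step_endpoint(context):
    context.base = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


@when('I request daily pageviews for "{article}" on {project}')
def step_request(context, article, project):
    context.response = requests.get(
        f"{context.base}/{project}/all-access/all-agents/{article}/daily/20230101/20230102"
    )


@then("the response status code is {code:d}")
def step_status(context, code):
    assert context.response.status_code == code
```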

Regressions:

Once the framework is set up and tests are continually written and updated, these tests will be scheduled to run in GitLab CI/CD configurations. Further research will be needed for this.

@VirginiaPoundstone I did make the comments above as to how I will approach the testing and have started the work in various Phab tasks. The test framework has been created and other testing tasks are being implemented.

So we can consider this as done, unless @BPirkle, you know of more things to add here.

@EChukwukere-WMF Thanks for outlining these objectives. It's great to see as we experiment with QA in general and learn how to better adopt QA best practices into our larger workflow, especially on this emergent project, which, like many, is loosely defined, with requirements arising as part of doing the work.

I have ten follow-up questions:

  1. Is there an API/product analysis step in the testing process? If so, please add a description of how that is/will be done. What would you need in order to be able to do this?
  2. Is there a strategy for how to approach the testing? What are the dependencies you have in order for it to be as successful as you want it to be? What will the scope be for the testing (which components & layers of the system will be tested)? Are there any risks or issues you foresee? What will be out of scope?
  3. What data will you use to test? Where will/did it come from?
  4. What are the criteria for the tests... how will we know they perform as expected? What are the targets for each? (Pass/fail, what status codes, what error messages, etc.)
  5. What will you need in order to do performance testing?
  6. When in the life cycle will regression testing happen?
  7. You will also do smoke tests, correct? Please outline when, how and why those would happen.
  8. How will you know if you need integration/End to End (e2e) testing or not?
  9. What are the test deliverables? Notes in an email? Critical and non-critical bug tickets? Testing reports?
  10. Have you done any documentation testing before? It could be nice to test our new OpenAPI specs too.

@BPirkle and @Atieno your review is next up on this.

Let's make sure we're using the same definitions. Here is my less-than-rigorous understanding:

unit tests: live within the code, and test it from an internal perspective
integration tests: live outside the code, and test end-to-end functionality
smoke tests: confirm that major functionality works
regression tests: confirm that changes don't break things that were previously working

I'd like to add one additional category of tests that is implied in the task description, but not explicitly called out: compatibility tests. Because we're replacing an existing production service in what is hopefully an invisible way to callers, we want tests to confirm this. This probably means a test suite that runs against both old and new services, in production, using real production data rather than canned test data. That's a somewhat special case compared to a brand new project, so it deserves a separate mention. Once the existing production service is retired, this category of tests will no longer make sense in the same way (because there will be nothing else to be compatible with). But the tests developed for this purpose will still be useful, in the sense that if code pushes change the behavior, then something is probably wrong. So there is a perspective in which we want each deployment to be compatible with the previous one.
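
A minimal sketch of what such a compatibility test could look like (the new-service host is a placeholder; the comparison assumes both services read the same production data):

```
# Sketch: issue identical requests to the existing AQS and AQS 2.0, then compare.
import requests

LEGACY_BASE = "https://wikimedia.org/api/rest_v1/metrics"  # existing production AQS
NEW_BASE = "https://aqs2.example.invalid/metrics"          # hypothetical AQS 2.0 host

PATH = ("pageviews/per-article/en.wikipedia/all-access/all-agents/"
        "Earth/daily/20230101/20230102")


def test_legacy_and_new_services_agree():
    legacy = requests.get(f"{LEGACY_BASE}/{PATH}")
    new = requests.get(f"{NEW_BASE}/{PATH}")
    assert legacy.status_code == new.status_code
    # Same production data behind both services, so the JSON payloads should match.
    assert legacy.json() == new.json()
```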

We probably also need a clear distinction between tests that are the responsibility of the developers vs tests that are the responsibility of QA, and what the goals and limits of each are.

Specifically, we already have unit tests that live in the service code itself, are written in Go, and use the built-in Go testing capabilities. These tests are normally executed by the developer during the coding process (and are sometimes written in advance of the code if we're doing TDD for that ticket). They will hopefully also be executed automatically on every push once we get to gerrit. They require Go development skills, and also a strong understanding of the code itself. In my opinion, these should remain the responsibility of the developer, and should be created as part of each coding task that introduces new functionality. (In code review, I've already rejected an MR or two that didn't have unit tests...) These unit tests do a decent job of both smoke and regression testing from an inside-out perspective, but are limited in that they are unable to catch issues like production misconfiguration, hosting problems, prod database connection issues, etc.

One unknown (at least to me) with the unit tests is the testing containers. It may be a trivial matter for the gerrit pipeline (and its associated tooling) to spin up the Cassandra/Druid containers we're using now. Or it may not be. I've never done anything like that in our environment. But the unit tests depend on them, so if this is problematic, we'll need to adjust. I'd be surprised if our current Docker Compose environments don't have to at least be tweaked.

We also already have an excellent and growing suite of what I'd call both integration and compatibility tests, written by Emeka, that test the services from the outside-in. They won't necessarily catch internal things (like what happens if you call a particular function in the code with the wrong parameters), because they don't have that level of visibility or control. But they will catch broader issues. I'm hoping we can execute these as part of the CI/CD pipeline as well, but I'm not sure how that would work with testing data. We have the advantage that the AQS services are read-only and execute against production datastores capable of handling tremendous load. So we may be able to test against actual production data (Should confirm this with SRE if we decide to actually try it, though...) However, I don't know if the environment the services will execute in during a push will allow this access or not. But if it is possible, it'd be great if Emeka's test suite (or something like it) ran on every push. For comparison, MediaWiki pushes can take the better part of an hour to finish automated testing (much of that is waiting in the queue, but still). AQS would likely be much faster.

Those are thoughts off the top of my head. Hopefully they're useful.

@VirginiaPoundstone Sure, I see your points and as we go along we can define and standardize some of our processes

My response:

Question 1:

  • When testing the API endpoints I run certain checks: status code, JSON content type in the headers, response body, schema checks, returned data types in the values, and validations with negative scenarios (meaning when a user enters a wrong parameter, what kind of error message is displayed to the user or consumer of that endpoint)
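
As a small sketch of the schema and data-type checks (using the jsonschema library; the schema below is an illustrative guess at the pageviews response shape, not the published spec):

```
# Sketch: validate each item in a pageviews response against an assumed schema.
import requests
from jsonschema import validate

PAGEVIEWS_ITEM_SCHEMA = {
    "type": "object",
    "properties": {
        "project": {"type": "string"},
        "article": {"type": "string"},
        "granularity": {"type": "string"},
        "timestamp": {"type": "string"},
        "views": {"type": "integer"},
    },
    "required": ["project", "article", "timestamp", "views"],
}

resp = requests.get(
    "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
    "en.wikipedia/all-access/all-agents/Earth/daily/20230101/20230102"
)
for item in resp.json()["items"]:
    validate(instance=item, schema=PAGEVIEWS_ITEM_SCHEMA)  # raises ValidationError on mismatch
```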

Question 2:

  • The strategy I employed here is to manually test the individual endpoints: test the positive scenarios and the negative scenarios. These scenarios include mainly functional and integration testing of the endpoints. This is focused mainly at the API level and how it will be used. Data testing will also be done to cross-reference the data returned to users.

Question 3:

  • For the data testing, I can work with the devs to pull down some data for me so I can test this locally and then compare this with what is in production

Question 4:

  • As I mentioned in my answer to question 1.

Question 5:

  • For now I have yet to do any performance testing on the APIs, but this is something I can start later, after I complete some of the functional testing and fixes. I have previously used a tool called JMeter to execute load and performance testing on endpoints.
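
For a rough idea of the kind of measurement involved (JMeter or a similar tool would do the real work; the request volume and URL here are arbitrary):

```
# Sketch: time concurrent calls to one endpoint and summarize latency and errors.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
       "en.wikipedia/all-access/all-agents/Earth/daily/20230101/20230102")


def timed_call(_):
    start = time.monotonic()
    resp = requests.get(URL)
    return resp.status_code, time.monotonic() - start


with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(timed_call, range(100)))  # 100 calls, 10 at a time

latencies = sorted(elapsed for _, elapsed in results)
print("p50 latency:", latencies[len(latencies) // 2])
print("p95 latency:", latencies[int(len(latencies) * 0.95)])
print("errors:", sum(1 for code, _ in results if code != 200))
```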

Question 6:

  • Good question. From my experience, regression testing is to be run daily! Right now I do this locally, which is not ideal of course. I am investigating using GitLab CI/CD to set this up. This needs to be run regularly. I will create a spike for this in my backlog.

Question 7:

  • Yes, I will do smoke tests after we deploy. This is because I intend to write exhaustive regression tests to give us 90-100% test coverage on each of the APIs, such that the smoke testing will be minimal and might just be some spot checks.

Question 8:

  • For now there will not be much end-to-end (E2E) testing. For E2E testing to happen there has to be some communication between the endpoints, so I can verify how this works. According to the workflows I don't see this happening anywhere, but I could be wrong. I will be more than happy to test this if it exists.

Question 9:

  • Good question. The test deliverables should come from a reporting system, which can be sent by email to the team. This is another spike I will need to write up. For now it will simply be by email, which I can write up. This is because I am currently writing tests against the current production endpoints; as we begin to bring the AQS 2.0 endpoints online we surely want to check the health of all of them. Let me know what you think of this approach, or I can investigate how to start sending reports of the current tests in the prod environment.

Question 10:

  • By documentation testing you mean testing that ensures the documentation about how to use the system matches what the system does? I have done something similar before. I can test the docs written up and cross-check them with the functionality of the APIs in question.
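
A rough sketch of one automatable piece of that (assuming the spec is served at the URL below, which should be confirmed; real documentation testing would also fill in valid example parameters and exercise each route):

```
# Sketch: list the GET routes documented in an OpenAPI spec so they can be checked
# against actual endpoint behaviour.
import requests

SPEC_URL = "https://wikimedia.org/api/rest_v1/?spec"  # assumed location of the spec
spec = requests.get(SPEC_URL).json()

for path, methods in spec.get("paths", {}).items():
    if "get" in methods:
        # A fuller test would substitute example parameters and call the route,
        # then compare the response with what the documentation promises.
        print("documented GET route:", path)
```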

Let me know if the above answers your questions. Thanks

@EChukwukere-WMF Thanks for the responses. Happy to hear we have paths towards API testing maturity.
Question 10 follow-up: exactly. Test the docs and flag errors or inconsistencies in them.

@BPirkle and @Atieno should review as well, since I'm not in the weeds of the devops tools.

Good question. From my experience, regression testing is to be run daily! Right now I do this locally, which is not ideal of course. I am investigating using GitLab CI/CD to set this up. This needs to be run regularly. I will create a spike for this in my backlog.

I'm not aware of anywhere that we currently do this. That's not to say we don't - I'm finding out new things about our tech stack all the time. But the model I'm familiar with (mostly from MediaWiki) is that we run tests:

  • locally (everyone ideally does this before pushing. In practice, sometimes we get lazy and let gerrit do it, but we typically don't ask for review until gerrit tests pass)
  • code push
  • code merge (I believe these are a little more extensive than the tests that run on push)
  • deploy to production servers (aka scap). I believe these are mostly "canary" tests that make sure the deploy doesn't completely tank the appservers.

We then have the various monitoring/alerting systems that mostly go to SRE-ish folks, plus the Train Log Triage meeting where humans look through the error reports in Logstash. Release Engineering/SRE might know more about the various possibilities.

Disclaimer: I haven't done much production service work at WMF. The testing flow for services is probably at least somewhat different than for MediaWiki. In particular, I'm pretty sure that scap is MediaWiki-specific.

What are the test deliverables? Notes in an email? Critical and non-critical bug tickets? Testing reports?

Good question. The test deliverables should come from a reporting system, which can be sent by email to the team. This is another spike I will need to write up. For now it will simply be by email, which I can write up. This is because I am currently writing tests against the current production endpoints; as we begin to bring the AQS 2.0 endpoints online we surely want to check the health of all of them. Let me know what you think of this approach, or I can investigate how to start sending reports of the current tests in the prod environment.

For tests that run at push/merge in a gerrit environment, I'm used to jenkins sending an email. The less-than-ideal way I'm used to learning about issues that occur after deployment is that someone notices a problem (hopefully in a structured way like Train Log Triage, but often in a random "why doesn't this work anymore?" way) and files a Phab task. I don't otherwise receive regular testing reports for anything that I've worked on.

I'll note that because MediaWiki is in a constant state of active development, the automated gerrit tests are running all the time. So those tests run far more often than daily, just by virtue of people working on the code. Those are tests against code in development and not the production appservers, but still - if major breakage somehow gets merged, it tends to get caught just by virtue of the number of people looking at MediaWiki. A service like AQS that relatively few (or post-launch probably zero) people are looking at daily is a different situation. If we don't do something to regularly run tests, then it won't happen.

@EChukwukere-WMF and @BPirkle is this task complete as is, or is it blocked by T328969?

I'm still unclear how the testing plan is accessibly communicated, as either a doc/sheet, on wiki, or in some other way that shares the knowledge both internally for use now and externally for knowledge transfer in the future.

@VirginiaPoundstone this ticket was initially completed but other things came up and I will need to add to what has been done in the test plan that I started. So there is more for me to do here

This ticket was in my 'Ready for testing' column. I was getting ready to work on this

Can you bring this forward to the current sprint? I will mark it done when the document is complete and updated, as I am adding things (examples of how I approach the testing in detail) that will also help the dev team.

Please bring this forward to the current sprint and into the "Ready for testing" column as well.

thank you

cc @JArguello-WMF

JArguello-WMF lowered the priority of this task from High to Medium. Feb 22 2023, 7:02 PM
JArguello-WMF removed Due Date.

Hi Emeka! During grooming and sprint planning the team decided to prioritize the work directly related to AQS 2.0 device analytics; the second objective is pageviews. This testing plan is paused for the purposes of the next sprint because there are other topics that have been put aside; for example, the team has not figured out how your testing framework fits into the CI, or whether it will remain a manual thing, etc. @BPirkle can give you more details on the rationale behind putting this on pause. For the moment, the testing plan is not part of the current sprint. At the end of this iteration we can revisit and decide if we include it in the next one. Please let me know your comments or questions. :)

@JArguello-WMF that is fine as well. While the ticket is paused I can work on it when I get a chance. No issues. As for the test framework, it cannot remain a manual thing for too long. I can set up a mini CI in GitLab in future sprints to have this run in an automated fashion, but eventually we really do want this integrated into the main CI where the services repo lives, so it tests against it (be it in prod or a future pristine test/QA env).

cc @VirginiaPoundstone

Regarding pausing: a lot of Emeka's work is responsive and situational, so this seems like a good fallback ticket to keep active but low priority for times in which the developers don't have anything that needs to be tested.

Regarding CI: we should talk about that at an upcoming Code Mob (maybe one of tomorrow's meetings if we have time). The challenge is access to datastores to run tests against. The summary of what we learned in T328969: AQS 2.0: Revisit in-service testing approach is that we currently:

  • don't have a great place to put a staging datastore
  • don't want to grant access to production datastores from CI (probably at all, but for sure from the parts of the pipeline that run pre-merge, before a human has reviewed the code)
  • can't spin up testing containers as part of code pushes

What we can do is spin up a testing container in a late CI phase after the image has been created but before it is tagged. A test failure there, as I understand it, will block that deployment, but will also block subsequent deployments until the issue is fixed, because the code change will have already been merged to the main branch.

We intentionally did not talk about the Python-based external integration testing when we considered the other types of testing, because we didn't think we were smart enough yet about all this. My current opinion is that we should push through the in-service testing and CI changes represented in T330222: AQS 2.0: allow Device Analytics in-service unit and integration tests to be executed separately and T330223: AQS 2.0: refactor Cassandra testing env for use in CI, then figure out how the external integration testing fits in. I'm hopeful that we will be able to run it in that same late-stage CI phase. But as we've never run anything exactly like this in that phase (and in fact, nobody at WMF has), there may be surprises. So I'd like to keep the bar low for our first attempt, and then expand after we get smarter.