
Spike: Automated testing for visual regression design
Closed, Resolved · Public · 3 Estimated Story Points · Spike

Description

Summary

As a Codex designer and engineer, I want to be able to rely on automated visual regression testing to detect and prevent unwanted visual alterations resulting from applying changes to components or tokens.

Acceptance Criteria

  • Connect with Readers Web to review existing visual regression tools and capabilities (+ @nray) to see if this would be a viable path forward for Codex components

Findings

We have two options, and may want to implement them both at some point:

  1. Add tests for Codex components to Pixel, the visual regression testing tool developed and currently maintained by the Web team (although ownership may transfer to QTE)
  2. Add visual regression testing to Codex itself

Add testing of Codex components to Pixel

Summary: Regular visual regression testing of Codex components in MediaWiki (desktop and mobile) will be a significant asset for our work and will help us catch bugs in the short and long term. However, the current setup and workflow could use some optimization.

Pros:

  • Established tool that may eventually be maintained by QTE
  • Robust testing within MediaWiki

Cons:

  • Workflow for testing Codex components is quite burdensome. Right now it requires pulling in a specific Codex commit locally in the VueTest repo, running code there to update the local version of Codex and copy over the new styles, then pushing a patch with those changes. Then, you can locally test that patch against reference images. This is not ideal for rapidly and frequently testing in-progress work.
  • The Sandbox page is not fully optimized for easy VRT. We can target selectors, which means we can easily set up a test for each component's <section> element, but even then, slight changes (even just pixel rendering differences) can cause a vertical layout shift that muddies the diff.

Add VRT to Codex

Summary: Setting up a visual regression testing system in Codex would be lightweight and would allow us to quickly and frequently test both in-progress work and new releases. However, testing Codex components in isolation will only get us so far, and we will need to test them in MediaWiki to cover our most common and visible use cases.

Pros:

  • We can create a lightweight system that can run locally, as part of CI, and when we do new releases
  • No dependencies on other codebases (e.g. VueTest, Pixel) or MediaWiki

Cons:

  • Need to set it all up ourselves
  • Doesn't cover testing in MediaWiki

I think we should either do a simple BackstopJS setup (which is what Pixel uses) or consider a Cypress plugin if we think we'll use Cypress for other forms of testing. Another option we should explore is Microsoft's Playwright.
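To give a sense of how lightweight a BackstopJS setup could be, scenarios could be generated programmatically, one per component. This is only a sketch under assumptions: the component IDs, sandbox URL, and threshold below are hypothetical, not actual Codex values.

```javascript
// Sketch: generate one BackstopJS scenario per Codex component.
// The component IDs, sandbox URL, and threshold are hypothetical.
const components = [ 'cdx-button', 'cdx-text-input', 'cdx-checkbox' ];

function makeScenario( id ) {
	return {
		label: `Codex ${ id }`,
		url: 'http://localhost:8080/sandbox', // assumed local sandbox URL
		selectors: [ `#${ id }` ], // capture only this component's <section>
		misMatchThreshold: 0.04 // tolerate tiny anti-aliasing differences
	};
}

// These objects would go into backstop.json under "scenarios".
const scenarios = components.map( makeScenario );
```

Because the scenarios are generated, adding a new component to the suite would be a one-line change to the list.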


So, to do:

  • Soon:
    • Submit a patch to Pixel to add Codex components and discuss with the Web team
    • Set up VRT in Codex
  • Later:
    • Improve the Sandbox page so we can avoid muddying the diff when there are vertical layout or size changes
    • Improve the workflow for updating Codex code within the Sandbox

Event Timeline

STH created this task.
STH added a subscriber: Catrope.

See also T306846; Nick has already been thinking about this and it'd be great to partner on visual regression testing of TypeaheadSearch both in Codex and Vector.

STH moved this task from Inbox to Needs Refinement on the Design-System-Team board.

I'm happy to help with this if interested. As mentioned in T291525#7919512, the web team has been using visual regression testing for the last month and we've found it pretty useful. Personally, it cuts out a lot of the manual testing that I used to do for code review because it very efficiently captures a variety of viewport widths and pages that would have otherwise been very tedious to review manually.

We are currently testing a list of URLs that point to a MediaWiki instance running in Docker containers, but I'm optimistic that this could be revised to serve the use case of a component library as well.

What are the URLs or pages that you would want to capture?

DAbad subscribed.

Adding as a future scoping ticket that we may want to look at in 2 months or so. Linked to https://phabricator.wikimedia.org/T314082

DAbad changed the subtype of this task from "Task" to "Spike". (Aug 15 2022, 7:37 PM)
ldelench_wmf set the point value for this task to 3. (Sep 19 2022, 3:36 PM)
AnneT changed the task status from Open to In Progress. (Jan 4 2023, 2:27 PM)
AnneT claimed this task.

We have two options, and may want to implement them both at some point:

  1. Add tests for Codex components to Pixel, the visual regression testing tool developed and currently maintained by the Web team (although ownership may transfer to QTE)
  2. Add visual regression testing to Codex itself

Add testing of Codex components to Pixel

Summary: Regular visual regression testing of Codex components in MediaWiki (desktop and mobile) will be a significant asset for our work and will help us catch bugs in the short and long term. However, the current setup and workflow could use some optimization.

Pros:

  • Established tool that may eventually be maintained by QTE
  • Robust testing within MediaWiki

Cons:

  • Workflow for testing Codex components is quite burdensome. Right now it requires pulling in a specific Codex commit locally in the VueTest repo, running code there to update the local version of Codex and copy over the new styles, then pushing a patch with those changes. Then, you can locally test that patch against reference images. This is not ideal for rapidly and frequently testing in-progress work.
  • Testing the Codex sandbox page, which includes all components, means that if a component has a change in vertical layout or size, the rest of the page appears to be changed in the diff, muddying the results for all subsequent components
  • The sandbox page in the VueTest extension uses Codex design tokens, which are currently broken in MediaWiki (see T325237)

Add VRT to Codex

Summary: Setting up a visual regression testing system in Codex would be lightweight and would allow us to quickly and frequently test both in-progress work and new releases. However, testing Codex components in isolation will only get us so far, and we will need to test them in MediaWiki to cover our most common and visible use cases.

Pros:

  • We can create a lightweight system that can run locally, as part of CI, and when we do new releases
  • No dependencies on other codebases (e.g. VueTest, Pixel) or MediaWiki

Cons:

  • Need to set it all up ourselves
  • Doesn't cover testing in MediaWiki

I think we should either do a simple BackstopJS setup (which is what Pixel uses) or consider a Cypress plugin if we think we'll use Cypress for other forms of testing.


So, to do:

  • Now:
    • Fix the Less compiling issue described in T325237 so we can use design tokens within MediaWiki
    • Submit a patch to Pixel to add Codex components and discuss with the Web team
    • Set up VRT in Codex
  • Later:
    • Improve the Sandbox page so we can avoid muddying the diff when there are vertical layout or size changes
    • Improve the workflow for updating Codex code within the Sandbox
  • Testing the Codex sandbox page, which includes all components, means that if a component has a change in vertical layout or size, the rest of the page appears to be changed in the diff, muddying the results for all subsequent components

One strategy here is to only capture a selector or selectors on the page to reduce this noise. For example, if there is a #search-box-container that contains the typeahead search component, you can make the test capture only that through the selectors option. We (the web team) use a similar strategy for testing the Echo extension, in which we exclude many selectors from the page.
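As a concrete illustration of that strategy, a single scenario might look like the sketch below. selectors and removeSelectors are standard BackstopJS scenario options, but the URL and element IDs here are invented for illustration, not taken from Pixel's actual config.

```javascript
// Hypothetical BackstopJS scenario capturing only the search box container.
// selectors and removeSelectors are standard BackstopJS scenario options;
// the URL and element IDs are illustrative.
const scenario = {
	label: 'TypeaheadSearch',
	url: 'http://localhost:8080/wiki/Main_Page',
	selectors: [ '#search-box-container' ], // capture just this element
	removeSelectors: [ '#siteNotice' ] // strip noisy elements before capture
};
```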

I think we should either do a simple BackstopJS setup (which is what Pixel uses) or consider a Cypress plugin if we think we'll use Cypress for other forms of testing.

If going this route, I suggest you also look into Playwright from Microsoft. It seems like it has become quite popular recently, and I've been interested in experimenting with it in Pixel (either in conjunction with Backstop or as a standalone testing framework). From my understanding, you can set it up to do visual comparisons, although the reporter is not as nice as BackstopJS's right now, IMO.
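For context, a Playwright visual comparison (using the toHaveScreenshot assertion in @playwright/test) looks roughly like this. The URL and selector are hypothetical, and this sketch assumes a Playwright version where toHaveScreenshot is available; it must be run under the Playwright test runner.

```javascript
// Sketch of a Playwright visual comparison test (requires @playwright/test).
// The URL and selector are hypothetical. On first run, toHaveScreenshot()
// records a reference image; later runs diff against it.
const { test, expect } = require( '@playwright/test' );

test( 'button renders as expected', async ( { page } ) => {
	await page.goto( 'http://localhost:8080/sandbox' );
	await expect( page.locator( '#cdx-button' ) ).toHaveScreenshot( 'button.png' );
} );
```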

@nray thanks for your response and these suggestions!

One strategy here is to only capture a selector or selectors on the page to reduce this noise. For example, if there is a #search-box-container that contains the typeahead search component, you can make the test capture only that through the selectors option. We (the web team) use a similar strategy for testing the Echo extension, in which we exclude many selectors from the page.

That makes sense, and each component is already wrapped in a <section> element with an ID, so it would be easy to isolate them. Would adding a test for each section be too noisy or add too much time to the testing process, though? We're at 24 components now and have over 40 planned for Codex. As we consider adding Codex components to Pixel, I'd like to strategize about how we can avoid adding too much noise, maintenance burden, etc. Any advice you have here would be very welcome!

If going this route, I suggest you also look into Playwright from Microsoft.

I haven't heard of this one and will look into it!

Change 875397 had a related patch set uploaded (by Anne Tomasevich; author: Anne Tomasevich):

[design/codex@main] [WIP] Proof of concept of Cypress visual regression plugin

https://gerrit.wikimedia.org/r/875397

The Cypress VRT plugin is...okay. Cypress wasn't made for visual testing, and I think we should only go this route if we really intend to use Cypress for other things and we can get the VRT working better. See the patch for more comments.

That makes sense, and each component is already wrapped in a <section> element with an ID, so it would be easy to isolate them. Would adding a test for each section be too noisy or add too much time to the testing process, though? We're at 24 components now and have over 40 planned for Codex.

For reference, our desktop report contains 138 tests (which includes testing 5 different viewport sizes) and takes about 3 minutes to compare two different states of MediaWiki (e.g. the latest release branch with master). That time could probably be optimized more, but it is certainly much faster than manual testing. 64 tests are not that many, but that can quickly multiply depending on how many viewport sizes you want to test for each test and how many interactions you require for each component.

As we consider adding Codex components to Pixel, I'd like to strategize about how we can avoid adding too much noise, maintenance burden, etc. Any advice you have here would be very welcome!

I'm happy to answer any questions you have 😊. I imagine that Codex would have its own report separate from our "desktop" report so that you could only run the tests that are relevant to Codex. Additionally, if you used specific selectors to capture the components, I think that would help reduce the noise.

From my experience, tests that simply take a screenshot of something and don't require any interaction are fairly fast to write and can be done in around 4 lines each. The tests that require interactions (e.g. clicking an input and waiting for something to appear) can take considerably longer to write and make robust.
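A rough sketch of that contrast, in BackstopJS scenario terms: clickSelector and postInteractionWait are standard BackstopJS scenario fields, while the URL and selectors below are hypothetical.

```javascript
// Contrast: a screenshot-only scenario vs. one that needs interaction.
// clickSelector and postInteractionWait are standard BackstopJS options;
// the URL and selectors are hypothetical.
const screenshotOnly = {
	label: 'Button (default)',
	url: 'http://localhost:8080/sandbox',
	selectors: [ '#cdx-button' ]
};

const withInteraction = {
	label: 'Lookup (menu open)',
	url: 'http://localhost:8080/sandbox',
	selectors: [ '#cdx-lookup' ],
	clickSelector: '#cdx-lookup input', // open the menu first
	postInteractionWait: '.cdx-menu' // wait for the menu to appear
};
```

The interactive scenario is also where flakiness tends to creep in, since the wait condition has to be robust across runs.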

AnneT removed a project: Patch-For-Review.
AnneT updated the task description. (Show Details)

Change 875397 abandoned by Anne Tomasevich:

[design/codex@main] [WIP] Proof of concept of Cypress visual regression plugin

Reason:

https://gerrit.wikimedia.org/r/875397