
Spike: Automated testing for visual regression design
Closed, Resolved · Public · 3 Estimated Story Points · Spike

Description

Summary

As a Codex designer and engineer, I want to be able to rely on automated visual regression testing to detect and prevent unwanted visual alterations resulting from applying changes to components or tokens.

Acceptance Criteria

  • Connect with Readers Web to review existing visual regression tools and capabilities (+ @nray) to see if this would be a viable path forward for Codex components

Findings

We have two options, and may want to implement them both at some point:

  1. Add tests for Codex components to Pixel, the visual regression testing tool developed and currently maintained by the Web team (although ownership may transfer to QTE)
  2. Add visual regression testing to Codex itself

Add testing of Codex components to Pixel

Summary: Regular visual regression testing of Codex components in MediaWiki (desktop and mobile) will be a significant asset for our work and will help us catch bugs in the short and long term. However, the current setup and workflow could use some optimization.

Pros:

  • Established tool that may eventually be maintained by QTE
  • Robust testing within MediaWiki

Cons:

  • Workflow for testing Codex components is quite burdensome. Right now it requires pulling in a specific Codex commit locally in the VueTest repo, running code there to update the local version of Codex and copy over the new styles, then pushing a patch with those changes. Then, you can locally test that patch against reference images. This is not ideal for rapidly and frequently testing in-progress work.
  • The Sandbox page is not fully optimized for easy VRT. We can target selectors, which means we can easily set up a test for each component's <section> element, but even then, slight changes (even just pixel rendering differences) can cause a vertical layout shift that muddies the diff.

Add VRT to Codex

Summary: Setting up a visual regression testing system in Codex would be lightweight and would allow us to quickly and frequently test both in-progress work and new releases. However, testing Codex components in isolation will only get us so far, and we will need to test them in MediaWiki to cover our most common and visible use cases.

Pros:

  • We can create a lightweight system that can run locally, as part of CI, and when we do new releases
  • No dependencies on other codebases (e.g. VueTest, Pixel) or MediaWiki

Cons:

  • Need to set it all up ourselves
  • Doesn't cover testing in MediaWiki

I think we should either do a simple BackstopJS setup (which is what Pixel uses) or consider a Cypress plugin if we think we'll use Cypress for other forms of testing. Another option we should explore is Microsoft's Playwright.
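To give a sense of how lightweight a BackstopJS setup could be, scenarios could be generated programmatically, one per component. This is only a sketch under assumptions: the component IDs, sandbox URL, and threshold below are hypothetical, not actual Codex values.

```javascript
// Sketch: generate one BackstopJS scenario per Codex component.
// The component IDs, sandbox URL, and threshold are hypothetical.
const components = [ 'cdx-button', 'cdx-text-input', 'cdx-checkbox' ];

function makeScenario( id ) {
	return {
		label: `Codex ${ id }`,
		url: 'http://localhost:8080/sandbox', // assumed local sandbox URL
		selectors: [ `#${ id }` ], // capture only this component's <section>
		misMatchThreshold: 0.04 // tolerate tiny anti-aliasing differences
	};
}

// These objects would go into backstop.json under "scenarios".
const scenarios = components.map( makeScenario );
```

Because the scenarios are generated, adding a new component to the suite would be a one-line change to the list.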


So, to do:

  • Soon:
    • Submit a patch to Pixel to add Codex components and discuss with the Web team
    • Set up VRT in Codex
  • Later:
    • Improve the Sandbox page so we can avoid muddying the diff when there are vertical layout or size changes
    • Improve the workflow for updating Codex code within the Sandbox

Event Timeline

STH created this task.
STH added a subscriber: Catrope.

See also T306846; Nick has already been thinking about this and it'd be great to partner on visual regression testing of TypeaheadSearch both in Codex and Vector.

STH moved this task from Inbox to Needs Refinement on the Design-System-Team board.

I'm happy to help with this if interested. As mentioned in T291525#7919512, the web team has been using visual regression testing for the last month and we've found it pretty useful. Personally, it cuts out a lot of the manual testing that I used to do for code review because it very efficiently captures a variety of viewport widths and pages that would have otherwise been very tedious to review manually.

We are currently testing a list of URLs that point to a MediaWiki instance running in Docker containers, but I'm optimistic that this could be revised to serve the use case of a component library as well.

What are the URLs or pages that you would want to capture?

DAbad subscribed.

Adding as a future scoping ticket that we may want to look at in 2 months or so. Linked to https://phabricator.wikimedia.org/T314082

DAbad changed the subtype of this task from "Task" to "Spike". (Aug 15 2022, 7:37 PM)
ldelench_wmf set the point value for this task to 3. (Sep 19 2022, 3:36 PM)
AnneT changed the task status from Open to In Progress. (Jan 4 2023, 2:27 PM)
AnneT claimed this task.

We have two options, and may want to implement them both at some point:

  1. Add tests for Codex components to Pixel, the visual regression testing tool developed and currently maintained by the Web team (although ownership may transfer to QTE)
  2. Add visual regression testing to Codex itself

Add testing of Codex components to Pixel

Summary: Regular visual regression testing of Codex components in MediaWiki (desktop and mobile) will be a significant asset for our work and will help us catch bugs in the short and long term. However, the current setup and workflow could use some optimization.

Pros:

  • Established tool that may eventually be maintained by QTE
  • Robust testing within MediaWiki

Cons:

  • Workflow for testing Codex components is quite burdensome. Right now it requires pulling in a specific Codex commit locally in the VueTest repo, running code there to update the local version of Codex and copy over the new styles, then pushing a patch with those changes. Then, you can locally test that patch against reference images. This is not ideal for rapidly and frequently testing in-progress work.
  • Testing the Codex sandbox page, which includes all components, means that if a component has a change in vertical layout or size, the rest of the page appears to be changed in the diff, muddying the results for all subsequent components
  • The sandbox page in the VueTest extension uses Codex design tokens, which are currently broken in MediaWiki (see T325237)

Add VRT to Codex

Summary: Setting up a visual regression testing system in Codex would be lightweight and would allow us to quickly and frequently test both in-progress work and new releases. However, testing Codex components in isolation will only get us so far, and we will need to test them in MediaWiki to cover our most common and visible use cases.

Pros:

  • We can create a lightweight system that can run locally, as part of CI, and when we do new releases
  • No dependencies on other codebases (e.g. VueTest, Pixel) or MediaWiki

Cons:

  • Need to set it all up ourselves
  • Doesn't cover testing in MediaWiki

I think we should either do a simple BackstopJS setup (which is what Pixel uses) or consider a Cypress plugin if we think we'll use Cypress for other forms of testing.


So, to do:

  • Now:
    • Fix the Less compiling issue described in T325237 so we can use design tokens within MediaWiki
    • Submit a patch to Pixel to add Codex components and discuss with the Web team
    • Set up VRT in Codex
  • Later:
    • Improve the Sandbox page so we can avoid muddying the diff when there are vertical layout or size changes
    • Improve the workflow for updating Codex code within the Sandbox
  • Testing the Codex sandbox page, which includes all components, means that if a component has a change in vertical layout or size, the rest of the page appears to be changed in the diff, muddying the results for all subsequent components

One strategy here is to only capture a selector or selectors on the page to reduce this noise. For example, if there is a #search-box-container that contains the typeahead search component, you can make the test capture only that through the selectors option. We (the web team) use a similar strategy for testing the Echo extension, in which we exclude many selectors from the page.
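As a concrete illustration of that strategy, a single scenario might look like the sketch below. selectors and removeSelectors are standard BackstopJS scenario options, but the URL and element IDs here are invented for illustration, not taken from Pixel's actual config.

```javascript
// Hypothetical BackstopJS scenario capturing only the search box container.
// selectors and removeSelectors are standard BackstopJS scenario options;
// the URL and element IDs are illustrative.
const scenario = {
	label: 'TypeaheadSearch',
	url: 'http://localhost:8080/wiki/Main_Page',
	selectors: [ '#search-box-container' ], // capture just this element
	removeSelectors: [ '#siteNotice' ] // strip noisy elements before capture
};
```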

I think we should either do a simple BackstopJS setup (which is what Pixel uses) or consider a Cypress plugin if we think we'll use Cypress for other forms of testing.

If going this route, I suggest you also look into Playwright from Microsoft. It seems like it has become quite popular recently, and I've been interested in experimenting with it in Pixel (either in conjunction with Backstop or as a standalone testing framework). From my understanding, you can set it up to do visual comparisons, although the reporter is not as nice as BackstopJS's right now, IMO.
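For context, a Playwright visual comparison (using the toHaveScreenshot assertion in @playwright/test) looks roughly like this. The URL and selector are hypothetical, and this sketch assumes a Playwright version where toHaveScreenshot is available; it must be run under the Playwright test runner.

```javascript
// Sketch of a Playwright visual comparison test (requires @playwright/test).
// The URL and selector are hypothetical. On first run, toHaveScreenshot()
// records a reference image; later runs diff against it.
const { test, expect } = require( '@playwright/test' );

test( 'button renders as expected', async ( { page } ) => {
	await page.goto( 'http://localhost:8080/sandbox' );
	await expect( page.locator( '#cdx-button' ) ).toHaveScreenshot( 'button.png' );
} );
```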

@nray thanks for your response and these suggestions!

One strategy here is to only capture a selector or selectors on the page to reduce this noise. For example, if there is a #search-box-container that contains the typeahead search component, you can make the test capture only that through the selectors option. We (the web team) use a similar strategy for testing the Echo extension, in which we exclude many selectors from the page.

That makes sense, and each component is already wrapped in a <section> element with an ID, so it would be easy to isolate them. Would adding a test for each section be too noisy or add too much time to the testing process, though? We're at 24 components now and have over 40 planned for Codex. As we consider adding Codex components to Pixel, I'd like to strategize about how we can avoid adding too much noise, maintenance burden, etc. Any advice you have here would be very welcome!

If going this route, I suggest you also look into Playwright from Microsoft.

I haven't heard of this one and will look into it!

Change 875397 had a related patch set uploaded (by Anne Tomasevich; author: Anne Tomasevich):

[design/codex@main] [WIP] Proof of concept of Cypress visual regression plugin

https://gerrit.wikimedia.org/r/875397

The Cypress VRT plugin is...okay. Cypress wasn't made for visual testing, and I think we should only go this route if we really intend to use Cypress for other things and we can get the VRT working better. See the patch for more comments.

That makes sense, and each component is already wrapped in a <section> element with an ID, so it would be easy to isolate them. Would adding a test for each section be too noisy or add too much time to the testing process, though? We're at 24 components now and have over 40 planned for Codex.

For reference, our desktop report contains 138 tests (which includes testing 5 different viewport sizes) and takes about 3 minutes to compare two different states of MediaWiki (e.g. the latest release branch with master). That time could probably be optimized more, but it is certainly much faster than manual testing. 64 tests are not that many, but that can quickly multiply depending on how many viewport sizes you want to test for each test and how many interactions you require for each component.

As we consider adding Codex components to Pixel, I'd like to strategize about how we can avoid adding too much noise, maintenance burden, etc. Any advice you have here would be very welcome!

I'm happy to answer any questions you have 😊. I imagine that Codex would have its own report separate from our "desktop" report so that you could only run the tests that are relevant to Codex. Additionally, if you used specific selectors to capture the components, I think that would help reduce the noise.

From my experience, tests that simply take a screenshot of something and don't require any interaction are fairly fast to write and can be done in around 4 lines each. The tests that require interactions (e.g. clicking an input and waiting for something to appear) can take considerably longer to write and make robust.
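A rough sketch of that contrast, in BackstopJS scenario terms: clickSelector and postInteractionWait are standard BackstopJS scenario fields, while the URL and selectors below are hypothetical.

```javascript
// Contrast: a screenshot-only scenario vs. one that needs interaction.
// clickSelector and postInteractionWait are standard BackstopJS options;
// the URL and selectors are hypothetical.
const screenshotOnly = {
	label: 'Button (default)',
	url: 'http://localhost:8080/sandbox',
	selectors: [ '#cdx-button' ]
};

const withInteraction = {
	label: 'Lookup (menu open)',
	url: 'http://localhost:8080/sandbox',
	selectors: [ '#cdx-lookup' ],
	clickSelector: '#cdx-lookup input', // open the menu first
	postInteractionWait: '.cdx-menu' // wait for the menu to appear
};
```

The interactive scenario is also where flakiness tends to creep in, since the wait condition has to be robust across runs.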

AnneT removed a project: Patch-For-Review.
AnneT updated the task description. (Show Details)

Change 875397 abandoned by Anne Tomasevich:

[design/codex@main] [WIP] Proof of concept of Cypress visual regression plugin

Reason:

https://gerrit.wikimedia.org/r/875397