
[Spike, 10hrs] Investigate automated visual regression tools
Closed, Resolved | Public | 0 Estimated Story Points | Spike


User story

  • As a current user, when I opt in to "Desktop Improvements", I would like to maintain a visually pleasing experience as more features/changes are pushed to production to encourage me to stay in "Desktop Improvements" mode.
  • As a current user, when I opt out of "Desktop Improvements", I would like to see Vector free of visual regressions.
  • As a developer, when I make changes related to "Desktop Improvements", I would like to have more confidence that the changes I make will not cause visual regressions without investing a lot of time manually testing/checking different modes and pages.

Question we are trying to answer

  • Will automated visual regression testing help with this? If so, is there any tooling in particular that is worth pursuing?
  • Is the answer always more software?™

Acceptance Criteria

  • Determine if any popular automated visual regression tools could help reduce visual regressions without imposing a huge burden on development/maintenance.
  • Write down your findings

Event Timeline

nray triaged this task as Medium priority.Oct 28 2019, 6:39 PM
nray created this task.

@ovasileva Could we pull this into one of our sprints in the near future so that I could spend some time researching this? It has potential to positively impact the Desktop Improvements project. It's also one of my OKRs :).

We (@Jdrewniak @nray) talked about this today and this is ready to go!

nray renamed this task from [Spike] Investigate automated visual regression tools to [Spike 10hrs] Investigate automated visual regression tools.Oct 29 2019, 4:17 PM
nray renamed this task from [Spike 10hrs] Investigate automated visual regression tools to [Spike, 10hrs] Investigate automated visual regression tools.

Here are the main points I have surrounding visual regression testing:

TLDR: I think visual regression testing could be useful for us, but it will likely require time to set up properly and buy-in from members of the team to get the most out of it. From previous discussions with the team, I think the general feeling about the usefulness of visual regression testing is mixed at best, with concerns about increased maintenance (see integration tests), slow tests, high rates of false positives (see integration tests), etc. Any effort spent on it should be seen as experimental until its worth has been validated. Given this, I'm planning to experiment with this more in my spare 10% time. If it proves worthwhile, maybe some more dedicated time can be spent on it in the regular sprint.

  • There has already been an excellent discussion on visual regression testing in T107588
  • There is a fairly comprehensive list of visual regression tools at It looks like previous work was already done evaluating some of these [5]. There are also quite a lot of paid visual regression services out there, including Percy [6] and Applitools [7] (which apparently integrates well with WebdriverIO). I tried out Percy briefly and found it interesting, especially its support for visual testing of Storybook components (which we use in MobileFrontend), but I think we could probably get away with using the free, open source tools.
  • As Joaquin states in his response in T107588#3270372, visual regression testing can be done in a variety of ways including:
    • Having a set of already approved screenshots that each test runs against. However, this means we would also have to maintain a set of "approved" screenshots.
    • Having a test compare screenshots from two "live instances" (e.g. master vs dev branch). In this setup, master might be assumed to be the correct version. This option appeals to me as it means we would not have to maintain any screenshots, but it would probably result in slower tests and visual regressions could still slip into master if the tests weren't habitually checked before merging into master (assuming tests are non-voting).
  • The parsing team has done a lot of interesting work with Visual Regression Testing [8]. They ran two mediawiki-vagrant lab VMs and tested 60k pages across 40 wikis during the Tidy replacement project [1][2]. They created the "integration-visualdiff" [3] and Uprightdiff [4] tools to perform the actual diffs between images and calculate a score for their differences.
  • I met with Subbu to discuss their work with Visual Regression testing, and, overall, he had good things to say about it although he did stress the importance of coming up with a way to get around false positives. During their project, they experienced ~1px vertical whitespace differences that they were able to ignore using their Uprightdiff tool.
  • I am still interested in Visual Regression testing and believe that it could prove valuable in the Desktop Improvements project in reducing regressions before they hit production, but it will definitely take time to set up properly, and it's hard for me to justify prioritizing it over many other competing priorities related to the Desktop Improvements project at this time. For now, I've been working on it using 10% time, although this will definitely be a slow process and there is a risk that I will lose interest in it.
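
To make the comparison step in the points above a bit more concrete, here is a minimal Python sketch of scoring the difference between two screenshots and tolerating a small vertical offset. It is only an illustration under stated assumptions: it is not the Uprightdiff algorithm, the names diff_score and has_regression and the threshold and max_shift parameters are hypothetical, and images are modeled as plain lists of RGB rows rather than loaded from real screenshots. The same comparison would apply whether the baseline is a stored "approved" screenshot or a fresh capture from master.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels (a stand-in for a decoded screenshot)


def diff_score(baseline: Image, candidate: Image, shift: int = 0) -> float:
    """Fraction of differing pixels in the rows that overlap once the
    candidate is shifted down by `shift` rows (negative shifts it up)."""
    compared = 0
    mismatched = 0
    for y, base_row in enumerate(baseline):
        src_y = y - shift
        if not 0 <= src_y < len(candidate):
            continue  # this baseline row has no counterpart after shifting
        cand_row = candidate[src_y]
        for x in range(max(len(base_row), len(cand_row))):
            a = base_row[x] if x < len(base_row) else None
            b = cand_row[x] if x < len(cand_row) else None
            compared += 1
            if a != b:
                mismatched += 1
    # With nothing to compare, treat the images as completely different.
    return mismatched / compared if compared else 1.0


def has_regression(
    baseline: Image,
    candidate: Image,
    threshold: float = 0.01,  # hypothetical tolerance, not a recommended value
    max_shift: int = 1,       # vertical offset (in rows) to forgive
) -> bool:
    """Report a regression only if every allowed vertical shift still
    leaves more than `threshold` of the compared pixels different."""
    best = min(
        diff_score(baseline, candidate, shift)
        for shift in range(-max_shift, max_shift + 1)
    )
    return best > threshold
```

With max_shift=0 this is a strict pixel comparison. Allowing a shift of one row is a crude way to ignore the kind of ~1px vertical whitespace difference mentioned above, at the cost of missing regressions that look like a pure vertical offset.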

Next Steps: I'm planning to experiment with this more in my spare 10% time. If it proves worthwhile, maybe some more dedicated time can be spent on it in the regular sprint.









nray updated the task description.

Sounds like the suggestion is not to invest more time in this (and not have T107588 as a quarterly goal?)

I think removing it as a quarterly goal makes sense for now, as I think we have more important things we can focus on first (e.g. I certainly wouldn't prioritize it over adding unit tests in Vector and some of the other foundational work that we've discussed for Desktop Improvements). However, I do plan to invest more time in this in my 10% time, as I still think it has potential to help.

I should also add that if other members of the team think this work should be a top priority, please let me know! From what I gathered from various discussions with the team (at the offsite, at retro, etc.), I think there are some valid concerns about it and we probably have more important priorities to address first, which is partly why I've suggested experimenting with it in 10% time.

I've also recently talked with @Edtadros, and it sounds like he might be investigating this as well, which would be great!

ovasileva claimed this task.

Thanks @nray! Resolving this and moving T107588: EPIC: Detect and prevent UI regressions to the epics column within the backlog.