Jenkins: Set up perceptual diffs (visual regression testing)
Open, Stalled, Lowest, Public


This is something I've been experimenting with in spare time for a while.

The idea is: Render a visual diff for one or more pages and states thereof (comparing the result of the current change to the result of the latest master, or whatever target branch the commit has).

The screenshots would be created using PhantomJS' render() API. Or, while we don't need cross-browser coverage per se, having a browser more representative than PhantomJS would be nice: perhaps Chromium under Xvfb with a large enough window (it doesn't need to be very tall), capturing the Xvfb output with ImageMagick's import command.
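
A rough sketch of the Chromium-under-Xvfb variant, in Python. It only builds the three command lines (Xvfb, browser, ImageMagick import) and does not run them. The function name, the display number :99, the 1280x800 window and the "chromium" binary name are assumptions for illustration; the binary name in particular differs between distributions.

```python
from typing import List


def capture_commands(url: str, out_png: str, display: str = ":99",
                     width: int = 1280, height: int = 800) -> List[List[str]]:
    """Return argv lists for Xvfb, the browser, and the screen grab (hypothetical helper)."""
    # Virtual X server with a single 24-bit screen of the requested size.
    xvfb = ["Xvfb", display, "-screen", "0", f"{width}x{height}x24"]
    # The browser must be started with DISPLAY set to `display` in its environment.
    # "chromium" is an assumed binary name (could be chromium-browser, google-chrome, ...).
    browser = ["chromium", f"--window-size={width},{height}",
               "--window-position=0,0", url]
    # ImageMagick `import -window root` grabs the whole screen of the given display.
    grab = ["import", "-display", display, "-window", "root", out_png]
    return [xvfb, browser, grab]
```

A job would start the first two as background processes, wait for the page to settle, run the third, and then shut both down.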

Rough idea for the Jenkins job:

  • Run project setup (e.g. build script for projects like OOjs UI and VisualEditor; installing MediaWiki for core/extensions). Then expose workspace to the local web server (We've got re-usable macros for this already).
  • Run the scenarios or urls for the current project and capture the screen after each scenario.
  • Compare them against the ones from the last run (e.g. for a commit to master, compare them to the latest master build). TODO: Will need to be stored somewhere. Shared NFS maybe? store/{project}/{branch}.
  • In test pipeline:
    • If different, make sure the latest.png/change-after.png/change-diff.png for that url is kept and stored as build artefacts in Jenkins. Otherwise delete the image.
  • In the post-merge pipeline:
    • Replace the images in the store with those of this build.
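
The storage side of the steps above could look roughly like this. This is a minimal sketch, assuming a build directory with one sub-directory per scenario holding that scenario's images, and a store laid out as store/{project}/{branch}. The function names and the directory layout are made up for illustration.

```python
import shutil
from pathlib import Path
from typing import Iterable


def store_dir(root: Path, project: str, branch: str) -> Path:
    """Reference screenshots live under store/{project}/{branch}."""
    return root / project / branch


def prune_unchanged(build_dir: Path, changed: Iterable[str]) -> None:
    """Test pipeline: keep images only for scenarios that differ, delete the rest."""
    keep = set(changed)
    for scenario_dir in list(build_dir.iterdir()):
        if scenario_dir.is_dir() and scenario_dir.name not in keep:
            shutil.rmtree(scenario_dir)


def promote(build_dir: Path, store: Path) -> None:
    """Post-merge pipeline: replace the stored images with those of this build."""
    if store.exists():
        shutil.rmtree(store)
    store.mkdir(parents=True)
    for scenario_dir in list(build_dir.iterdir()):
        shot = scenario_dir / "change-after.png"
        if scenario_dir.is_dir() and shot.is_file():
            shutil.copyfile(shot, store / f"{scenario_dir.name}.png")
```

Whatever survives prune_unchanged() is what Jenkins would archive as build artefacts.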


I imagine we'll need to support two kinds of scenarios:

  • Plain url.
  • Web driver steps (for large interfaces not accessible by url). This should *not* be used to trigger every possible dialog and component; that slows down the test matrix and adds tests for no reason. More useful would be to capture individual components via e.g. the OOjs UI demo page, and to use these scenarios to assert the composition instead.

A few urls we might want:

  • mediawiki-core:
    • /index.php?title=Main_Page
    • /index.php?title=Main_Page&useskin=monobook
    • /index.php?title=Main_Page&action=edit
    • /index.php?title=Main_Page&action=history
    • /index.php?title=Special:UserLogin
    • /index.php?title=Special:UserLogin/signup
    • /index.php?title=Special:Search&search=wiki
  • VisualEditor:
    • /demos/ve/#!/src/pages/empty.html
    • /demos/ve/#!/src/pages/simple.html
    • /demos/ve/#!/src/pages/complex.html
  • oojs-ui:
    • /demos/icons.html
    • /demos/widgets.html

A few implementations that exist:

Behind these is basically just an ImageMagick compare command between two PNGs.

      -metric RMSE
      -highlight-color RED
      -compose Src
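
Put together, a minimal sketch of such a call from Python could look like this. The function names are made up. The parser assumes compare prints the RMSE metric as "absolute (normalized)", e.g. "4282.03 (0.0653396)", which is what it usually writes to stderr; that format should be verified against the installed ImageMagick version.

```python
import re
from typing import List, Tuple


def compare_command(before: str, after: str, diff: str) -> List[str]:
    """Build the ImageMagick compare argv using the options listed above."""
    return ["compare", "-metric", "RMSE", "-highlight-color", "RED",
            "-compose", "Src", before, after, diff]


# Assumed output shape: "<absolute> (<normalized>)".
_METRIC = re.compile(r"^\s*([0-9.eE+-]+)\s+\(([0-9.eE+-]+)\)")


def parse_rmse(metric_text: str) -> Tuple[float, float]:
    """Return (absolute, normalized) RMSE parsed from compare's metric output."""
    match = _METRIC.match(metric_text)
    if match is None:
        raise ValueError(f"unexpected compare output: {metric_text!r}")
    return float(match.group(1)), float(match.group(2))
```

The command would be run with something like subprocess.run(cmd, capture_output=True, text=True) and the result's stderr passed to parse_rmse(). A normalized value of 0 means the two PNGs are pixel-identical.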

See also:



Event Timeline

bzimport raised the priority of this task from to Medium. Nov 22 2014, 3:09 AM
bzimport set Reference to bz62633.
bzimport added a subscriber: Unknown Object (MLST).

The job would be non-voting of course, and we'd change the jenkins-bot comment to Gerrit to CHANGED/UNCHANGED instead of SUCCESS/FAILURE.
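
Roughly, the reporting side could be as simple as the following sketch. The function names and the comment wording are made up; only the CHANGED/UNCHANGED labels come from the comment above.

```python
from typing import Sequence


def verdict(changed_scenarios: Sequence[str]) -> str:
    """Non-voting result: report whether anything looks different, not pass/fail."""
    return "CHANGED" if changed_scenarios else "UNCHANGED"


def bot_comment(job_name: str, changed_scenarios: Sequence[str]) -> str:
    """Format a one-line summary for the review comment (hypothetical wording)."""
    line = f"{job_name}: {verdict(changed_scenarios)}"
    if changed_scenarios:
        line += " (" + ", ".join(changed_scenarios) + ")"
    return line
```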

I assume that it's easier to roll our own than to re-use wraith or something similar, due to trying to integrate it into jenkins? Multi-platform screenshot regression testing is probably a secondary-level target, but…

Mentioned by Subbu on IRC: an example of what visual diffs can bring us

See for a version that is being used to compare Parsoid and PHP parser HTML output. That works quite well and has already exposed a few CSS issues and other non-CSS rendering/HTML diffs.

Something similar could perhaps be adapted for this purpose as well?

Lowering priority from high to normal since apparently nobody is actively pushing for this change. Whenever the feature teams figure out a good utility / way to do such visual diffs, we can work on integrating it into Jenkins/Zuul.

Krinkle lowered the priority of this task from Medium to Lowest. Jan 8 2015, 1:26 PM
Krinkle set Security to None.
Krinkle removed a subscriber: Unknown Object (MLST).
hashar changed the task status from Open to Stalled. Oct 6 2015, 12:34 PM

Until we have a data store to expose the generated files (T101545: Provide infrastructure to store files by project/branch post-merge to compare with pre-merge), there is not much we can do on this task.

The Web team has built out a solution here without Jenkins that is working well so far. Timo, I am curious whether this could solve what you had in mind.

Reports can be found at

More information at T302246: [GOAL] Leverage Automated Visual Regression Testing