
Run performance test on commits (Fresnel)
Closed, Resolved, Public

Description

In the current setup with WebPageTest, we continuously test a couple of pages on different hosts, but we don't test commits. The long-term goal should be to continuously and automatically run performance tests on new commits.

A first step could be to do one of the things that Facebook does: make it possible for a developer to easily test her/his changes and get back median metrics with a confidence value.

To make that happen there's a lot we need to do:

  • We should have the test infrastructure in house and not on Amazon to make sure we have full control, to be able to minimize variance.
  • We should prepare for testing on real mobile devices.
  • Dedicated hardware to run the tests (we should have both desktop and mobile testing)
  • Run the tests on commits
  • GUI-driven: whatever tools we use, it must be super easy to start a test and to understand the result, i.e. whether the code should move on or not.
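The description doesn't say how the "confidence value" would be computed. One common option, swapped in here purely for illustration, is a percentile bootstrap over repeated runs of the same page. A minimal Python sketch, assuming each run yields a single number for the metric (the sample values are made up):

```python
import random
import statistics


def median_with_ci(samples, confidence=0.95, resamples=2000, seed=0):
    """Return (median, low, high): the sample median plus a percentile
    bootstrap confidence interval for that median.

    samples: one metric value per test run (e.g. milliseconds).
    """
    if not samples:
        raise ValueError("need at least one sample")
    rng = random.Random(seed)  # fixed seed so repeated reports agree
    n = len(samples)
    # Median of many resamples drawn with replacement from the runs.
    boot = sorted(
        statistics.median(rng.choices(samples, k=n)) for _ in range(resamples)
    )
    alpha = (1 - confidence) / 2
    low = boot[int(alpha * (resamples - 1))]
    high = boot[int((1 - alpha) * (resamples - 1))]
    return statistics.median(samples), low, high


if __name__ == "__main__":
    runs_ms = [812, 790, 805, 1020, 798, 801, 795, 830, 809, 803]
    med, low, high = median_with_ci(runs_ms)
    print(f"median {med} ms (95% CI {low} to {high} ms)")
```

Resampling keeps the sketch free of distribution assumptions; with only a handful of runs the interval will be wide, which is itself a useful signal that more runs are needed.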

See also:

Event Timeline

There are a very large number of changes, so older changes are hidden.
Krinkle claimed this task. Aug 22 2018, 2:19 AM
Krinkle moved this task from Next In This Quarter to Doing on the Performance-Team board.
Krinkle added a subscriber: Krinkle.

Change 444773 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] resourceloader: Implement mw.inspect 'time' report

https://gerrit.wikimedia.org/r/444773

Change 454465 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] [WIP] resourceloader: Add rIC and $.ready tracking to mw.inspect

https://gerrit.wikimedia.org/r/454465

Krinkle added a subscriber: ori. Edited Aug 22 2018, 7:37 PM

> Very cool. I am wondering why you chose to make this off by default, though, since one use case is "why did this article I just loaded take so long to load?", and profiler.js is pretty tiny.

(Responding here for visibility as others might wonder the same, and because I got a bit side-tracked into other stuff to talk about.)

A couple of reasons for making it off by default:

  1. I've already encountered and had to fix dozens of subtle bugs in just this simple initial version, so until we have more experience with it, I'm not letting it go to prod and break things in hard-to-detect ways. Not initially, anyway.
  1. I'm not too worried about the byte size, actually; I'm more concerned about run-time overhead and memory use. On that topic, something I've come to dislike is the memory size of the registry object, in particular from retaining the styles and scripts (a copy of the implement payload). These are unused after execution, but were not removed. They used to be removable, until the introduction of mw.inspect.grep, which is useful but makes the registry quite big. The same applies to the full-size copy of the unserialised mw.loader.store. I agree these are useful for debugging in production, and it is useful to be able to debug an already loaded page. But the underlying issue, in my opinion, is that our actual "debug" mode is not useful. I haven't used it in years, and I actively recommend that others avoid it: its behaviour doesn't match prod, many features are unavailable, it's slow, it's inefficient, etc. Not addressing it earlier is one of my regrets. As part of T85805 (maybe next quarter or next year?), I'd like to re-implement debug mode, starting from plain prod mode, unchanged except (perhaps) for minification, plus a few additive features (no behaviour change). By being useful and low in overhead, it would also address the need to instrument an already loaded page, as you could simply leave it enabled for yourself by default (via a cookie). After that, we could consider enabling profiling by default (in debug mode), and optimise prod mode to more actively dereference parts of the registry.
  1. This new profiling feature is, right now, mainly for perf-testing on commit. The rough idea is to add a Jenkins job that installs MW twice (before and after the patch), extracts various metrics, presents the median, stddev, and difference between runs for each of those metrics, and fails the build when certain thresholds are exceeded. This is somewhat similar to what Lego recently introduced for code coverage. These module-level metrics would be part of the collected dataset.
  1. This feature's secondary purpose is to help with the initial audit of various features' overhead on page load time. This is mostly low-hanging fruit that can easily be demonstrated and worked on through one-off local profiling. I've been doing this manually for a few months, but having a published method will make it more visible to others, so that we can collectively cut down time-to-interactive. Much of it is (or was) in WikimediaEvents: expensive DOM queries, large node lists, attaching hundreds or thousands of event listeners, and unconditionally pre-computing cryptos, all of which block the module from reaching the "ready" state and the next module from executing. Addressing these issues typically involves making things lazy or conditional, adopting delegated events where possible, or moving work to dom-ready, window-load, or requestIdleCallback.
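The before/after comparison described above (install twice, report the median, stddev and difference per metric, fail on thresholds) could look roughly like the Python sketch below. It assumes each side has already produced a list of per-run values for each metric; the metric names, sample values and threshold values are hypothetical, and the real Fresnel job may compare metrics differently.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Comparison:
    metric: str
    before_median: float
    before_stdev: float
    after_median: float
    after_stdev: float

    @property
    def diff(self) -> float:
        """Change in median from the 'before' install to the 'after' install."""
        return self.after_median - self.before_median


def _stdev(values):
    # statistics.stdev needs at least two data points.
    return statistics.stdev(values) if len(values) > 1 else 0.0


def compare(before, after, thresholds):
    """Compare per-run samples for each metric present on both sides.

    before/after: dict of metric name -> list of per-run values.
    thresholds: dict of metric name -> largest allowed increase in the median.
    Returns (comparisons, failures); a CI job would fail the build
    when failures is non-empty.
    """
    comparisons, failures = [], []
    for metric in sorted(before.keys() & after.keys()):
        c = Comparison(
            metric,
            statistics.median(before[metric]), _stdev(before[metric]),
            statistics.median(after[metric]), _stdev(after[metric]),
        )
        comparisons.append(c)
        limit = thresholds.get(metric)
        if limit is not None and c.diff > limit:
            failures.append(c)
    return comparisons, failures


if __name__ == "__main__":
    # Made-up per-run samples; metric names and limits are hypothetical.
    before = {"module_exec_ms": [41, 40, 43, 39, 42], "js_transfer_kb": [120, 120, 120]}
    after = {"module_exec_ms": [58, 55, 60, 57, 59], "js_transfer_kb": [121, 121, 121]}
    thresholds = {"module_exec_ms": 10, "js_transfer_kb": 5}

    comparisons, failures = compare(before, after, thresholds)
    for c in comparisons:
        print(f"{c.metric}: median {c.before_median} -> {c.after_median} "
              f"({c.diff:+}), stddev {c.before_stdev:.1f} / {c.after_stdev:.1f}")
    print("FAIL" if failures else "PASS")
```

Comparing medians rather than means keeps a single outlier run from deciding the verdict; the standard deviations are reported alongside so a reviewer can judge whether a difference is larger than the noise.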

Change 444773 merged by jenkins-bot:
[mediawiki/core@master] resourceloader: Implement mw.inspect 'time' report

https://gerrit.wikimedia.org/r/444773

Peter added a comment. Aug 27 2018, 7:48 PM

> We should have the test infrastructure in house and not on Amazon to make sure we have full control, to be able to minimize variance.

The ironic thing here is that when Gilles and I tried to get metrics as stable as possible with different proxies, Amazon gave us far better numbers and much lower variance than our own infrastructure. I think we need help with this before we move on.

@Peter Yeah, I expect that values between runs will be unstable. However, I'm hoping they will at least be stable back-to-back when quickly running "before" and "after" in the same Jenkins job, which is my current thinking for commit testing.

We can't store or plot the metrics between builds in a useful way anyway because there will not be a logical progression between builds (each build is for a different unrelated repository, branch and commit that is still being reviewed, not merged).

But it is possible that even two consecutive runs within the same build will be unstable. I'll find out soon :)
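One way to find out whether back-to-back runs within a single build are stable enough is to check the spread of the repeated runs on each side before trusting the comparison. A minimal Python sketch using the coefficient of variation (stddev divided by mean); the 5% cut-off is an arbitrary, hypothetical default, not something this task specifies:

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a fraction of the mean."""
    if len(values) < 2:
        raise ValueError("need at least two runs to measure spread")
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean is zero; coefficient of variation undefined")
    return statistics.stdev(values) / mean


def runs_are_stable(before, after, max_cv=0.05):
    """True if both the 'before' and 'after' samples vary by at most max_cv.

    If either side is noisier than that, a before/after difference is hard
    to tell apart from run-to-run noise.
    """
    return (coefficient_of_variation(before) <= max_cv
            and coefficient_of_variation(after) <= max_cv)
```

If the check fails, a job could report the comparison as inconclusive rather than as a regression.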

I see your point. It will be really interesting to see the result!

Change 459268 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] [WIP] Add performance-patch job

https://gerrit.wikimedia.org/r/459268

Change 459409 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[performance/fresnel@master] initial commit

https://gerrit.wikimedia.org/r/459409

Change 463886 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] Add pipeline for performance/fresnel.git (npm-docker)

https://gerrit.wikimedia.org/r/463886

Change 463886 merged by jenkins-bot:
[integration/config@master] Add pipeline for performance/fresnel.git (npm-docker)

https://gerrit.wikimedia.org/r/463886

Peter added a comment. Oct 17 2018, 6:03 AM

Did you see this, @Krinkle? https://tech.trivago.com/2018/10/12/building-fast-and-reliable-web-applications/

They use Puppeteer to collect some info, Lighthouse for some, and sitespeed.io for the rest (but they don't graph the sitespeed.io results; what's up with that?). It all runs in a Jenkins setup.

Change 480256 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] Switch fresnel Jenkins job from 'npm' to 'npm-with-browser' image

https://gerrit.wikimedia.org/r/480256

Change 480256 merged by jenkins-bot:
[integration/config@master] Switch fresnel Jenkins job from 'npm' to 'npm-with-browser' image

https://gerrit.wikimedia.org/r/480256

Change 482572 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] Create node-10-docker job template and use for Fresnel

https://gerrit.wikimedia.org/r/482572

Change 482572 merged by jenkins-bot:
[integration/config@master] Create node-10-docker job template and use for Fresnel

https://gerrit.wikimedia.org/r/482572

Change 482695 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] Switch Fresnsel from node10-test to node10-browser-test

https://gerrit.wikimedia.org/r/482695

Change 482695 merged by jenkins-bot:
[integration/config@master] Switch Fresnsel from node10-test to node10-browser-test

https://gerrit.wikimedia.org/r/482695

Change 482749 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[performance/fresnel@master] Require Node 10 and adopt async-await

https://gerrit.wikimedia.org/r/482749

Krinkle renamed this task from "Run performance test on commits" to "Run performance test on commits (Fresnel)". Jan 8 2019, 2:33 AM

Change 482751 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] Add Fresnel postmerge job for publishing coverage report

https://gerrit.wikimedia.org/r/482751

Change 482751 merged by jenkins-bot:
[integration/config@master] Create Node 10 coverage-publish job and use for Fresnel

https://gerrit.wikimedia.org/r/482751

Change 482752 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] zuul: Add custom DOC_PROJECT destination for fresnel

https://gerrit.wikimedia.org/r/482752

Change 482752 merged by jenkins-bot:
[integration/config@master] zuul: Add custom DOC_PROJECT destination for fresnel

https://gerrit.wikimedia.org/r/482752

Change 459409 merged by jenkins-bot:
[performance/fresnel@master] initial commit

https://gerrit.wikimedia.org/r/459409

Change 482749 merged by jenkins-bot:
[performance/fresnel@master] Require Node 10 and adopt async-await

https://gerrit.wikimedia.org/r/482749

Change 488121 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] build: Add fresnel scenarios config

https://gerrit.wikimedia.org/r/488121

Change 488187 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[performance/fresnel@master] build: Remove the MediaWiki .fresnel.yml file

https://gerrit.wikimedia.org/r/488187

Change 488187 merged by jenkins-bot:
[performance/fresnel@master] build: Remove the MediaWiki .fresnel.yml file

https://gerrit.wikimedia.org/r/488187

Change 459268 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] [WIP] Add performance-patch job
https://gerrit.wikimedia.org/r/459268

This draft revision to the quibble-stretch-php70 Docker image now builds without errors for me locally, and installs Fresnel from npm.

The problem, though, is that the Quibble image still ships with Node 6 instead of Node 10, so we'll need to fix that first.

This comment was removed by Krinkle.

Change 459268 merged by jenkins-bot:
[integration/config@master] Add mediawiki-fresnel-patch job

https://gerrit.wikimedia.org/r/459268

Mentioned in SAL (#wikimedia-releng) [2019-02-18T21:37:18Z] <Krinkle> Updating docker-pkg files on contint1001 for https://gerrit.wikimedia.org/r/459268 / T133646

Change 491378 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] Add fresnel to 'experimental' pipeline for mediawiki/core:master

https://gerrit.wikimedia.org/r/491378

Change 491378 merged by jenkins-bot:
[integration/config@master] Add fresnel to 'experimental' pipeline for mediawiki/core:master

https://gerrit.wikimedia.org/r/491378

Change 491381 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] Add missing node-gyp deps to fresnel image

https://gerrit.wikimedia.org/r/491381

Change 491382 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] jjb: Update Fresnel job to quibble-fresnel:0.0.28-2

https://gerrit.wikimedia.org/r/491382

Change 491381 merged by jenkins-bot:
[integration/config@master] Add missing node-gyp deps to fresnel image

https://gerrit.wikimedia.org/r/491381

Mentioned in SAL (#wikimedia-releng) [2019-02-19T00:54:25Z] <Krinkle> Updating docker-pkg files on contint1001 for https://gerrit.wikimedia.org/r/491381 / bc8e4198961cb73 / T133646

Change 491382 merged by jenkins-bot:
[integration/config@master] jjb: Update Fresnel job to quibble-fresnel:0.0.28-2

https://gerrit.wikimedia.org/r/491382

Change 491394 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] fresnel: Fix syntax error

https://gerrit.wikimedia.org/r/491394

Change 491394 merged by jenkins-bot:
[integration/config@master] fresnel: Fix syntax error

https://gerrit.wikimedia.org/r/491394

Change 491396 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] Set CHROMIUM_FLAGS in fresnel image

https://gerrit.wikimedia.org/r/491396

Change 491403 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] jjb: Update Fresnel job to quibble-fresnel:0.0.28-3

https://gerrit.wikimedia.org/r/491403

Change 491396 merged by Krinkle:
[integration/config@master] Set CHROMIUM_FLAGS in fresnel image

https://gerrit.wikimedia.org/r/491396

Change 491403 merged by Krinkle:
[integration/config@master] jjb: Update Fresnel job to quibble-fresnel:0.0.28-3

https://gerrit.wikimedia.org/r/491403

Change 491405 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] add firefox-esr to fresnel, to make chromium work (!)

https://gerrit.wikimedia.org/r/491405

Change 491406 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] jjb: Update Fresnel job to quibble-fresnel:0.0.28-4

https://gerrit.wikimedia.org/r/491406

Change 491405 merged by jenkins-bot:
[integration/config@master] add firefox-esr to fresnel, to make chromium work (!)

https://gerrit.wikimedia.org/r/491405

Change 491406 merged by jenkins-bot:
[integration/config@master] jjb: Update Fresnel job to quibble-fresnel:0.0.28-4

https://gerrit.wikimedia.org/r/491406

Change 491513 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] Set MW_SERVER/MW_SCRIPT_PATH for fresnel

https://gerrit.wikimedia.org/r/491513

Change 491516 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] jjb: Update Fresnel job to quibble-fresnel:0.0.28-5

https://gerrit.wikimedia.org/r/491516

Change 491513 merged by jenkins-bot:
[integration/config@master] Set MW_SERVER/MW_SCRIPT_PATH for fresnel

https://gerrit.wikimedia.org/r/491513

Change 491516 merged by jenkins-bot:
[integration/config@master] jjb: Update Fresnel job to quibble-fresnel:0.0.28-5

https://gerrit.wikimedia.org/r/491516

Change 488121 merged by jenkins-bot:
[mediawiki/core@master] build: Add initial version of Fresnel config

https://gerrit.wikimedia.org/r/488121

Change 491577 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] zuul: Enable fresnel job on coverage pipeline

https://gerrit.wikimedia.org/r/491577

Change 491577 merged by jenkins-bot:
[integration/config@master] zuul: Enable fresnel job in a new 'patch-performance' pipeline

https://gerrit.wikimedia.org/r/491577

Krinkle closed this task as Resolved. Feb 19 2019, 10:49 PM

Future development will be tracked under Fresnel.

Tgr awarded a token. Mar 10 2019, 6:00 AM