
[Research] Proxy for local performance testing
Closed, Resolved · Public


To be able to do T133646 (test front-end performance on commits) we need a proxy that caches responses, so there is as little disturbance in the metrics as possible. Both Facebook and the Chrome team use that approach when measuring front-end performance.

How it will work

  1. Access a URL through the proxy.
  2. DNS lookups and assets are fetched from the Internet.
  3. The proxy caches all the assets and the DNS lookups.
  4. The next access is served locally from the proxy.
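The record/replay idea in the steps above can be sketched as a simple cache keyed on the request. This is a hypothetical illustration of the concept, not any specific proxy's implementation; `fetch_from_internet` is a stand-in for a real HTTP client.

```python
# Minimal sketch of the record/replay idea: the first access fetches from
# the Internet and records the response; later accesses replay it locally.

class RecordReplayProxy:
    def __init__(self, fetch_from_internet):
        self._fetch = fetch_from_internet
        self._cache = {}  # URL -> recorded response

    def get(self, url):
        if url not in self._cache:            # record phase: hit the network
            self._cache[url] = self._fetch(url)
        return self._cache[url]               # replay phase: served locally


network_calls = []

def fake_fetch(url):
    network_calls.append(url)
    return "response for " + url

proxy = RecordReplayProxy(fake_fetch)
proxy.get("https://en.wikipedia.org/")  # first access goes to the "Internet"
proxy.get("https://en.wikipedia.org/")  # second access is served from cache
print(len(network_calls))               # → 1 (the network was only hit once)
```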

To decrease latency, the proxy will run on the same server as the browser that does the testing. The proxy will need to handle HTTP/2, because we want the test to be as realistic as possible.

In this task we need to check what kinds of proxies we can use (is there something out there already?) and test them out, so we have a clear picture of how we can move on.

Event Timeline

WebPageTest uses WebPageReplay as a caching proxy (Linux & Mac). The Python version doesn't support HTTP/2, but the Chrome team has built a Go version (I haven't seen that it is open source, though) that should handle H2.

Since we have the luxury of only testing against one domain, we could use something like nginx as a reverse proxy with a long cache time on all assets, and then find a way to empty the cache (a restart?) when the test is finished.

We want to be able to run the same setup locally, so spinning up nginx in Docker in front of MediaWiki would be easy. I could get a working example next week, so we can test how "stable" it is and then move it to another machine to do some real testing.

The only problem I see is connectivity: we need to be able to set that locally so it works against localhost. We should create another task for that.

Using that approach, we need to be careful with what cache headers we set, so we don't override the proxy cache.

But we will see that when we try :)
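As an example of the header concern above: origin headers like `Cache-Control: no-store` or `private` tell a standards-following proxy not to cache a response at all, which would defeat this setup. A simplified sketch of that check (not nginx's actual logic):

```python
# Simplified sketch of why origin cache headers matter: a proxy that honours
# Cache-Control will refuse to store responses the origin marks uncacheable.

UNCACHEABLE_DIRECTIVES = ("no-store", "no-cache", "private")

def proxy_may_cache(response_headers):
    cache_control = response_headers.get("Cache-Control", "").lower()
    return not any(d in cache_control for d in UNCACHEABLE_DIRECTIVES)

print(proxy_may_cache({"Cache-Control": "public, max-age=86400"}))  # → True
print(proxy_may_cache({"Cache-Control": "private, max-age=0"}))     # → False
```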

I had the idea of doing this as simply as possible with nginx, but it didn't work out for me. It wasn't simple to make sure everything was cached, because the nginx proxy caches depending on headers from the origin.

I tried out Web-Page-Replay, which the Chrome team uses in their testing, but couldn't get it to work locally. The thing with WPR is that it lacks support for HTTP/2. I'm pretty sure I've seen somewhere that there should be a new version coming in Go with H2 support, but no sign of it yet.

Instead I tested Mahimahi from MIT. First off, the current version also only supports H1, but I've heard an H2 version could come later on, so we should wait for that. It only supports Ubuntu (at least out of the box), which is kind of a drawback, because it would be nice to easily run it on your local machine independent of what OS you use. I tried it out (I'll do a demo at the next performance team meeting) and I really like the shell idea: each tool (Mahimahi comes with both a replay tool and connectivity tools) opens up a new shell. For example, you can set the latency to x ms, and every browser (or other tool) that you run in that shell will have that latency. It looks like this:

Screen Shot 2017-04-28 at 7.30.33 AM.png (884×1 px, 70 KB)

In practice you do it like this:

  • Enter a record shell, pointing out the directory where the website data (that will be cached) is stored
  • Access the URL that you want to test
  • Exit the shell
  • Go into the replay shell, so that every request hitting that URL goes to the local version
  • Go into a connectivity shell (so you can set the connectivity)
  • Run your tests
  • Exit the shells and remove the cached data

This will work out well for us once it supports H2.

I've done some testing with Mahimahi, on Ubuntu but in Parallels, so not optimal.

In this test I used Chrome against the cached page, set the delay to 200 ms, made 31 runs, removed the cache, and did the same again. I took the median of each batch, and did this 5 times:

Metric              Batch 1  Batch 2  Batch 3  Batch 4  Batch 5
Backend time        1.63s    1.63s    1.62s    1.62s    1.62s
DOMContentLoaded    2.47s    2.47s    2.46s    2.46s    2.46s
FirstVisualChange   2.68s    2.71s    2.66s    2.65s    2.65s
SpeedIndex          2768     2796     2751     2739     2739
LastVisualChange    6.83s    6.83s    6.78s    6.74s    6.74s
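The aggregation used above (31 runs per batch, take the median of each metric, repeat the batch 5 times) can be sketched as follows; the run values here are fabricated for illustration.

```python
# Sketch of the aggregation: take the median of a metric over a batch of
# runs, then repeat the whole batch to see how stable that median is.
from statistics import median

def batch_median(runs, metric):
    return median(run[metric] for run in runs)

# One fabricated batch of SpeedIndex values from 31 runs:
batch = [{"speedIndex": 2700 + (i % 7) * 10} for i in range(31)]
print(batch_median(batch, "speedIndex"))  # → 2730
```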

These tests were HTTP/1. Let's try with HTTP/2 when it is available, report everything in ms, and then try on a dedicated machine. Then I can also test Firefox and get the trace log from Chrome (that usually adds a bit to the metrics, but I'm not sure if it will make them unstable).

They actually used WebPageTest when they tested Mahimahi at Stanford:

To measure speed index, we create SpeedIndexShell where we run a private instance of WebPagetest inside ReplayShell. To automate testing, we use WebPagetest's wpt API [18]. Because WebPagetest runs only on Windows, we run WebPagetest within a VirtualBox Windows virtual machine, inside ReplayShell.

When I tested, I just ran:

$ mm-webrecord /tmp/test browsertime -n 1 --xvfb --speedIndex   # record the page into /tmp/test
$ mm-webreplay /tmp/test                                        # enter a replay shell serving the recording
$ mm-delay 200                                                  # inside it, add 200 ms of delay per packet
$ browsertime -n 31 --xvfb --speedIndex                         # the actual 31 test runs

having first installed browsertime from npm, plus the dependencies for VisualMetrics and xvfb.

I did some testing last week with WebPageReplay. For some reason I didn't get it to work locally (only on Ubuntu). The main drawbacks with WebPageReplay for us are that it doesn't support H2 and that it uses TSProxy for setting the connectivity. I've had problems with TSProxy before (= not working on Mac and with Selenium) and there's an issue on GitHub about the problem. I've asked which OSes are supported, but no answer yet.

The good thing is that I will hopefully get hold of a Mahimahi version that supports HTTP/2 later this week. There can be problems, though, with how the server prioritizes responses (different servers prioritize differently).

I got access to an h2o version of Mahimahi today from Benedikt Wolters. I'll try tomorrow to get it to work and see if there's something we can use.

I finally got the h2o version of Mahimahi up and running. It works, but the prioritization of responses is different. Testing against production using Chrome 58 gives us a waterfall that looks like this:

Screen Shot 2017-05-26 at 9.11.31 AM.png (382×1 px, 111 KB)

  1. HTML
  2. Logo
  3. JS
  4. CSS
  5. CSS


Screen Shot 2017-05-26 at 9.12.03 AM.png (292×1 px, 82 KB)

  1. HTML
  2. CSS
  3. CSS
  4. JS
  5. Sound icon

In Browsertime I miss the information about the HTTP/2 streams that you can get in WebPageTest:

dependent.png (678×1 px, 274 KB)
but I guess we could get that info from the trace log in Chrome. I'll look into it.

Benedikt uses mitmproxy to record the order of the requests, and then converts the mitmproxy format to Mahimahi's. That could be a way for us to get the prioritization in the right order. But I wonder if it isn't just easier to try to turn on HTTP/2 for Apache. I'm not sure how it works in production, though; maybe we need nginx.

@faidon, can you help out? What decides the order of the responses in our current HTTP/2 setup in production? Is it nginx, or does nginx just proxy the order from Apache (i.e. does nginx talk HTTP/2 with Apache)? Is it Apache or nginx that decides the prioritization of responses?

Status update

There's one more thing I want to try out before we close this, and that is to verify that the workflow works with mitmproxy. If it works, we can use it. The next thing is to make another test to see how stable the metrics will be. I think the easiest for now is to use SpeedIndex & FirstVisualChange; in the long run we want to find a way to use the Chrome trace log to get more metrics.

The Mahimahi solution has some pros and cons.

Pros: It is easy to use once you have it installed, and it handles all aspects needed (recording the page, setting the connectivity and serving the page). In the first tests the metrics look good, and in the technical paper Mahimahi scores better metrics than WebPageReplay.

Cons: It will only work on desktop, meaning we can't use the same solution if we want to test on real mobile phones in the future. It is Ubuntu only. And we want to wait for Benedikt Wolters' version to become public before we start to implement it.

Running the same tests a few times, with a mitmproxy recording of the enwiki main page replayed with a 200 ms delay for each request on the h2o fork of Mahimahi, which allowed us to test HTTP/2:

[2017-06-21 10:39:26] 28 requests, 265.07 kb, backEndTime: 1.62s (±0.66ms), firstPaint: 2.66s (±18.35ms), firstVisualChange: 2.72s (±19.01ms), DOMContentLoaded: 2.44s (±16.72ms), Load: 5.22s (±36.90ms), speedIndex: 2806 (±19.43), visualComplete85: 2.72s (±19.07ms), lastVisualChange: 4.92s (±42.02ms), rumSpeedIndex: 3202 (±22.48) (31 runs)
[2017-06-21 10:53:28] 28 requests, 265.06 kb, backEndTime: 1.62s (±0.54ms), firstPaint: 2.66s (±21.77ms), firstVisualChange: 2.73s (±22.76ms), DOMContentLoaded: 2.43s (±19.96ms), Load: 5.12s (±48.55ms), speedIndex: 2816 (±22.29), visualComplete85: 2.74s (±22.90ms), lastVisualChange: 4.84s (±46.69ms), rumSpeedIndex: 3178 (±23.76) (31 runs)
[2017-06-21 11:06:26] 28 requests, 265.08 kb, backEndTime: 1.62s (±0.65ms), firstPaint: 2.66s (±14.60ms), firstVisualChange: 2.73s (±15.82ms), DOMContentLoaded: 2.45s (±12.28ms), Load: 5.20s (±42.29ms), speedIndex: 2814 (±15.78), visualComplete85: 2.73s (±15.76ms), lastVisualChange: 4.89s (±47.06ms), rumSpeedIndex: 3206 (±17.10) (31 runs)
[2017-06-21 11:20:00] 28 requests, 265.10 kb, backEndTime: 1.63s (±0.78ms), firstPaint: 2.65s (±19.50ms), firstVisualChange: 2.71s (±20.81ms), DOMContentLoaded: 2.43s (±19.26ms), Load: 5.22s (±33.43ms), speedIndex: 2805 (±20.98), visualComplete85: 2.72s (±20.76ms), lastVisualChange: 4.90s (±36.38ms), rumSpeedIndex: 3213 (±20.81) (31 runs)
[2017-06-21 11:32:26] 28 requests, 265.19 kb, backEndTime: 1.62s (±0.64ms), firstPaint: 2.65s (±21.16ms), firstVisualChange: 2.71s (±21.63ms), DOMContentLoaded: 2.43s (±19.68ms), Load: 5.17s (±42.74ms), speedIndex: 2799 (±21.95), visualComplete85: 2.71s (±21.72ms), lastVisualChange: 4.88s (±46.83ms), rumSpeedIndex: 3189 (±24.20) (31 runs)
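The stability across the five batches above can be quantified directly; for example, for the five speedIndex results (2806, 2816, 2814, 2805, 2799):

```python
# Quantifying the run-to-run stability of the five speedIndex medians above.
from statistics import mean, stdev

speed_index = [2806, 2816, 2814, 2805, 2799]
s = stdev(speed_index)            # sample standard deviation across batches
cv = 100 * s / mean(speed_index)  # coefficient of variation, in percent
print(round(s, 1), round(cv, 2))  # → 7.0 0.25
```

A coefficient of variation around a quarter of a percent across batches is what makes this setup look promising for regression detection.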

Trying it out with a 100 ms delay and 31 runs as well:

[2017-06-21 12:12:20] 28 requests, 265.12 kb, backEndTime: 824ms (±0.88ms), firstPaint: 1.46s (±10.37ms), firstVisualChange: 1.53s (±10.18ms), DOMContentLoaded: 1.25s (±10.17ms), Load: 2.67s (±13.19ms), speedIndex: 1577 (±10.39), visualComplete85: 1.54s (±10.26ms), lastVisualChange: 2.87s (±20.76ms), rumSpeedIndex: 1719 (±11.14) (31 runs)

I tried doing just 5 runs to see what it was like, and the variations (presumably that's the standard deviation in the output) got bigger. Maybe we can do fewer than 31, though; we'll have to try a bunch of different values to figure it out, because 31 runs is quite time-consuming for the enwiki main page (about 12 minutes total runtime).
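The trade-off between fewer runs and bigger variation follows from how the spread of a batch median shrinks roughly with the square root of the number of runs. An illustrative simulation (the Gaussian noise model and the numbers are made up, not measured):

```python
# Illustration of why fewer runs give a noisier median: across repeated
# batches, the spread of the batch median shrinks roughly as 1/sqrt(runs).
# Page-load metric values are simulated as Gaussian noise around 2800.
import random
from statistics import median, stdev

random.seed(42)  # make the simulation repeatable

def median_spread(runs_per_batch, batches=200):
    medians = [
        median(random.gauss(2800, 20) for _ in range(runs_per_batch))
        for _ in range(batches)
    ]
    return stdev(medians)

assert median_spread(5) > median_spread(31)  # fewer runs -> noisier median
```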

Trying once again with the trace log enabled:

[2017-06-21 12:40:52] 28 requests, 265.08 kb, backEndTime: 1.62s (±0.78ms), firstPaint: 2.62s (±24.20ms), firstVisualChange: 2.83s (±25.53ms), DOMContentLoaded: 2.40s (±24.34ms), Load: 5.16s (±41.52ms), speedIndex: 2917 (±25.52), visualComplete85: 2.83s (±25.53ms), lastVisualChange: 5.02s (±41.50ms), rumSpeedIndex: 3158 (±26.22) (31 runs)

Mahimahi, particularly the h2o fork that should be open sourced soon (currently in a private repo of a researcher who was kind enough to give us a sneak preview), seems like a great option for this. These initial tests on the enwiki main page show a small standard deviation for SpeedIndex. Before we venture into T133646: Run performance test on commits (Fresnel), this could potentially become a better option than WPT to track key metrics like firstPaint and speedIndex over time, to spot regressions in synthetic testing.

Next steps are likely to be trying to run the whole stack on Debian Jessie and seeing if Ops can spare a server we could use for this (maybe an image scaler repurposed after the Thumbor migration?). A dedicated bare-metal server would be better, to make the environment more consistent between runs. The above tests were done in a VMWare Ubuntu VM on my Mac; the standard deviations might get even better on bare metal, particularly on a server with many cores, where the Apaches, Mahimahi, Browsertime and ffmpeg can each have cores of their own, to reduce side effects from the measurements themselves.