
[Research] Proxy for local performance testing
Closed, Resolved · Public

Description

To be able to do T133646 (test front-end performance on commits) we need a proxy that caches responses, so that there is as little disturbance in the metrics as possible. Both Facebook and the Chrome team use that approach when measuring front-end performance.

How it will work

  1. Access a URL through the proxy.
  2. DNS and assets are loaded from the Internet.
  3. The proxy caches all the assets and the DNS lookup.
  4. The next access will be served locally from the proxy.
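The record/replay flow above can be illustrated with a toy shell sketch (purely illustrative; the `origin` directory stands in for the Internet, and a real proxy would key on URL and headers):

```shell
# Toy illustration of the record/replay flow: the first access "fetches" the
# page from an "origin" directory (standing in for the Internet) and stores a
# copy; later accesses are served from the local cache. All names are made up.
cd "$(mktemp -d)"
mkdir -p origin cache
echo '<html>Main_Page</html>' > origin/Main_Page

fetch() {
  name="$1"
  if [ -f "cache/$name" ]; then
    echo "HIT: serving cache/$name"        # later accesses: local, stable timing
  else
    cp "origin/$name" "cache/$name"        # first access: fetch and record
    echo "MISS: fetched and cached $name"
  fi
}

fetch Main_Page   # prints: MISS: fetched and cached Main_Page
fetch Main_Page   # prints: HIT: serving cache/Main_Page
```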

To decrease latency, the proxy will run on the same server as the browser that does the testing. The proxy will need to handle HTTP/2, because we want the test to be as realistic as possible.

In this task we need to check what kinds of proxies we can use (is there something out there already?) and test them out, so we have a clear picture of how to move on.

Event Timeline

Peter created this task.Dec 15 2016, 9:26 PM
Restricted Application added a subscriber: Aklapper.Dec 15 2016, 9:26 PM
Peter updated the task description.Dec 16 2016, 8:51 AM
Peter added a comment.Dec 16 2016, 8:55 AM

WebPageTest uses WebPageReplay as a caching proxy (Linux & Mac). The Python version doesn't support HTTP/2, but the Chrome team has built a Go version (I haven't seen that it is open source, though) that should handle H2.

Peter claimed this task.Apr 21 2017, 1:03 PM
Peter moved this task from Backlog: Small & Maintenance to Doing on the Performance-Team board.
Peter added a comment.Apr 21 2017, 1:32 PM

Since we have the luxury of only testing against one domain, we could use something like nginx as a reverse proxy with a long cache time on all assets and then find a way to empty the cache (restart?) when the test is finished?
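A minimal sketch of what that nginx setup might look like (the upstream address, cache path and durations are assumptions for illustration, not a tested config):

```nginx
proxy_cache_path /var/cache/nginx keys_zone=testcache:10m max_size=1g;

server {
    listen 80;
    location / {
        proxy_pass http://mediawiki:8080;      # assumed MediaWiki upstream
        proxy_cache testcache;
        # Cache every response for a long time, regardless of status.
        proxy_cache_valid any 24h;
        # Without this, Cache-Control/Expires headers from the origin
        # take precedence over proxy_cache_valid.
        proxy_ignore_headers Cache-Control Expires;
    }
}
```

Emptying the cache between tests could then be as simple as removing /var/cache/nginx and restarting nginx.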

We want to be able to run the same setup locally, so spinning up nginx with Docker in front of MediaWiki would be easy. I could get a working example next week, so we can test how "stable" it is and then move it to another machine to do some real testing.

The only problem I see is connectivity: we need to be able to set it locally in a way that also works against localhost. We should create another task for that.

Peter added a comment.Apr 21 2017, 1:50 PM

Using that approach we need to be careful with which cache headers we set, so we don't override the proxy cache: http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_valid

But we will see that when we try :)

Peter added a comment.Apr 28 2017, 5:36 AM

I had the idea of doing this as simply as possible with nginx, but it didn't work out for me: it wasn't simple to make sure everything was cached, because the nginx proxy caches depending on headers from the origin.

I tried out Web-Page-Replay, which the Chrome team uses in their testing, but couldn't get it to work on my local machine. The thing with WPR is that it lacks support for HTTP/2. I'm pretty sure I've seen somewhere that a new version with H2 support is coming in Go, but there's no sign of it yet.

Instead I tested Mahimahi from MIT. First, the current version also only supports H1, but I've heard an H2 version could come later, so we should wait for that. It only supports Ubuntu (at least out of the box), which is kind of a drawback, because it would be nice to easily run it on your local machine independent of what OS you use. I tried it out (I'll do a demo at the next performance team meeting) and I really like the shell idea: each tool (Mahimahi comes with both a replay tool and connectivity tools) opens up a new shell. For example, you can set latency to x ms, and every browser (or other tool) that you run in that shell will have that latency.

In practice, you do the following:

  • Enter a record shell, where you specify the directory in which to store the website data (that will be cached)
  • Access the URL that you want to test
  • Exit the shell
  • Go into the replay shell, so that every request to that URL will go to the local version
  • Go into a connectivity shell (so you can set the connectivity)
  • Run your tests
  • Exit the shells and remove the cached data
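The steps above roughly map onto the mahimahi tools like this (a sketch; the path and the 100 ms delay are arbitrary):

```shell
mm-webrecord /tmp/site bash   # record shell: responses get stored in /tmp/site
# ...access the URL with a browser, then exit the shell
mm-webreplay /tmp/site bash   # replay shell: recorded URLs are served locally
mm-delay 100 bash             # connectivity shell: adds 100 ms of one-way delay
# ...run the tests, exit the shells, then: rm -rf /tmp/site
```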

This will work out well for us once it supports H2.

Peter added a comment.May 4 2017, 9:22 AM

I've done some testing with mahimahi, on Ubuntu but inside Parallels, so not optimal.

In this test I used Chrome, cached https://en.wikipedia.org/wiki/Main_Page , set the delay to 200 ms, made 31 runs, removed the cache and did the same again. I took the median of each batch, and repeated this 5 times:

Metric             Run 1   Run 2   Run 3   Run 4   Run 5
BackendTime        1.63s   1.63s   1.62s   1.62s   1.62s
DOMContentLoaded   2.47s   2.47s   2.46s   2.46s   2.46s
FirstVisualChange  2.68s   2.71s   2.66s   2.65s   2.65s
SpeedIndex         2768    2796    2751    2739    2739
LastVisualChange   6.83s   6.83s   6.78s   6.74s   6.74s
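The per-batch medians above can be computed with a small shell helper (a sketch using sort and awk):

```shell
# Median of a list of numeric samples, as used to summarize each batch of runs.
median() {
  printf '%s\n' "$@" | sort -n | awk '
    { a[NR] = $1 }
    END {
      if (NR % 2) print a[(NR + 1) / 2]        # odd count: middle value
      else print (a[NR/2] + a[NR/2 + 1]) / 2   # even count: mean of middle pair
    }'
}

median 2.47 2.46 2.47 2.45 2.50   # prints 2.47
```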

These tests were HTTP/1; let's try with HTTP/2 when it is available, then report everything in ms, and then try on a dedicated machine. Then I can also test Firefox and get the trace log from Chrome (that usually adds a bit to the metrics, but I'm not sure if it will make them unstable).

Peter added a comment.May 4 2017, 2:07 PM

They actually used WebPageTest when they tested mahimahi at Stanford: http://mahimahi.mit.edu/mahimahi_atc.pdf

To measure speed index, we create SpeedIndexShell where we run a private instance of WebPagetest inside ReplayShell. To automate testing, we use WebPagetest's wpt_batch.py API [18]. Because WebPagetest runs only on Windows, we run WebPagetest within a VirtualBox Windows virtual machine, inside ReplayShell.

When I tested, I just ran

$ mm-webrecord /tmp/test browsertime https://en.wikipedia.org/wiki/Main_Page -n 1 --xvfb --speedIndex
$ mm-webreplay /tmp/test
$ mm-delay 200
$ browsertime https://en.wikipedia.org/wiki/Main_Page -n 31 --xvfb --speedIndex

having first installed browsertime from npm, plus the dependencies for VisualMetrics and xvfb.
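For reference, that setup amounts to something like the following (package names are from memory and may differ per distribution):

```shell
npm install -g browsertime
# VisualMetrics dependencies (for SpeedIndex / visual change metrics), plus xvfb
sudo apt-get install -y ffmpeg imagemagick python-pillow xvfb
```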

Peter added a comment.May 23 2017, 9:54 AM

I did some testing last week with WebPageReplay. For some reason I didn't get it to work on my local machine (it only worked on Ubuntu). The main drawbacks of WebPageReplay for us are that it doesn't support H2 and that it uses TSProxy for setting the connectivity. I've had problems with TSProxy before (= not working on Mac and with Selenium) and there's an issue on GitHub about the problem. I've asked which OSes are supported, but no answer yet.

The good thing is that I will hopefully get hold of a mahimahi version that supports HTTP/2 later this week. There can be problems, though, with how the server prioritizes responses (different servers have different prioritization).

Peter added a comment.May 23 2017, 6:55 PM

I got access to an h2o version of mahimahi today from Benedikt Wolters. I'll try tomorrow to get it to work and see if there's something we can use.

Peter added a subscriber: faidon.May 26 2017, 7:48 AM

I finally got the h2o version of mahimahi up and running. It works, but the prioritization of responses is different. Testing https://en.wikipedia.org/wiki/Barack_Obama on production using Chrome 58 gives us a waterfall that looks like this:

  1. HTML
  2. Logo
  3. JS
  4. CSS
  5. CSS

mahimahi-h2o

  1. HTML
  2. CSS
  3. CSS
  4. JS
  5. Sound icon

In Browsertime I miss information about the HTTP stream that you can get in WebPageTest, but I guess we could get that info from the trace log in Chrome; I'll look into it.

Benedikt uses mitmproxy to record the order of the requests, and then converts the mitmproxy format to mahimahi's. That could be a way for us to get the prioritization in the right order. But I wonder if it isn't just easier to try to turn on HTTP/2 for Apache. I'm not sure how it works in production, though; maybe we need nginx.
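The recording step with mitmproxy could look something like this (only the recording; the conversion from mitmproxy's flow format to mahimahi's is Benedikt's own tooling, which I don't have):

```shell
mitmdump -w /tmp/flows.mitm   # record browser traffic to a flow file
# point the browser at the proxy (localhost:8080 by default) and load the page
```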

@faidon, can you help out? What decides the order of the responses in our current HTTP/2 setup in production? Is it nginx, or does nginx just proxy the order from Apache (does nginx talk HTTP/2 with Apache)? Is it Apache or nginx that decides the priority of responses?

Status update

There's one more thing I want to try out before we close this, and that is to verify that the workflow works with mitmproxy. If it does, we can use it. The next thing is to make another test to see how stable the metrics will be. I think the easiest for now is to use SpeedIndex & FirstVisualChange; in the long run we want to find a way to use the Chrome trace log to get more metrics.

The mahimahi solution has some pros and cons.

Pros

It is easy to use once you have it installed, and it handles all the aspects needed (recording the page, setting the connectivity and serving the page). In first tests the metrics look good, and in the technical paper mahimahi scores better metrics than WebPageReplay.

Cons

It will only work on desktop, meaning we can't use the same solution if we want to test on real mobile phones in the future. It is Ubuntu only. We want to wait for Benedikt Wolters's version to become public before we start to implement it.

Gilles added a subscriber: Gilles.Jun 21 2017, 10:12 AM

Running the same tests a few times with a mitmproxy recording of the enwiki main page, replayed with a 200 ms delay for each request on the mahimahi h2o fork, which allowed us to test HTTP/2:

[2017-06-21 10:39:26] 28 requests, 265.07 kb, backEndTime: 1.62s (±0.66ms), firstPaint: 2.66s (±18.35ms), firstVisualChange: 2.72s (±19.01ms), DOMContentLoaded: 2.44s (±16.72ms), Load: 5.22s (±36.90ms), speedIndex: 2806 (±19.43), visualComplete85: 2.72s (±19.07ms), lastVisualChange: 4.92s (±42.02ms), rumSpeedIndex: 3202 (±22.48) (31 runs)
[2017-06-21 10:53:28] 28 requests, 265.06 kb, backEndTime: 1.62s (±0.54ms), firstPaint: 2.66s (±21.77ms), firstVisualChange: 2.73s (±22.76ms), DOMContentLoaded: 2.43s (±19.96ms), Load: 5.12s (±48.55ms), speedIndex: 2816 (±22.29), visualComplete85: 2.74s (±22.90ms), lastVisualChange: 4.84s (±46.69ms), rumSpeedIndex: 3178 (±23.76) (31 runs)
[2017-06-21 11:06:26] 28 requests, 265.08 kb, backEndTime: 1.62s (±0.65ms), firstPaint: 2.66s (±14.60ms), firstVisualChange: 2.73s (±15.82ms), DOMContentLoaded: 2.45s (±12.28ms), Load: 5.20s (±42.29ms), speedIndex: 2814 (±15.78), visualComplete85: 2.73s (±15.76ms), lastVisualChange: 4.89s (±47.06ms), rumSpeedIndex: 3206 (±17.10) (31 runs)
[2017-06-21 11:20:00] 28 requests, 265.10 kb, backEndTime: 1.63s (±0.78ms), firstPaint: 2.65s (±19.50ms), firstVisualChange: 2.71s (±20.81ms), DOMContentLoaded: 2.43s (±19.26ms), Load: 5.22s (±33.43ms), speedIndex: 2805 (±20.98), visualComplete85: 2.72s (±20.76ms), lastVisualChange: 4.90s (±36.38ms), rumSpeedIndex: 3213 (±20.81) (31 runs)
[2017-06-21 11:32:26] 28 requests, 265.19 kb, backEndTime: 1.62s (±0.64ms), firstPaint: 2.65s (±21.16ms), firstVisualChange: 2.71s (±21.63ms), DOMContentLoaded: 2.43s (±19.68ms), Load: 5.17s (±42.74ms), speedIndex: 2799 (±21.95), visualComplete85: 2.71s (±21.72ms), lastVisualChange: 4.88s (±46.83ms), rumSpeedIndex: 3189 (±24.20) (31 runs)

Trying it out with 100ms delay and 31 runs as well:

[2017-06-21 12:12:20] 28 requests, 265.12 kb, backEndTime: 824ms (±0.88ms), firstPaint: 1.46s (±10.37ms), firstVisualChange: 1.53s (±10.18ms), DOMContentLoaded: 1.25s (±10.17ms), Load: 2.67s (±13.19ms), speedIndex: 1577 (±10.39), visualComplete85: 1.54s (±10.26ms), lastVisualChange: 2.87s (±20.76ms), rumSpeedIndex: 1719 (±11.14) (31 runs)

I tried doing just 5 runs to see what it was like, and the variations (presumably that's the standard deviation in the output) got bigger. Maybe we can do fewer than 31, though; we'll have to try a bunch of different values to figure it out, because 31 runs is quite time-consuming for the enwiki main page (about 12 minutes of total runtime).
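For reference, the standard deviation that Browsertime reports can be sanity-checked against a small awk helper (a sketch computing the sample standard deviation):

```shell
# Sample standard deviation of a list of numeric measurements.
stddev() {
  printf '%s\n' "$@" | awk '
    { s += $1; ss += $1 * $1; n++ }
    END {
      m = s / n
      print sqrt((ss - n * m * m) / (n - 1))   # sample (n-1) variance
    }'
}

stddev 1 2 3   # prints 1
```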

Trying once again with the trace log enabled:

[2017-06-21 12:40:52] 28 requests, 265.08 kb, backEndTime: 1.62s (±0.78ms), firstPaint: 2.62s (±24.20ms), firstVisualChange: 2.83s (±25.53ms), DOMContentLoaded: 2.40s (±24.34ms), Load: 5.16s (±41.52ms), speedIndex: 2917 (±25.52), visualComplete85: 2.83s (±25.53ms), lastVisualChange: 5.02s (±41.50ms), rumSpeedIndex: 3158 (±26.22) (31 runs)
Gilles closed this task as Resolved.Jun 21 2017, 1:24 PM

Mahimahi, particularly the h2o fork that should be open-sourced soon (currently in a private repo of a researcher who was kind enough to give us a sneak preview), seems like a great option for this. These initial tests on the enwiki main page show a small standard deviation for SpeedIndex. Before we venture into T133646: Run performance test on commits (Fresnel), this could potentially become a better option than WPT to track key metrics like firstPaint and speedIndex over time to spot regressions in synthetic testing.

Next steps are likely trying to run the whole stack on Debian Jessie and seeing if Ops can spare a server we could use for this (maybe an image scaler repurposed after the Thumbor migration?). A dedicated bare-metal server would be better, to make the environment more consistent between runs. The above tests were done in a VMware Ubuntu VM on my Mac; the standard deviations might get even better on bare metal, particularly on a server with many cores, where the apaches, mahimahi, browsertime and ffmpeg can each have cores of their own, reducing side effects from the measurements themselves.