
Set up a server running mitmproxy + mahimahi to measure key metrics via synthetic testing over time
Closed, Duplicate (Public)

Description

The way it would work:

  • mitmdump records the page load of the live website in the browser; it will probably be simpler to have browsertime drive the browser for us
  • mitmproxy2mahimahi converts the mitm data into mahimahi format
  • connectivity is blocked? (overkill in theory, based on our tests)
  • mm-webreplay from mahimahi-h2o replays the run without connectivity, under specific simulated network conditions, using browsertime
  • a custom script processes the stats output by browsertime and pipes them to graphite or similar (see the sketch after this list)
  • connectivity is restored?
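
As a rough sketch of that last processing step, the following pulls a single metric out of browsertime's JSON output and pushes it to Graphite's plaintext port. The JSON path, metric name, and Graphite host are illustrative assumptions, not confirmed details; adjust them to match the actual browsertime output format and our infrastructure:

# Hypothetical: extract the median SpeedIndex from browsertime's JSON output
# and send it over Graphite's plaintext protocol (port 2003)
median=$(jq '.[0].statistics.visualMetrics.SpeedIndex.median' browsertime.json)
echo "synthetic.enwiki.speedindex.median ${median} $(date +%s)" | nc -q 1 graphite.example.org 2003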

If the standard deviation is as good as in our early tests, this could replace WPT as the reference for synthetic testing of regressions/alerts.

Event Timeline

Quick setup for browsertime on Debian Stretch (the first line is unsafe; check the contents of the setup script before running it):

curl -sL https://deb.nodesource.com/setup_6.x | sudo -E bash -
sudo apt-get install chromium xvfb imagemagick ffmpeg pyssim nodejs
sudo npm install browsertime -g

Then this should work:

browsertime --video --speedIndex --xvfb https://en.wikipedia.org

There's currently an issue with the stable version of Chrome (59), but it should fix itself once 60 is released and Stretch picks it up.

mitmproxy2mahimahi depends on mitmproxy 2.x, which is based on Python 3. All the Debian packages for mitmproxy are for ancient versions based on Python 2.

This means installing mitmproxy from pip3, along with the other dependencies for mitmproxy2mahimahi:

sudo pip3 install mitmproxy
sudo apt-get install python3-protobuf python3-tz

Then it should run with:

mitmdump -s "mitmproxy2mahimahi-master/mitmproxy2mahimahi.py enwiki"

While mitmdump is running, you can record a run against the live website, for example using browsertime to drive the browser (mitmdump listens on port 8080 by default, which matches the proxy-server argument below):

browsertime -n 1 --xvfb --chrome.args proxy-server="127.0.0.1:8080" https://en.wikipedia.org

The next step is compiling and running mahimahi-h2o-h2o.

First it needs to be patched a little:

  • in fcgi/ReplayApp.py, a hack that filters out the Link header needs to be removed
  • in src/httpserver/h2o_configuration.hh, the http2-push-preload option needs to be turned off
  • in src/frontend/web_server.cc, the hardcoded reference to /usr/local/bin/h2o needs to be updated to /usr/sbin/h2o (a one-liner for this is sketched below)
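
For that last item, a one-liner along these lines should do it, assuming the path appears as a literal string in the source:

sed -i 's|/usr/local/bin/h2o|/usr/sbin/h2o|g' src/frontend/web_server.cc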

Then install all the required dependencies:

sudo apt-get install dh-autoreconf apache2 dnsmasq protobuf-compiler apache2-dev libprotobuf-dev xcb libxcb-present-dev libpango1.0-dev python-flup

Compile and install it (note the necessary custom command to fix an issue with protobuf):

./autogen.sh
autoreconf -f -i -Wall,no-obsolete
./configure
make
sudo make install

Install h2o:

echo "deb http://dl.bintray.com/tatsushid/h2o-deb stretch-backports main" | sudo tee /etc/apt/sources.list.d/bintray-tatsushid-h2o.list
sudo apt-get update
sudo apt-get install h2o

You need to enable IP forwarding and make the CA certificate from mahimahi trusted:

sudo sysctl -w net.ipv4.ip_forward=1
sudo cp mahimahi-h2o-h2o/mahimahica/cacert.pem /usr/local/share/ca-certificates/mahimahica/cacert.crt
sudo update-ca-certificates
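
Note that sysctl -w doesn't survive a reboot; to make the setting permanent, something like this should work (the file name is arbitrary):

echo 'net.ipv4.ip_forward=1' | sudo tee /etc/sysctl.d/99-mahimahi.conf
sudo sysctl --system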

By default, Debian 9 has nscd handle and cache DNS requests, which can get in the way of the dnsmasq calls used by mm-webreplay. Edit /etc/nscd.conf as root and comment out the lines that mention "hosts". Then restart nscd:

sudo service nscd restart
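
Commenting out those lines can be done in one go (assuming every line mentioning "hosts" in the stock nscd.conf should be disabled):

sudo sed -i '/hosts/ s/^/#/' /etc/nscd.conf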

Finally, this should work, replaying a recording we made earlier with mitmdump with a 100ms delay for every request, and recording a video of the run:

MAHIMAHI_ROOT=/home/gilles/mahimahi-h2o-h2o mm-webreplay enwiki noop mm-delay 100 browsertime -n 1 --xvfb --video --speedIndex https://en.wikipedia.org

However, it doesn't work, seemingly because, unlike on Ubuntu, the fcgi script runs as root instead of as the user running mm-webreplay:

root     20772  0.0  0.0 101940  6928 pts/0    Sl   07:38   0:00 /usr/sbin/h2o -c /tmp/replayshell_h2o_config.JRJHKV
root     20774  0.0  0.0 110136  1568 pts/0    Sl   07:38   0:00 /usr/sbin/h2o -c /tmp/replayshell_h2o_config.JRJHKV
root     20777  0.0  0.0  21676  5648 pts/0    S    07:38   0:00 perl -x /usr/share/h2o/kill-on-close --rm /tmp/h2o.fcgisock.DbUaq5 -- /usr/share/h2o/setuidgid gilles /bin/sh -c PUSH_STRATEGY_FILE='noop' REPLAYSERVER_FN='/usr/local/bin/mm-replayserver' WORKING_DIR='/home/gilles' RECORDING_DIR='enwiki/' exec /home/gilles/mahimahi-h2o-h2o/fcgi/FcgiHandler.py

Which displays this error when running mm-webreplay:

Insecure dependency in exec while running setuid at /usr/share/h2o/kill-on-close line 44.

The outcome is that inside the replay session it's possible to connect to the spoofed IP address, but requests aren't answered. I suspect the perl error means the fcgi script (which calls into flup) simply doesn't run, resulting in fcgi requests going into the void and never being responded to.

This issue seems to happen at the h2o level. However, the h2o version is identical to the one I used on Ubuntu, so it seems to be caused by a library or the environment.

Next step will be to figure out a way to fix that...

After a bunch of investigation, this seems to come from the way mahimahi-h2o-h2o invokes h2o. Running h2o with a similar configuration as a regular user has the fcgi command run under the same user, not root.

The mahimahi h2o fork goes to a lot of effort in how it invokes h2o so that it runs as a child process; it's not a simple run() call like in the original mahimahi. I tried running it via run() and hit a similar perl tainting error as before, so I'm guessing those great lengths were taken precisely to avoid this problem. On Debian, when the h2o server running inside the replay shell forks to perl, those perl scripts suddenly run as root, which is undesirable. I'm not sure why that's the case, or why the behavior differs from Ubuntu. The author of mahimahi-h2o might have a better clue than I do.

I suspect a possible workaround would be for the replay app to manage the flup servers itself, instead of having them forked by h2o, using a unix socket for communication rather than a spawn command.

@Gilles do you use Slack? Otherwise I can ask for his email if you want to talk to him directly (I think that would be faster than me proxying).

Asked him; I'll introduce you asap once I have his email.

Gilles changed the task status from Open to Stalled. (Jul 3 2017, 2:01 PM)

Trying again, this time on Ubuntu 14.04 on Labs.

All dependencies:

sudo add-apt-repository ppa:fkrull/deadsnakes
curl -sL https://deb.nodesource.com/setup_6.x | sudo -E bash -
printf 'Package: *\nPin: release o=Node Source\nPin-Priority: 1002\n' | sudo tee /etc/apt/preferences.d/node.pref
sudo apt-get update
sudo apt-get install dh-autoreconf apache2 dnsmasq protobuf-compiler apache2-dev libprotobuf-dev xcb libxcb-present-dev libpango1.0-dev python-flup build-essential python3-pip python3-dev xvfb imagemagick ffmpeg python3.5 python3.5-dev libffi-dev xfonts-100dpi xfonts-75dpi xfonts-cyrillic xfonts-scalable xfonts-intl-european
sudo curl https://bootstrap.pypa.io/ez_setup.py -o - | sudo python3.5
sudo easy_install pip
sudo pip3 install --upgrade pip
sudo pip3 install mitmproxy protobuf pytz
sudo pip install pyssim
sudo npm install browsertime -g

Trusty ships OpenSSL 1.0.1, and mahimahi-h2o depends on OpenSSL >= 1.0.2.

We can build and install OpenSSL 1.0.2 from source:

curl https://www.openssl.org/source/openssl-1.0.2g.tar.gz | tar xz
cd openssl-1.0.2g/
./config
make depend
make
sudo make install
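
To confirm the new build is in place (a source build of 1.0.2 installs under /usr/local/ssl by default):

/usr/local/ssl/bin/openssl version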

Then we need to build and link mahimahi-h2o against it:

./autogen.sh
./configure CXXFLAGS='-g -O2 -I/usr/local/ssl/include' LDFLAGS='-g -L/usr/local/ssl/lib' LIBS='-ldl'
make
sudo make install

Install Chrome from Google's apt repository:

wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
sudo sh -c 'echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
sudo apt-get update
sudo apt-get install google-chrome-beta

Still doesn't work because of the same kill-on-close perl tainting.

Found a way to circumvent the privilege checks to make it work.

kill-on-close from h2o needs to have the variables passed to exec "washed" this way:

if ($pid == 0) {
    # Copy each argument and pass it through a regex capture;
    # the capture clears Perl's taint flag so exec is allowed
    # to run under taint mode
    my @cmd = ();
    for my $el (@ARGV) {
        my $clean = "$el";
        ($clean) = $clean =~ /^(.*)$/;
        push @cmd, $clean;
    }
    exec @cmd;
    die "failed to exec $ARGV[0]:$!";
}

And probably the same for the exec call to rm later in that file.

And in mahimahi-h2o, the exception thrown in assert_not_root needs to be commented out:

void assert_not_root( void )
{
    if ( ( geteuid() == 0 ) or ( getegid() == 0 ) ) {
        /* disabled so the replay shell can run on Debian; see discussion above */
        // throw runtime_error( "BUG: privileges not dropped in sensitive region" );
    }
}

This all means that the mahimahi machine on Labs *might* be subject to privilege escalation, but that shouldn't matter because it's not public-facing. Hopefully the maintainer of mahimahi-h2o will come up with a fix for the issue (he doesn't have time right now; he's wrapping up his thesis).

I suspect it has something to do with the shell that's forked. The version of bash (which I presume is the default) probably differs between these Linux distributions.

The definitive guide for Debian 9:

wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
sudo sh -c 'echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
echo "deb http://dl.bintray.com/tatsushid/h2o-deb stretch-backports main" | sudo tee /etc/apt/sources.list.d/bintray-tatsushid-h2o.list
curl -sL https://deb.nodesource.com/setup_6.x | sudo -E bash -
sudo apt-get install google-chrome-beta xvfb imagemagick ffmpeg pyssim nodejs dh-autoreconf apache2 dnsmasq protobuf-compiler apache2-dev libprotobuf-dev xcb libxcb-present-dev libpango1.0-dev python-flup h2o python3-protobuf python3-tz python3-pip python3-setuptools python3-dev libssl-dev unzip build-essential
sudo npm install browsertime -g
sudo pip3 install --upgrade pip
sudo pip3 install mitmproxy
sudo sysctl -w net.ipv4.ip_forward=1
unzip mitmproxy2mahimahi-master.zip

Then, in two separate shells, record the run:

mitmdump -s "mitmproxy2mahimahi-master/mitmproxy2mahimahi.py enwiki"
browsertime -n 1 --xvfb --chrome.args proxy-server="127.0.0.1:8080" https://en.wikipedia.org

Edit /etc/nscd.conf as root and comment out the lines that mention "hosts", then restart nscd (the sed one-liner shown earlier does this).
Edit /usr/share/h2o/kill-on-close as root and apply the variable washing described above.
Unzip the mahimahi-h2o sources and make the changes described above, then build and install:

./autogen.sh
autoreconf -f -i -Wall,no-obsolete
./configure
make
sudo make install

And finally this should work to replay the run:

MAHIMAHI_ROOT=/home/gilles/mahimahi-h2o-h2o mm-webreplay enwiki noop mm-delay 100 browsertime -n 1 --xvfb --video --speedIndex https://en.wikipedia.org

I've tested the new version of WebPageReplay:

I could get it to work with Chrome, replaying https Wikipedia and setting latency, on Linux (Ubuntu 16). I almost got the same working on my Mac (the blocker was that I didn't want to install the fake certificate). I also did a test with Browsertime.

What's good about it is that it's really simple to get working; what's bad is that I couldn't get it to work with Firefox, but maybe that's something we can fix in the future with help from Mozilla?

I reached out to Benedikt about releasing his modified mahimahi but no luck so far.

I wouldn't mind implementing support for WebPageReplay directly in Browsertime instead of running it from outside: we have a concept of pre/post tasks that run before/after the URL gets tested, so it could well be easy. Then we just install WebPageReplay/Browsertime/dependencies and use the Browsertime CLI commands.
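
For reference, the browsertime CLI already exposes those hooks as --preScript/--postScript, so wiring the replay setup in could look something like this (the script names are hypothetical):

browsertime --preScript startReplay.js --postScript stopReplay.js -n 1 https://en.wikipedia.org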

To set latency on my localhost I used:

## Add 100 ms latency on your localhost
sudo tc qdisc add dev lo root handle 1:0 netem delay 100ms

## Remove the latency
sudo tc qdisc del dev lo root
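
To verify the qdisc is in place:

tc qdisc show dev lo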

So ... Benedikt reached out today and he will publish the code tomorrow. That means we can move on with either WebPageReplay or mahimahi.

@Peter I'm having an issue with browsertime on Debian Stretch, are you familiar with this error?

[2017-09-29 12:22:50] Error running browsertime BrowserError: disconnected: unable to connect to renderer
  (Session info: chrome=62.0.3202.38)
  (Driver info: chromedriver=2.32.498513 (2c63aa53b2c658de596ed550eb5267ec5967b351),platform=Linux 4.9.0-3-amd64 x86_64)
    at BrowsertimeError (/home/gilles/.nvm/versions/node/v6.11.3/lib/node_modules/browsertime/lib/support/errors.js:5:5)
    at BrowserError (/home/gilles/.nvm/versions/node/v6.11.3/lib/node_modules/browsertime/lib/support/errors.js:13:5)
    at startBrowser.call.catch.tap.tap.catch.catch.e (/home/gilles/.nvm/versions/node/v6.11.3/lib/node_modules/browsertime/lib/core/seleniumRunner.js:92:15)
From previous event:
    at SeleniumRunner.start (/home/gilles/.nvm/versions/node/v6.11.3/lib/node_modules/browsertime/lib/core/seleniumRunner.js:91:13)
    at Promise.resolve.tap (/home/gilles/.nvm/versions/node/v6.11.3/lib/node_modules/browsertime/lib/core/engine.js:312:27)
From previous event:
    at runIteration (/home/gilles/.nvm/versions/node/v6.11.3/lib/node_modules/browsertime/lib/core/engine.js:312:10)
    at Promise.reduce (/home/gilles/.nvm/versions/node/v6.11.3/lib/node_modules/browsertime/lib/core/engine.js:409:27)
From previous event:
    at Promise.resolve.tap.tap.tap.tap.result (/home/gilles/.nvm/versions/node/v6.11.3/lib/node_modules/browsertime/lib/core/engine.js:406:17)
    at runCallback (timers.js:672:20)
    at tryOnImmediate (timers.js:645:5)
    at processImmediate [as _immediateCallback] (timers.js:617:5)
From previous event:
    at Engine.run (/home/gilles/.nvm/versions/node/v6.11.3/lib/node_modules/browsertime/lib/core/engine.js:405:8)
    at /home/gilles/.nvm/versions/node/v6.11.3/lib/node_modules/browsertime/bin/browsertime.js:63:21
From previous event:
    at run (/home/gilles/.nvm/versions/node/v6.11.3/lib/node_modules/browsertime/bin/browsertime.js:62:6)
    at Object.<anonymous> (/home/gilles/.nvm/versions/node/v6.11.3/lib/node_modules/browsertime/bin/browsertime.js:137:1)
    at Module._compile (module.js:570:32)
    at Object.Module._extensions..js (module.js:579:10)
    at Module.load (module.js:487:32)
    at tryModuleLoad (module.js:446:12)
    at Function.Module._load (module.js:438:3)
    at Module.runMain (module.js:604:10)
    at run (bootstrap_node.js:389:7)
    at startup (bootstrap_node.js:149:9)
    at bootstrap_node.js:502:3

N/m, switching back to Chrome stable fixed it.