
Use Docker version of WebPageTest agents
Closed, Declined (Public)

Description

One of the problems we have today with WebPageTest is the automatic updates. To get rid of them, we should use the Docker container and update it manually once a month or so.

  1. Try out the Docker container on AWS
  2. Verify how often a new, uniquely tagged Docker container is released
  3. Verify that the Docker container will not auto-update.

Upstream: https://github.com/WPO-Foundation/wptagent/issues/253

Event Timeline

Tested and it works.

sudo docker run -d \
  -e SERVER_URL="http://wpt.wmftest.org/work/" \
  -e LOCATION="useast-docker" \
  -e KEY="SECRET"  \
  --cap-add=NET_ADMIN \
  webpagetest/agent:WebPageTest-17.08

We need to wait for a new tag, though: the current version is old and Firefox isn't working correctly (the tracking protection requests).

[Screenshot: Screen Shot 2018-04-12 at 10.11.17 AM.png]

Peter changed the task status from Open to Stalled (Apr 13 2018, 6:59 AM).
Peter added a project: Upstream.

Waiting for new WPT tags on Docker Hub.

Peter triaged this task as Medium priority (Apr 13 2018, 7:00 AM).
Peter removed a project: Upstream.

There's a new 18-08 release (https://hub.docker.com/r/webpagetest/agent/tags/) that we can use. Let me deploy another server alongside the current one so we can test it.

The Mozilla team is doing a lot of work on their WebPageTest setup. Let's wait and see: either they fix the Docker setup (tagging per browser version) or we can do something together.

I think the way forward is to tag our own containers. Let me fix that.

Since there is no changelog and releases are really rare, I think the tag needs to include the build date plus the Chrome and Firefox versions (and later the Edge version, or whatever it will be called, once it's available on Linux).
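A minimal sketch of how such a tag could be composed, assuming it runs where the browsers are installed (for example inside the image during the build) and that they report their versions via `google-chrome --version` and `firefox --version`. The `<date>-chrome<version>-firefox<version>` layout and the function names are hypothetical, not an agreed format.

```shell
#!/usr/bin/env bash
# Hypothetical tag scheme: <build date>-chrome<version>-firefox<version>

# Print the last whitespace-separated field of each input line,
# e.g. "Google Chrome 73.0.3683.86 " -> "73.0.3683.86"
last_field() {
  awk '{print $NF}'
}

# make_agent_tag BUILD_DATE CHROME_VERSION FIREFOX_VERSION
make_agent_tag() {
  local build_date="$1" chrome="$2" firefox="$3"
  printf '%s-chrome%s-firefox%s\n' "$build_date" "$chrome" "$firefox"
}

# Only query the browsers if both are installed on this machine.
if command -v google-chrome >/dev/null 2>&1 && command -v firefox >/dev/null 2>&1; then
  chrome_version="$(google-chrome --version | last_field)"
  firefox_version="$(firefox --version | last_field)"
  make_agent_tag "$(date +%Y%m%d)" "$chrome_version" "$firefox_version"
fi
```

The resulting string only uses characters Docker accepts in a tag (letters, digits, `.` and `-`), so it could be passed to something like `docker build -t <our-repo>:<tag> .`, where `<our-repo>` is a placeholder.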

Peter changed the task status from Stalled to Open (Mar 5 2019, 2:44 PM).

I finally got a version up and running on AWS. I've updated the documentation but still need to add a Docker agent section; I will finish that today. Then I think we should make the switch on Monday (that way I can also mention it at today's SoS, so Reading knows beforehand in case there are any changes to the metrics).

On Monday I'll just switch the settings so the new Docker agent works exactly like the current agent.

Here is the checklist so I remember everything when we do the switch:

  1. Deploy a new AWS server and update it
  2. Install Docker
  3. Stop the current agent at AWS (but keep it until we see everything works ok)
  4. Remove the current Docker test setup from location.ini
  5. Turn off auto updates in settings.ini on the WebPageTest server
  6. Start the Docker agent
  7. Set up alerts in AWS for the new server
  8. Verify that the documentation reflects what I did

It's deployed and the tests are running. For desktop it looks like this (the red line marks when I did the switch):

[Screenshot: Screen Shot 2019-04-08 at 1.29.23 PM.png]

It looks great: no difference.

For emulated mobile, the Docker version seems much faster:

[Screenshot: Screen Shot 2019-04-08 at 1.34.19 PM.png]

And comparing after/before:

[Screenshot: Screen Shot 2019-04-08 at 1.32.46 PM.png]

It seems like everything got a real boost running inside the container (even though the AWS instance size is the same):

[Screenshot: Screen Shot 2019-04-08 at 1.37.03 PM.png]

I enabled the alerts on AWS. Everything looks good so far, except that used disk space has increased by 6% since yesterday; that needs to be fixed.
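To track the disk growth on the instance, a small sketch that prints the used-space percentage of a filesystem using standard `df` and `awk`, and, if Docker is installed, shows how much of that space Docker itself accounts for via `docker system df`. The `disk_used_pct` name is hypothetical.

```shell
#!/usr/bin/env bash
# Print the used-space percentage of a filesystem (default: /) as a bare number.
disk_used_pct() {
  local mount="${1:-/}"
  # -P forces POSIX output, one line per filesystem; field 5 is the capacity, e.g. "42%".
  df -P "$mount" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

disk_used_pct /

# If Docker is present, break down the space used by images, containers and volumes.
if command -v docker >/dev/null 2>&1; then
  docker system df 2>/dev/null || true
fi
```

Running this once a day (for example from cron) would show whether the growth comes from Docker data or from something else on the host.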

We have some problems on the new instance:

[Screenshot: Screen Shot 2019-04-10 at 8.58.49 AM.png]

And when I logged into the instance, there were 23,000 zombie Chrome processes.
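A zombie (defunct) process is one that has exited but whose parent never collected its exit status. A minimal sketch for counting them, assuming a procps-style `ps` that supports `-o stat=,comm=` (the trailing `=` suppresses the header); `count_zombies` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Count zombie processes, optionally only those whose command name matches a regex.
count_zombies() {
  local pattern="${1:-.}"
  # A process state starting with "Z" means zombie/defunct.
  ps -eo stat=,comm= |
    awk -v pat="$pattern" '$1 ~ /^Z/ && $2 ~ pat { n++ } END { print n + 0 }'
}

count_zombies          # all zombies on the host
count_zombies chrome   # only Chrome ones
```

Wiring `count_zombies chrome` into the AWS alerting would catch a pile-up like this before the instance degrades.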

I'm going to turn the old instance back on and remove the Docker version for now.

I like logs, but the WPT logs are really hard to follow. Here's a snippet from the container:

[0410/070243.350428:ERROR:nacl_helper_linux.cc(310)] NaCl helper process running without a sandbox!
Most likely you need to configure your SUID sandbox correctly
[8334:8334:0410/070243.351372:ERROR:zygote_communication_linux.cc(276)] Failed to send GetTerminationStatus message to zygote
dnsmasq: unrecognized service
rndc: neither /etc/bind/rndc.conf nor /etc/bind/rndc.key was found
sudo: systemd-resolve: command not found
[8869:8869:0410/070244.548486:ERROR:browser_dm_token_storage_linux.cc(101)] Error: /etc/machine-id contains 0 characters (32 were expected).
[8869:8884:0410/070244.627352:ERROR:bus.cc(396)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory

DevTools listening on ws://127.0.0.1:9239/devtools/browser/f3b47366-a319-4524-946f-7c13abac8f47
[8869:8977:0410/070244.721666:ERROR:bus.cc(396)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")

(chrome:8869): LIBDBUSMENU-GLIB-WARNING **: 07:02:44.787: Unable to get session bus: Unknown or unsupported transport ‘disabled’ for address ‘disabled:’

I'm having a hard time telling which of these errors actually need attention. But one thing I suspect is that I should turn off IPv6.
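A sketch for checking and turning off IPv6, assuming a standard Linux `/proc/sys` layout. The check is read-only; the commands in the comments are the usual `sysctl` keys and the `docker run --sysctl` option, shown as options rather than something we have verified fixes the errors above. `ipv6_disabled` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Report whether IPv6 is disabled in the current network namespace:
# "1" = disabled, "0" = enabled, "n/a" = no IPv6 support at all.
ipv6_disabled() {
  local f=/proc/sys/net/ipv6/conf/all/disable_ipv6
  if [ -r "$f" ]; then cat "$f"; else echo "n/a"; fi
}

ipv6_disabled

# To disable it on the whole host (needs root):
#   sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1 net.ipv6.conf.default.disable_ipv6=1
# Or only inside the agent container, by adding this option to the docker run command:
#   --sysctl net.ipv6.conf.all.disable_ipv6=1
```

Disabling it only in the container keeps the change scoped to the agent, which makes it easy to revert if the errors turn out to be unrelated.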

I went through all the tests I ran before on the Dockerized version, and the Firefox tests worked: http://wpt.wmftest.org/result/190403_4N_CA/

I've raised the maximum number of open files on the server (https://github.com/WPO-Foundation/wptagent/blob/master/docs/install.md#linux) and will switch again today and let it run over the weekend.
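A quick way to confirm the new open-files limit is in effect is to print the soft and hard limits with the shell's `ulimit` builtin. The `show_nofile_limits` name is hypothetical, and the `--ulimit` values in the comment are placeholders, not the values from the install docs.

```shell
#!/usr/bin/env bash
# Print the soft and hard limits on open file descriptors for the current shell.
show_nofile_limits() {
  printf 'soft=%s hard=%s\n' "$(ulimit -Sn)" "$(ulimit -Hn)"
}

show_nofile_limits

# A container can also get its own limit at start time (placeholder values):
#   docker run --ulimit nofile=65536:65536 ...
```

Note that a login shell on the host may not reflect what the Docker daemon hands to containers, so checking from inside the running agent container (with `docker exec`) is the more reliable test.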

OK, Firefox worked when I started the container, but something happened and now it looks like this again:

[Screenshot: Screen Shot 2019-04-12 at 2.12.23 PM.png]

I'll restart it, but if this continues I'll roll back before the weekend.

I continued testing and added --init to get rid of the zombie processes. That worked, but after 24 hours or so Firefox started failing again. I have reverted to the old agent and hope to get some attention upstream.
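For reference, a sketch of the earlier `docker run` command with `--init` added. With `--init`, Docker runs a small init process as PID 1 inside the container that reaps orphaned child processes, which is what stops the zombie Chrome processes from piling up. The `start_wpt_agent` wrapper and the `DOCKER` override (handy for a dry run with `DOCKER=echo`) are hypothetical; the environment variables and `KEY="SECRET"` placeholder are the ones from the original command.

```shell
#!/usr/bin/env bash
# Start the WPT agent container with --init so orphaned browser processes get reaped.
# Set DOCKER=echo to print the command instead of running it.
start_wpt_agent() {
  local image="$1"
  ${DOCKER:-sudo docker} run -d --init \
    -e SERVER_URL="http://wpt.wmftest.org/work/" \
    -e LOCATION="useast-docker" \
    -e KEY="SECRET" \
    --cap-add=NET_ADMIN \
    "$image"
}
```

Usage would be `start_wpt_agent <image:tag>`, where `<image:tag>` is whichever tagged agent image we deploy. This addresses the zombies only; it does not explain the Firefox failures after roughly 24 hours.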

When we move the agent in-house we will use our own git copy.