Yes, browsertime is the correct one; the other one we should remove.
Fri, Dec 14
I think it works now. What I want to test out is reverse tethering, so the phone can use a limited connection to the server. Or maybe there's a better way to get a set connectivity. That way we can also use these tests to test without WPR: we can test the full user flow (login etc.) or google -> wikipedia.
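For the reverse-tethering route, the port forwarding could be sketched like this. The port numbers are assumptions, and the adb function below is an echo stand-in so the sketch runs without a connected device; delete it to use the real adb:

```shell
# Stand-in so the sketch runs without a connected device;
# delete this function to run the real adb.
adb() { echo "adb $*"; }

# Make ports on the host reachable from the phone, so the phone's browser
# can talk to a server (e.g. WebPageReplay) running on the computer.
# The port numbers below are assumptions.
adb reverse tcp:8080 tcp:8080   # HTTP
adb reverse tcp:8081 tcp:8081   # HTTPS
```

With adb reverse, connections the phone makes to those local ports end up on the host, which is the direction we need when the replay server runs on the computer.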
Thu, Dec 13
I've been running the same tests today (with auto update turned off) and then changed Obama from 5 to 7 runs and Facebook from 5 to 9 (see the blue vertical line when the change happened). Looks much better, but still a spike on one of the phones. That phone runs without a memory card; I wonder if that matters? Let's see if I can update that phone too so they look the same, and add a couple more URLs to test. Then I'll look into what kind of difference we can expect.
Wed, Dec 12
I've had this running during the day BUT I missed turning off auto updates and I could see that at least some spikes correlated:
I got this up and running, sending metrics to Graphite/Grafana on a local instance. I'll try to keep it running through the day so we have some stats. I test two different URLs (one on each phone) at the same time: 5 runs, then sleep for 2 minutes and run again. I hope we can see how stable it will be. I collect SpeedIndex, first visual change and all CPU metrics. At the moment I just remove all files, but I can set it up to dump to S3 in the coming days so we all can look at it.
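The local Graphite/Grafana instance can be run with docker-compose; a minimal sketch could look like this (the image names and port mappings are assumptions, not the actual setup):

```yaml
version: "3"
services:
  graphite:
    image: graphiteapp/graphite-statsd
    ports:
      - "2003:2003"   # Carbon plaintext port that receives the metrics
      - "8080:80"     # Graphite web UI
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"   # Grafana UI, with Graphite added as a data source
```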
Tue, Dec 11
Woho, I got this working today. Two phones (both Moto G5), one computer, starting two containers running WebPageReplay. I made a couple of runs and the metrics look stable, and I couldn't see that one of the phones affected the other one (but I need to look closer).
Mon, Dec 10
Wow this is your post :)
Fri, Dec 7
I've checked the WebPageTest and WebPageReplay dashboards and they look ok. There's some GUI fine-tuning I need to do: the new version always displays the row title ("Dashboard Row" by default), so I need to edit the dashboards to either remove the title or add a meaningful name, but I'll do that when we've done the switch.
Thu, Dec 6
Sat, Dec 1
Wed, Nov 28
[2018-11-28 08:52:14] INFO: 26 requests, 416.83 kb, backEndTime: 939ms (±0.27ms), firstPaint: 1.32s (±0.54ms), firstVisualChange: 1.33s (±0.00ms), DOMContentLoaded: 1.36s (±0.98ms), Load: 1.90s (±1.25ms), speedIndex: 1354 (±0.47), visualComplete85: 1.33s (±0.00ms), lastVisualChange: 3.13s (±7.78ms), rumSpeedIndex: 1566 (±0.82) (3 runs)
Each run takes 1.5 min, which I think is ok for now. The metrics are so stable, though, that maybe we could do just one run.
I added an @reboot entry in the crontab so we make sure to export the setting on reboot (before we start the job).
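A sketch of what that crontab entry could look like; the variable name and the script path here are hypothetical, since the post doesn't show them:

```
# crontab: re-export the setting before the job starts on reboot
# (variable name and script path are hypothetical)
@reboot export SOME_SETTING=value && /home/pi/start-tests.sh
```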
Tue, Nov 27
I stopped and started the container. It works now, but I need to follow up to see what went wrong.
This is my setup: On my local machine I run Graphite/Grafana with docker-compose and then I use Ubuntu in Parallels running a simple bash script:
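The script itself isn't reproduced in the post; a minimal sketch of what such a loop might look like follows. The browsertime function below is an echo stand-in so the sketch runs anywhere (delete it to call the real binary), and the flag names, URL, run count and sleep times are assumptions:

```shell
# Echo stand-in; delete this function to call the real browsertime.
browsertime() { echo "browsertime $*"; }

run_batch() {
  # One batch: a few runs of one URL, shipping metrics to a local Graphite.
  browsertime --graphite.host 127.0.0.1 -n 5 "$1"
}

# In production this would be an endless loop with a longer sleep
# between batches.
for i in 1 2 3; do
  run_batch "https://en.wikipedia.org/wiki/Barack_Obama"
  sleep 1
done
```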
Mon, Nov 26
I think this is done for now, to add more the trace pickup needs to be fixed: https://github.com/sitespeedio/chrome-trace/issues/6
Fri, Nov 23
This worked as before, so let's keep that over the weekend + run on desktop.
I'm doing one more test, just enabling the trace logs for the timeline (on mobile):
I reverted back. With the change, timings are reported on different threads, and the script that calculates the time using the main thread didn't account for that. I've seen there have been changes in Chrome 70, but we weren't affected since we used to use only a high-level trace log.
Ooops, this works fine (as in fine for looking at the trace log) but it broke the calculation on a higher level:
To get yargs to work I changed:
A couple of things: Lighthouse seems to skip toplevel and use disabled-by-default-lighthouse from Chrome 71 (they moved the events they need to that category) to decrease the size of the trace log. According to issues in Lighthouse, 10% of the runs on mobile phones on WPT fail because of a huge trace log.
This worked well for mobile testing; I'll enable it for desktop too.
Trying one more thing:
Thu, Nov 22
This looks good for the URLs I checked. Maybe a little increase in last visual change, but as long as the metrics are stable it doesn't matter (the vertical line is when I updated):
Interesting, thanks @Gilles for sharing! In Sweden the mobile situation is like this: we have a main provider, Telia Company AB, that was previously owned by the public sector. What differs for them from the rest is that they have coverage for the whole of Sweden. For example, if you are up in the north of Sweden, out in the wilderness, that's the only provider that works, so I guess their p-high is higher than the rest, but that's because they are the only one that supports far-away places. Also (if I remember correctly) Hi3G only supports larger cities; it's interesting that they don't have the best score.
Wed, Nov 21
The metrics are still very stable for mobile so I will enable it for desktop too before I move on and add more categories.
I've pushed that on mobile testing for enwiki and the rest on that server.
It looks like
toplevel,devtools.timeline,disabled-by-default-devtools.timeline,disabled-by-default-devtools.timeline.stack,v8,disabled-by-default-v8.runtime_stat
can be a good first candidate, let me try on enwiki.
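Assuming the categories are passed through browsertime's --chrome.traceCategories flag (an assumption about the exact flag name), the run could be sketched like this; browsertime below is an echo stand-in so the sketch runs anywhere:

```shell
# Echo stand-in; delete this function to call the real browsertime.
browsertime() { echo "browsertime $*"; }

# The candidate category list from above, as one comma-separated string.
CATEGORIES="toplevel,devtools.timeline,disabled-by-default-devtools.timeline,disabled-by-default-devtools.timeline.stack,v8,disabled-by-default-v8.runtime_stat"

browsertime --chrome.traceCategories "$CATEGORIES" -n 5 "https://en.wikipedia.org/wiki/Barack_Obama"
```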
Argh, it was a mistake by me in the settings. When you set trace categories in Chrome, you should start with "-*" as the first category to clear the old settings; however, the CLI handling then disabled the full string. Skipping -* works. I've got pretty good traces with
toplevel,blink,v8,cc,gpu,blink.net,disabled-by-default-v8.runtime_stat,disabled-by-default-devtools.timeline.frame,disabled-by-default-blink.feature_usage,v8.execute,blink.user_timing,blink.console,devtools.timeline,disabled-by-default-devtools.timeline,disabled-by-default-devtools.timeline.stack
Another thing to test is to make sure we can get deeper traces, so it's easier to actually see things. Let me look into that.
Ok, I was really over-optimistic about removing dependencies for accessing other domains; that will not work since the whole timeline panel is loaded from https://chrome-devtools-frontend.appspot.com and others. But maybe it's still ok. I will try to get the unpacking to work (the loading and unpacking work locally at the moment, except that the result isn't shown :) ).
I tried this yesterday but couldn't get any change. Or rather, I changed to use the default Lighthouse setup, then the default WebPageTest one: we collect more trace logs but we don't get anything of value (or more value).
Tue, Nov 20
Mon, Nov 19
There's a lot of work going on in the Mozilla team on their WebPageTest setup. Let's wait and see: either they fix the Docker setup (tagging per browser version) or we can do something together.
This is what it looks like:
Moved to a larger instance, decreased the variance:
Nov 12 2018
When I look at these kinds of things I always look at screenshots/videos first. There are two reasons: either there's content that does the change (a running banner etc.) or it's a bug in the tool; for example, Chrome could have changed the size of the small loading bar at the bottom of the screen.
Nov 9 2018
Puh, I spent all night getting this to work but no luck so far. To get a new version of adb, I've used https://github.com/wjfsanhe/aaddbb_arm_with_boringssl, which is a version that works with reverse tethering and newer Chromedrivers. However, when I use that version (and other "newer" adbs) it fails earlier than before. The browser doesn't start:
Nov 8 2018
Chromedriver works now (after installing libnss3, webp, libxss1, libxcursor1 and libminizip-dev with apt-get) but there is still a mismatch between adb and the driver. The browser starts on the phone (woho!) but then exits. I'll try to compile a new adb version.
The instructions work so far. I've got what I need installed and adb finds my phone:
I've started following the instructions on https://github.com/WPO-Foundation/webpagetest/blob/master/docs/Private%20Instances/MobileAgentRaspberryPi.md
We had the same spike on AWS but it looks different:
I had it up and running for some time yesterday until my battery ran out of power. Next I will try more runs to see if we can get more stable metrics. There were two things; check out the spike in First Visual Change that correlates with really low time spent in JS:
The problem at the moment is with Adb. When I run it standalone it starts and I can see that it listens on the correct port for tcp traffic but when I try to start Chrome on the phone I get:
I've been trying to drive my phone from my Raspberry but no luck so far. I've been trying both standalone Chromedriver/adb and everything in a pre-baked Docker container.
Nov 7 2018
One of the problems was that Browsertime automatically tries to download Chromedriver, but since there's no pre-built binary for ARM it fails. I've changed it so it just skips installing on ARM, and then I'll try to see if I can get a matching version to work.
I had a go at trying to drive the phone from my Raspberry but got stuck on installing Chromedriver (which you need to drive Chrome). Chromedriver doesn't support ARM, and Docker containers need to be built upon ARM base containers. I tried that too, but then the container can't install the driver.
I got this up and running today. I need to fine-tune the start script a little bit more (removing Docker data etc. after each run), but I've been able to have it up and running for a couple of hours at least. At the moment we do 5 runs (the blue vertical line is when I turned on devtools.timeline):
Nov 6 2018
Argh, I need to write down the fix for this since I spent many hours trying to solve it:
I've been working on minimizing and understanding the difference in firstPaint and firstVisualChange when running on Android. Since we use WebPageReplay and I want it all to "just work", I use the Docker container, and that only works on Linux. I run Ubuntu 16 in Parallels on my other laptop, and to be able to make changes locally on Ubuntu, I rebuild the container ...
docker build -t XXX ..
But on my machine I couldn't build, because it couldn't find any dependencies for the container from archive.ubuntu.com. After some searching I found the fix: change /etc/default/docker and enable the row:
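The post doesn't show which row it was; the commonly suggested fix for this symptom is the DNS row, so the line below is an assumption:

```
# /etc/default/docker -- enabling the DNS row is an assumed guess at the
# row referred to above; the DNS servers here are placeholders.
DOCKER_OPTS="--dns 8.8.8.8 --dns 8.8.4.4"
```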
Nov 4 2018
Nov 1 2018
I started this again today. I got a new computer and I can use my old running Ubuntu for a couple of days. My plan is to get it up and running early next week.
Oct 29 2018
Oct 24 2018
They are all updated. Let's have a new go at looking at the metrics.
Yep that was the problem. That vertical line to the right is when I made the change (adding back the start white parameter).
Oct 23 2018
I've deployed a version for enwiki/group 0 etc. that uses --startWhite again to see if we see any difference. Hopefully we do; otherwise I need to dig into the color swapping, and that's more complicated (if that's the cause). It probably needs to run until tomorrow for us to know for sure.
Hmm. I've been going through what it looks like for different wikis, and for enwiki it looks like 70 introduced more deviation. But looking at the group0 wikis we can see that it's an upgrade of Browsertime that causes more unstable metrics:
Oct 22 2018
Another interesting thing is that Chrome 70 increased deviation between runs:
Oct 19 2018
Oct 18 2018
This is done now, I'll add some graphs later on.
Oct 17 2018
Sometimes I feel I miss a more high level version of the Chrome changelog: https://chromium.googlesource.com/chromium/src/+log/69.0.3497.100..70.0.3538.67?pretty=fuller&n=10000
Oct 12 2018
Moved the last one to c5.large (and updated the docs) just now.
Oct 11 2018
5.3.0 has been released.
I've updated the Firefox server to c5.xlarge and changed enwiki to also run on the c5 series (c5.large). That one is also cheaper and a little bit faster.
Oct 10 2018
Oct 8 2018
Enabled them just now. 40 ms/40 points for Speed Index for all tests for now.