Thu, Jan 18
No, collect it should be enough!
Adding these for reference later.
I've made the patches needed for separating IE. If they're OK we can go with that now, and then setting up the Linux agent "should" only be a matter of adding an extra job in the Jenkins configuration file (with the right configuration), and then we can just run them side by side.
Wed, Jan 17
To make it easy we should just move all the Internet Explorer tests into separate text files; then we can add one extra job in Jenkins with a different configuration (pointing to the Linux agent) and we are ready to go.
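A hypothetical sketch of the split described above. The job names, batch file paths, and the runner invocation are all made up for illustration (check https://github.com/wikimedia/wpt-reporter for its actual flags); the point is that each Jenkins job runs the same tool against its own batch file, on its own agent:

```shell
# Hypothetical layout: two Jenkins jobs, two batch files.
# Job "webpagetest-ie" stays on the existing Windows agent
# and runs only the IE tests:
node bin/index.js --batch scripts/batch/ie.txt

# Job "webpagetest-linux" is identical except it runs on the
# new Linux agent with the remaining tests:
node bin/index.js --batch scripts/batch/desktop.txt
```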
We should do this when we move to Linux and at the same time remove unused keys in Graphite to make it easier.
Been thinking about it, I think the reason to have our own agent is to drive mobile phones.
Yep, let the old version just stop working and make sure we've moved to Linux before then.
WebPageTest.org is now running on Linux https://twitter.com/patmeenan/status/951234346458984454 so I think we can move on, even though we had the problems. It's better to just run it side by side with the current version.
I think we can do it like this: let the current Windows agent die when the new Chrome version is released, and move to Linux a month or two before that, since changing to the new Windows agent will affect metrics like SpeedIndex and FirstVisualChange.
This will not work until AWS has a new Windows version supporting Edge.
Tue, Jan 16
I'll ask him about the difference between when images are finished loading and when they are actually displayed on screen. If the difference is X seconds (as we've seen before), that should influence the metric.
@Jdlrobson ah cool, I'll do some more testing first and see if I can get a better environment on my phone. For the wikis, it would be best if we could test the exact same article.
Fri, Jan 12
Ok, I got the WebPageReplay server working with my Alcatel. One thing though: I haven't rooted my device and installed the fake certificate; instead I'm using ignore-certificate-errors-spki-list, so I get a warning in Chrome and some blocked time on the first domain (400-500 ms). However, the blocking time is the same for both tests, so it should probably not matter. I should still root it to make sure; I'll try to do that the week after our team meeting.
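For reference, the SPKI hash that Chrome's `--ignore-certificate-errors-spki-list` flag expects can be computed from the replay certificate with openssl. A minimal sketch, where a throwaway self-signed cert stands in for the WebPageReplay CA (the paths and subject are just examples):

```shell
# Generate a throwaway cert to demonstrate the pipeline; in real use
# /tmp/cert.pem would be the WebPageReplay CA certificate.
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=wpr-test" \
  -keyout /tmp/key.pem -out /tmp/cert.pem -days 1 2>/dev/null

# Extract the public key, DER-encode it, hash it, base64 it:
SPKI=$(openssl x509 -pubkey -noout -in /tmp/cert.pem \
  | openssl pkey -pubin -outform der \
  | openssl dgst -sha256 -binary \
  | base64)
echo "$SPKI"

# Chrome is then launched with:
#   --ignore-certificate-errors-spki-list="$SPKI"
```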
Made another sheet on docs @Jhernandez, it mostly looks the same, have a look when you have time. Did 21 runs here too and took the median. I'm gonna try again with WebPageReplay and see if we can get a comparable first visual change.
I was a little too fast on the trigger: it seems like mapping ports on my phone doesn't work, so the traffic goes through the proxy. I need to look into it, but it could be quite a lot of work to get it working.
Thu, Jan 11
I got the Android 6 phone late today and I've run a couple of tests, and I could actually make it work with WebPageReplay (which is actually pretty cool). I need to do a small hack, but I'll be able to make the runs tomorrow to see if we can compare Speed Index and First Visual Change.
Had a quick look through the docs and it seems like you can only access the devices from your local server and not from your AWS server. It could still be usable, but not as cool :)
@Jhernandez there's a base there now so you can at least start when you have some time, I'll go through it during the day, try to finish it. Feel free to change/edit things you find are wrong/don't work :)
I'll continue adding docs https://wikitech.wikimedia.org/wiki/Measure_Performance#Testing_performance_on_your_Android_phone during the day.
Wed, Jan 10
Cool, I'll update https://wikitech.wikimedia.org/wiki/Measure_Performance with examples and how-tos, will start tomorrow morning. Today it only includes desktop testing.
I wanna keep my Huawei in the original setup, so I bought another phone in the lower spectrum with 512 MB RAM but running Android 6. I won't get it until next week though, and then I'm off to the perf team meetup. I can do the rest of the testing on the 21st, but I think the metrics we have from the Chrome timeline are really good, showing us the improvement.
Ok, I'm stuck on reversing the traffic on the device using adb reverse ...; my device runs Android 4 and reverse was introduced in 5.
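For reference, the reverse mapping that's missing on Android 4 looks roughly like this (the port numbers are just examples; they'd need to match whatever the replay server listens on):

```shell
# Android 5+ only: forward phone-local ports back to the host machine,
# so traffic to 127.0.0.1:8080 on the device reaches WebPageReplay.
adb reverse tcp:8080 tcp:8080    # HTTP
adb reverse tcp:8081 tcp:8081    # HTTPS

# List active reverse mappings:
adb reverse --list

# On Android 4 "adb reverse" doesn't exist; a proxy on the host that
# the phone can reach over Wi-Fi is the usual workaround.
```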
Fixed everything for Visual Metrics, but since beta isn't cached we get a much higher TTFB for that URL, so comparing first visual change/speed index will not be fair. I can try to get it to work with WebPageReplay, but then I need to root my device and install the fake certificate. I had a go at that a couple of weeks ago and didn't get it to work, but I can try again.
I started testing yesterday but ran into a bug with VisualMetrics when getting the metrics. I'll fix that first and continue later today.
We've had it running for a while now and it surely looks better. It's been running on a c4.large (same size as the other tests). We had one bug that I haven't looked into yet: it made us report 0 values (bad). I think the origin of the problem is in Chromedriver when we get the trace log; the behavior changed a couple of versions ago (2.30) and the hack to fix it was to get the log two times. I'll look into it later this week.
Tue, Jan 9
The numbers look better, especially on mobile. That should mean that if we go one instance size larger, the numbers will look good on desktop too. I'll let it run like this today just to be 100% sure, and then update to a larger instance + make a wrapper for the devtools timeline and collect median/min/max. That should be doable tomorrow, so we'll have something with real numbers running until next week and can discuss how to act on it.
I can also reproduce it on the login page:
This will be awesome! I'm missing docs describing limits and how we do it (but we can add that when it's merged): how should we keep track of different oversamplings running, can you run multiple at a time, what should you think about, etc.
I got this up and running on a new instance, will update the dashboard as soon as we get the metrics. If it works, I'll add sending rendering/scripting etc. and see how stable they are.
Ok, I think we only need to enable the devtools.timeline category, then we can generate the following:
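A minimal sketch of collecting only that category with browsertime (the flag name may differ between browsertime versions; verify with `browsertime --help` before relying on it):

```shell
# Collect a Chrome trace limited to the devtools.timeline category,
# one iteration, against an example URL:
browsertime --chrome.traceCategories devtools.timeline \
  -n 1 https://en.wikipedia.org/wiki/Facebook
```

Limiting the categories keeps the trace log small, which matters given the overhead we saw on the smaller instances.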
Mon, Jan 8
I'll close it since there's no action for us to fix.
Fixed so it cannot happen again (by setting the name) but not sure why it actually happened the first time.
It doesn't look good on c4.xlarge either (300 ms diff):
I've started this today by deploying to a c4.xlarge machine, and pushing to Graphite with the trace log turned out OK. I've also started to try out https://github.com/paulirish/devtools-timeline-model to parse the log.
I made a couple of changes to how we start the containers, giving them a fixed name so this can never happen again. I also saw that I had missed mapping the clock, so I added that too. I've updated the docs on Wikitech:
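Roughly, the two changes could look like this (the image name is an example, and I'm assuming "mapping the clock" means sharing the host's local time into the container):

```shell
# A fixed --name means a second "docker run" with the same name fails
# loudly instead of silently starting a duplicate container; mounting
# /etc/localtime keeps the container clock in sync with the host.
docker run --rm --name browsertime-wpr \
  -v /etc/localtime:/etc/localtime:ro \
  sitespeedio/browsertime:latest --version
```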
Dec 20 2017
I've added an example of how to also run the WebPageReplay tests locally. Think we are done here for now, let me close the issue.
Dec 19 2017
Turning off the trace log got us back to those great values again: a 33 ms diff instead of 300 ms.
It doesn't seem to matter much with the smaller screen with the log turned on; the diff is still 300 ms:
Changed the timespan for all the size graphs to use 3 h instead of an average over 24 h (I guess I just copy/pasted that before).
Ok, cool. Let me know when you've had time to read through the docs at https://wikitech.wikimedia.org/wiki/Performance/WebPageReplay and verified that you can access it @Gilles, and then I think we can close this issue.
Dec 18 2017
Adding this so we remember it: We've been testing out with the viewport 1200x960 for WebPageReplay. With our current setup with WebPageTest we use 1024x768.
The setup I was testing used 1920x1080; by default we use 1200x960 for WebPageReplay (and 1024x768 on WebPageTest).
Ahh, I went through the configuration and saw that I was running a larger viewport than our default. Firefox handles that well, but Chrome on AWS doesn't. Let me change that and rerun.
Dec 15 2017
I've tested turning on the Chrome trace log on one instance on AWS (c4.large) running with WebPageReplay and the instance is too small:
We are not close to the same stability in first visual change that we usually have. I'll turn it off for now, check that the metrics look like they should, and then make a quick try on a larger instance to see if it looks better.
The WebPageTest docker containers aren't tagged per release, so our old instructions for getting it up and running don't work anymore (at least for me). I've filed an issue upstream: https://github.com/WPO-Foundation/webpagetest/issues/1069
Hmm, our Docker instructions don't work anymore. Note to self to update them when I get them working.
Dec 14 2017
Let me start with this tomorrow, would be nice to have it fixed and then start the new year on Linux :)
Next step: we should run it on AWS, but not with the automatic deploy. We should create a Linux instance (there's an Ubuntu install script, so we should use Ubuntu I think), then start the agent so it connects to our WebPageTest server (we probably need to reconfigure our settings on the server; I had problems with that the last time I tried). We make sure we start the agent with full logging, then locally fire 10 tests from https://github.com/wikimedia/wpt-reporter, compare the times, and keep the log so we can attach it to the issue on GitHub. Maybe we can make a PR when we find the problem.
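The agent part of those steps could be sketched like this. The server URL and location name are placeholders, and the exact flag set should be checked against the wptagent README:

```shell
# Get the Linux agent and start it with maximum logging, keeping the
# output so it can be attached to the upstream issue:
git clone https://github.com/WPO-Foundation/wptagent.git
cd wptagent
python wptagent.py -vvvv \
  --server "http://<our-wpt-server>/work/" \
  --location Linux_AWS \
  2>&1 | tee agent.log
```

The 10 comparison tests would then be fired from wpt-reporter on a local machine against the same server.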
Fixed the last things, now we can just wait 5 days until we have enough data in the alerts.
Dec 12 2017
Dec 11 2017
Here are some screenshot for Facebook:
Dec 9 2017
Dec 8 2017
This is done in Browsertime; let's see if it can get merged during the weekend, and then we can start using it and have a better flow for picking up new releases.
@Gilles and I got Firefox working a couple of days ago. Gilles did a hack with dnsmasq: https://github.com/gi11es/browsertime-replays/tree/ff54-dnsmasq/webpagereplay and I got it working with the Firefox preference network.dns.forceResolve. It was introduced in Firefox 55, but we were running 54 since https://github.com/firebug/har-export-trigger was broken in 55. Running 57 it works (as long as we turn off getting a HAR).
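A sketch of the preference-based approach via browsertime (the resolve target and the exact flags are assumptions; double-check against the browsertime docs):

```shell
# Force Firefox (>= 55) to resolve every hostname to the local
# WebPageReplay instance, and skip the HAR since har-export-trigger
# doesn't work on 57:
browsertime --browser firefox \
  --firefox.preference network.dns.forceResolve:127.0.0.1 \
  --skipHar -n 1 https://en.wikipedia.org/
```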
Dec 5 2017
I've reported what you've seen in https://bugzilla.mozilla.org/show_bug.cgi?id=712130, great work @Krinkle!
Dec 4 2017
Works ok, let me sync the viewport.
Ok, the thing is that the difference is much lower on WPT, but the increase is there for a couple of hundred points, so it is ok. I wonder if we are running different viewports that make the difference so big.
@Gilles no, I did it without Docker in the initial tests where AWS outperformed everything; however, I'm not 100% sure that we got exactly the same stable metrics as we get now.
I wonder if we could do something smart here with the Resource Timing API? Or what kind of things do we want to catch? Like if the latency looks the same for all resources there's no need to collect the data?
I wanna move this out of potential goals, I can't see how this will work.
@chasemp I think we need your help here and guide us in the right direction. Let me do a summary:
Nov 30 2017
Cool. I was about to dismiss Fargate, since to set connectivity with tc you need to do it with Docker networks or set the right privileges on the host machine, BUT we just do it on localhost for our tests, so that should work.
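A sketch of the privilege in question (image and netem values are just examples): tc needs the NET_ADMIN capability, which can be granted per container instead of running it fully privileged:

```shell
# Grant NET_ADMIN so tc can shape traffic inside the container:
docker run --rm --cap-add=NET_ADMIN ubuntu:16.04 sh -c '
  apt-get -qq update && apt-get -qq install -y iproute2 >/dev/null &&
  tc qdisc add dev eth0 root netem delay 100ms &&
  tc qdisc show dev eth0'
```

Without --cap-add=NET_ADMIN the tc call fails with "Operation not permitted", which is the limitation that would have ruled Fargate out.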
I've checked now and the values are better than on 54 after turning off the downloading of the tracking-protection files. I'll add that to the upstream bug and also ask again about the preferred setup.
Nov 29 2017
Looking at p95, surprisingly 62 has highs even when we had few 62 users:
It's not easy to see: