Did a quick check and it loads a lot of images. I think you are perfectly right @Gilles that we should lazy load images on Desktop too. http://wpt.wmftest.org/result/180419_RD_DR/4/details/#waterfall_view_step1
I've switched to the one with the most stable metrics, shut down two of the others, and will keep the first server for a couple of hours just to make sure everything works OK.
For emulated mobile the numbers are closer to each other. That instance isn't always the best, but the difference is really small:
We have 3 days of metrics now; that should be enough. I can actually see a difference in stability. Let me first do a summary:
I created T192522 as a follow-up when we collected the yearly stats from the old machine. I think we can say this task is done and do the cleanup after we've collected the metrics at the end of the quarter.
Tue, Apr 17
Thanks @Krinkle for merging. I added a panel at the bottom of the drilldown as a first step:
Mon, Apr 16
https://grafana.wikimedia.org/dashboard/db/webpagereplay-multiple-instances will have more later on.
I've installed two extra servers so it's easier for us to keep track of differences in the metrics. I'll set up a new dashboard for that when we have more data.
Yep, I asked before about the Windows version, but let me ask again for the agent. I've been wanting a Git version, a product version, and a changelog for a while.
@Gilles no, I didn't touch it on the 30th as far as I remember. I try to add an annotation for every change I do so it's easier to remember. But checking the commit log for wptagent, there was a change: https://github.com/WPO-Foundation/wptagent/commit/217c36fea9fb534b570421e19e0f3148186bf2fe
with the Fully Loaded metric. It looks like we missed requests before, so the new way is correct.
Thanks @Imarlier, almost no tuning has been done so far, so there is a lot to do there. But I wonder if it wouldn't be better to try to tune bare metal instead, with some help? We did get the most stable metrics on AWS, but if we're going to spend time tuning it more, maybe it's better to do that on our own servers instead? I mean, the exact metric values don't matter for us; what's important is that the metrics are stable, and we get that with more control?
Fri, Apr 13
It adds metrics under browsertime.enwiki-test, and it looks the same for FF. I'll check tonight or tomorrow to see if there's a difference.
The old instance runs 4.4.0-1054-aws and the new one I installed runs 4.4.0-1052-aws (an older version). So far, when I checked the new one, the metrics are back to what they were before (checking CPU time spent in Chrome). But it will be easier to see when we have more data.
Thanks, I updated the docs.
I talked to a friend who runs everything on AWS and he said that this kind of thing happens. I got the other instance up and running (it took some time); let's see what those metrics look like.
I have created a new instance and let it send metrics to another key structure for now; I need to keep it running for a couple of hours to know more.
The CPU usage seems higher on the instance after the stop:
I've got a feeling there's something going on on the server. I'll create a new one during the day and deploy there. I'll also change it so we log to another dir and start on server restart.
Waiting on new tags on the Docker hub for WPT.
Thu, Apr 12
Tested and it works.
Mon, Apr 9
Fixed earlier this week.
Fri, Apr 6
Pat Meenan told me he could set up a Moto G4 for our private use to test out; that would be a good idea I think.
Got an answer that the only way to fix it automatically is to use the auto scaling (which isn't working). It makes me wonder if we should run the tests differently: the Linux machines spin up much faster than Windows, so maybe we should burst the tests (send them all at once), scale up a couple of machines, and then kill them when they are finished.
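To make the burst idea concrete, here's a rough sketch using the AWS SDK for Node; the AMI id, instance type, and region are placeholders, not our actual setup:

```js
// Rough sketch of the burst approach: spin up a batch of Linux agents at
// once, then terminate them when the queued tests are done.
const AWS = require('aws-sdk');
const ec2 = new AWS.EC2({ region: 'us-east-1' }); // hypothetical region

async function burstAgents(count) {
  const result = await ec2.runInstances({
    ImageId: 'ami-00000000',  // hypothetical agent AMI
    InstanceType: 'c5.large', // hypothetical instance type
    MinCount: count,
    MaxCount: count,
  }).promise();
  return result.Instances.map((i) => i.InstanceId);
}

async function killAgents(instanceIds) {
  await ec2.terminateInstances({ InstanceIds: instanceIds }).promise();
}
```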
Good, let me try that. I've added it to Browsertime now and I'm adding it to the list for WebPageTest too.
@Gilles great! Do you want to contact Mozilla about it? We don't see the same variance for Chrome.
Thu, Apr 5
I killed the instances, recreated new ones with the new AMI IDs, and it works (we have a big queue though).
Trying the simplest solution by disabling the calculation on the agent in the settings.ini:
I could change the configuration to make this go away (by letting the Visual Metrics pick up the viewport). It will go out the next time we do an update.
There's a couple of things going on here:
This is another good example with a lot of variance for Firefox (https://en.wikipedia.org/wiki/Cephalopod_size): https://www.webpagetest.org/result/180405_76_1c5de8b142ba1ed466b268022c724722/
Yep, we could do that; we need to find an article that gives stable metrics in Firefox though. Did a quick test run https://www.webpagetest.org/result/180405_76_1c5de8b142ba1ed466b268022c724722/ - First Visual Change goes from 1.5 s to 2.4 s :( It's a good article for more feedback to Mozilla though.
Tue, Apr 3
I've disabled the Firefox alerts for 7 days, since turning on the MOZ log increased the metrics, so if the alerts fire they are false alerts.
Deployed earlier today; I'll watch the alerts and graphs over the coming days and make some adjustments. All Chrome Visual Metrics will be faster with this release because of two things: Browsertime now stores trace logs/screenshots between runs (before, they were stored after all runs), and the navigation logic for when to start a test has been fine-tuned. Before, there could be latency between switching the background to white and navigating the browser to the URL (they happened in two different WebDriver commands), but now it is one and the same command.
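Roughly, the idea of the navigation change in WebDriver terms (just a sketch, not the actual Browsertime code):

```js
// Do the white-background switch and the navigation in one executeScript
// call instead of two separate WebDriver commands, so no latency can sneak
// in between them. `driver` is a selenium-webdriver session.
function navigate(driver, url) {
  return driver.executeScript(
    'document.documentElement.style.background = "#fff";' +
    'window.location.href = arguments[0];',
    url
  );
}
```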
The dashboard is updated: https://grafana.wikimedia.org/dashboard/db/wikidata-webpagetest
FYI: I'm changing the dashboard today. The Windows agent went crazy on 31/3 and the metrics are too high.
I've updated the alerts now so they use Linux. Looks good now. The Windows agent is still broken though.
The alerts are still using the Windows machine; the rest of our setup is using Linux. It seems like the Windows machine went crazy, look at this:
Thu, Mar 29
I talked with @Imarlier yesterday and the way forward is to just decide on the metrics to start with, and then we can continue to refine them. I'll make a first proposal and you all @Krinkle @Gilles @aaron @Imarlier can edit/change/make suggestions:
Wed, Mar 28
@Gilles I want to try it out, what's the easiest way to do it?
Tue, Mar 27
I've pushed the tests into the crontab on the WebPageTest agent too. We could add/change the URLs, but let's do that in another task.
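For reference, the crontab entries look roughly like this (the wrapper script and paths here are hypothetical):

```
# Run the WebPageTest tests every hour and keep a log for debugging.
0 * * * * /home/wpt/run-webpagetest-tests.sh >> /var/log/wpt/tests.log 2>&1
```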
Mon, Mar 26
I've tried out different preferences and stopped more types of background requests (setting network.captive-portal-service.enabled to false). I haven't been able to track down any more requests in the MOZ log, so it is probably OK for now.
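In user.js form the prefs look like this (the captive portal one is the change described above; the OCSP one is the fix from earlier):

```js
// Stop Firefox background traffic that pollutes the test runs.
user_pref("network.captive-portal-service.enabled", false); // no captive-portal probes
user_pref("security.OCSP.enabled", 0); // no OCSP requests (must be the integer 0)
```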
Fri, Mar 23
This is driving me crazy :( I've verified that Firefox is using the proxy (by removing internet access when replaying the data), and it works. However, the Firefox metrics have high variance both with and without the proxy. Running Chrome without setting any connectivity and not using the proxy, the variance is 100 ms, and the same or less when we use the proxy. For Firefox, the variance in firstVisualChange is 1000 ms both with and without the proxy. I wonder if there's something in the Docker setup that can explain it? When I check the HAR file and compare it with a run on the same connectivity, the HARs are mostly identical except that the timings differ by 1000 ms (so the HAR doesn't help). I'll test whether I see the same without Docker and also collect the MOZ log, but I think what's missing is a trace log of what Firefox is doing. So far I've tried rolling back to stable 58, disabling IPv6, and increasing the file limit on Linux, but no luck.
Thu, Mar 22
I've done a test switching to the same Firefox preferences that WebPageTest uses instead of the defaults that Geckodriver uses (+ some extras), and the median SpeedIndex looks a lot better:
Wed, Mar 21
Hmm, I've been checking individual runs, and for Firefox, when we do 11 runs, the variance between runs can be 500 ms (for Chrome it is 100 ms), so I wonder if it is some configuration issue? We turned off OCSP and the block/allow list updates, but maybe something more needs to be disabled.
Hmm, something is wrong with navtiming2. @Krinkle and I looked it through last night and I cannot make the numbers match. The idea with navtiming2 is that we should either send all metrics or no metrics at all.
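A sketch of that all-or-nothing rule (the metric names here are illustrative, not the actual navtiming2 schema):

```js
// Collect every Navigation Timing value first, and only report if all of
// them are present and sane; otherwise send nothing at all.
function collectNavTiming() {
  const t = performance.timing;
  const metrics = {
    responseStart: t.responseStart - t.navigationStart,
    domInteractive: t.domInteractive - t.navigationStart,
    loadEventEnd: t.loadEventEnd - t.navigationStart,
  };
  const valid = Object.values(metrics).every((v) => Number.isFinite(v) && v >= 0);
  return valid ? metrics : null; // null means: send no metrics at all
}
```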
Mar 20 2018
Here's another example. These changes made First Visual Change go down 400 ms.
Mar 19 2018
Looks good @Krinkle! When we've moved to Linux and collected the "final" metrics for Windows, we can do a major cleanup again.
Yep it seems to have been changed in WebPageTest:
Hmm, has this changed upstream, or is it a statsv thing? We only have Last Interactive (again); the First Interactive and Time to Interactive metrics are gone.
This works on the test server (where we test against the Swedish URLs), so we can update it on the server that tests for the alerts when we do the next update.
Should be OK now, but let's keep this open until after the next run so we are 100% sure.
Thanks for spotting it @Gilles! I'm disabling the failing URLs for now.
That connection problem was only temporary (if you haven't fixed anything, @Gilles?). The problem is that https://www.wikipedia.beta.wmflabs.org/ isn't redirected anymore (the portal team tests that page):
Mar 14 2018
It turns out that Firefox handles the preference values 0, "0", and false differently in this case. Only the actual integer 0 turns off OCSP, so I need to do a code change to fix that and then build a new Docker container and deploy.
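The gotcha in code form, a sketch of the kind of coercion needed (not the actual commit):

```js
// Firefox only disables OCSP for the real integer 0; the string "0" or the
// boolean false leave it enabled, so numeric-looking pref values coming in
// as strings need to be coerced before they are set.
function coercePrefValue(value) {
  if (typeof value !== 'string') return value; // already correctly typed
  if (value === 'false') return false;
  if (value === 'true') return true;
  const asNumber = Number(value);
  return value !== '' && !Number.isNaN(asNumber) ? asNumber : value; // "0" -> 0
}
```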
This is fixed by disabling OCSP ('security.OCSP.enabled': 0). I'll make the changes later today and verify that it works correctly on the server (it worked when I tried it on my local machine).
Adding some results. Running Firefox with 11 runs and the HAR plugin turned on:
Mar 13 2018
I think we can push RUMSpeedIndex; what I would also like is if we could pick it up in WebPageTest. I think we can just add a global JS metric on the WebPageTest server so it will always collect it? I can have a go later today.
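A sketch of what that global custom metric could look like on the WebPageTest server, assuming the rum-speedindex.js helper is injected on the page (WebPageTest custom metrics are named JS snippets that return a value):

```
[RUMSpeedIndex]
return (typeof RUMSpeedIndex === 'function') ? RUMSpeedIndex() : undefined;
```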