Wed, Jul 5
Testing then could be:
When we get this working we can add one location and let it run there for a while so we can see that it is OK. I haven't tested it running for a longer period, I've only made a couple of shots.
Tue, Jul 4
We still have the issue with the TTFB: http://wpt.wmftest.org/result/170704_29_B3/1/details/#waterfall_view_step1
I've updated the Github thread.
The metrics have now gone back to almost the same level for the Facebook page:
I've raised the alert limit so it only alerts on a change of 40% or more (before we used 20%). Let us keep it like that for a while.
Mon, Jul 3
My thinking: this doesn't work out of the box and it's a real hack to get it to work on Windows. In the new Linux version we will probably just update the agent ourselves and deploy it on a server (with a static IP), so then it will not be an issue (except for updating Chrome/Firefox, but let us focus on that later).
Had another try (it is fixed in the WPT code), but the AWS instance isn't updated with the latest version (I still get the same error on a new instance), so I guess the new version isn't auto-updated.
The new mobile versions look like this:
I think we should skip "List the latest blog posts from the performance team. (Create sidebar?)". I've added the link to the blog in the header. We need to redesign the page to include the posts, but let's do that when we have more posts.
We tested out Catchpoint to see if the SpeedIndex there was usable, setting connectivity to 4g and running one test every hour:
Fri, Jun 30
The new one looks like this:
Thu, Jun 29
Made another run with the change for WebPageTest (using Linux version, because there's something going on where I cannot get through a test on the normal Dulles instances):
I've changed all timing metrics. I could see that after the change some metrics are above the limit, but since not all three are, the alert is not fired.
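To illustrate the "all three" behaviour, here is a minimal Python sketch. It assumes the alert watches three series and only fires when every one of them is above the limit; the function name and the numbers are made up for the example and are not taken from the real Grafana setup.

```python
def alert_fires(changes, limit):
    """Return True only when every series is above the limit.

    `changes` holds the relative change per series (0.25 means 25%).
    Hypothetical helper, not the actual alert rule.
    """
    return all(change > limit for change in changes)


# Two of the three series are above a 20% limit, one is not,
# so under this assumption the alert stays quiet.
example_changes = [0.25, 0.31, 0.12]
print(alert_fires(example_changes, limit=0.20))
```

With this kind of rule a single noisy series doesn't trigger anything, which matches what we see after the change.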
And you can use include_relative (https://jekyllrb.com/docs/includes/), that will work just fine :)
I think we don't need to uncss the metrics-graphics for now. Zipping everything, we get under the magic 14 KB, so let's try to just include them.
Wed, Jun 28
I ran some tests last weekend and just tested adding it to my own AWS instance in Mumbai. I added two URLs tested on desktop (cable) and two on mobile (3g and 3gem).
Tue, Jun 27
Mon, Jun 26
I've asked him, I'll introduce you ASAP when I get his email.
@Gilles do you use Slack? Otherwise I can ask for his email if you want to talk to him directly (I think that would be faster than me proxying).
Thu, Jun 22
Let us skip this for now, if we have another setup in the future we can look at that.
I think T168526 is the thing we need
There's no end to this. I just hope it will stop when we move to Linux.
I've seen that the same queries for the alerts work for me when I tried them out on another instance.
Jun 22 2017
I changed the alerts so they follow the save timings pattern when we compare the size. I'll keep an eye on them today.
I changed the queries to follow the same pattern as the Save Alerts.
These are the type of queries that break the alerts:
I've fixed it now for the first alert: https://grafana.wikimedia.org/dashboard/db/webpagetest-alerts?refresh=5m&panelId=15&fullscreen&orgId=1
I've updated Grafana but all the alerts get a 500 error:
I've been checking out the metrics but I can't see any difference. Still, it's more sane to have 30s, so let us use that for now.
Jun 21 2017
I've manually updated the server with git pull origin master, and after fixing two conflicts I got the latest version and the mobile viewport finally works! About Firefox: it should almost be finished (see https://github.com/WPO-Foundation/webpagetest/issues/878#issuecomment-310098880), so let's check when I'm back from vacation in August. We can close the issue when we know that Firefox works.
Let me test this again on a larger instance, I think we need c4.large. When I've done the tests and it works out, we should think about skipping paying by the hour and instead get two reserved instances on AWS. That will be the same price as running one instance paying by the hour as we do now.
Yes, I removed them some time ago since it wasn't working and was constantly failing (the same with iOS). Let's close it. We really needed help from Opera to be able to pinpoint what happened, but I feel we didn't get that.
Adding this as a reference: http://wpt.wmftest.org/video/compare.php?tests=170621_NK_8R-r:3,170621_NK_8R-r:1
I've added the locations in https://github.com/WPO-Foundation/webpagetest/blob/561541499dc4a3f83fc207971663f515189b8c7a/www/settings/ec2_locations.ini, restarted and tested an instance from Mumbai:
Jun 20 2017
Jun 19 2017
There's one more thing I want to try out before we close this, and that is to see that the workflow works with MIT Proxy. If it works, we can use it. The next thing is to make another test to see how stable the metrics will be. I think the easiest for now is to use SpeedIndex & FirstVisualChange. In the long run we want to find a way to use the Chrome trace log to get more metrics.
Let's let this run for a while and then pick it up and compare to see if we can find a difference in the metrics.
Jun 15 2017
Yep, we log the WebPageTest URL now, so when we get one, it's just copy paste and we can see the error.
Jun 14 2017
I would like to have higher prio on this one. Using the free ones on WPT doesn't give us anything. Either we should do it ourselves or find a service we can use.
You can automate it with Chrome using Browsertime. Then you get both video and the resource-timing, and you can set the connectivity locally in a better way than using Chrome devtools (or did you use Network Link Conditioner? The conditioner stopped working for me a couple of updates ago). Let me show you next week.
No, it's because I changed the limit from 50% to 20%:
No, we should decrease the alert limits, for some we have a 50% limit. I've changed the two for backend timing to 20%, that seems more reasonable.
I've changed it now so the graphs on the right show the latest 30d (like the other graphs), so it is hopefully easier to spot regressions:
Jun 13 2017
Fixed now. The problem from the beginning is the one in #3306827: we can't use the same as the other alerts.
I think that is my fault. The alerts are set to 50% but it should be 0.50 (you can see the values when you test fire the alerts). I'll change that.
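A small Python sketch of why 50 and 0.50 behave so differently. It assumes the alert query returns the relative change as a ratio (0.5 for a 50% change); the helper names and the numbers are hypothetical and only there to show the unit mismatch.

```python
def relative_change(current, baseline):
    """Relative change as a ratio: 0.5 means a 50% increase."""
    return (current - baseline) / baseline


def exceeds(current, baseline, threshold):
    """True when the absolute relative change reaches the threshold."""
    return abs(relative_change(current, baseline)) >= threshold


# A metric going from 1000 to 1500 is a 50% change, i.e. a ratio of 0.5.
print(exceeds(1500, 1000, 0.50))  # threshold given as a ratio
print(exceeds(1500, 1000, 50))    # threshold mistakenly given as a percent
```

If the threshold is written as 50 while the query gives ratios, the metric would need to change by 5000% before anything fires.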
Closing this now since it works, but there are a couple of things that aren't optimal IMHO that we need to reconsider:
This starts to look good now! Let's fix the metrics description and fine-tune the CSS with uncss, and the task is OK for now. Then we can think about using graphs from Grafana later on.
The metrics look good now, and I've verified that the new agents are closed down. Tomorrow I'll change the WebPageTest alerts to go back 24h instead of a week, to make sure we don't get alerts on Saturday.
Jun 12 2017
Pat has rolled back, so the certificates are updated through Windows Update for now. I tested and it went through on us-east-1. I changed Jenkins so the next run will use us-east-1 and we are back to normal. I'll clean up tomorrow. Pat also pointed out that this can be a real problem for users without the update.
Yep, it could probably be that. Let's wait then until he either unblocks the updates or deploys a new AMI. I'll update ASAP when I see that.
Ping @Jdlrobson so you know about the current metrics status from WebPageTest.
Fyi: The agents run Win 6.1: http://wpt.wmftest.org/getTesters.php?f=html
It worked running from Europe. I've updated the server to set up the same instance size on the European agents and now it is right. I'll do a hot change for Jenkins, but the key will change in Graphite, so I'll just wait with changing the graphs until we decide how to do it in the future.
I'll have a look. I think you need to add it to each and every script (since it is Windows and I cannot hack it). It's easier to switch to Europe, so I'll try that first and then disable the alarms (or change them).