I've set up CORS for grafana.wikimedia.org, compare.sitespeed.io (do we want to host our own version?) and the temporary Grafana instance that I've been using to test the setup.
I've changed the configuration and restarted the tests. Let me check later today that it really works.
I did a quick try with the extra parameters in the release doc, but no luck: it seems that Geckodriver can't connect. Let's wait until Mozilla adds some more examples to the documentation.
I think in the future we should aim for updating the browser without any code changes. Even though we test the change against the same browser version, I think it would be cool to separate the releases, since browser changes are out of our control. We can just roll out the new browser version when it's released?
Tue, Oct 15
I looked at the graphs and it looks like this happens when we run emulated mobile both using WebPageReplay and without replaying (so the problem is not WebPageReplay). We use tc in both cases (but for WPR we only use latency on localhost). I couldn't see anything in the desktop tests.
First Visual Change looks ok, we had some higher metrics that correlate with higher TTFB:
This works now:
Mon, Oct 14
If we could get help to fix that, it would be super (it's hard for me to know how much work it is). When it's done we can proceed and add automatic performance alerts for all the wikis we test. We also need to redo the dashboards we have, but the good thing is that it will give us more insights.
I've pushed this now. It works on my local machine, but let's make sure it works on our test server before I close the issue.
I'm not sure but on WebPageTest we got the full screen donate at around the same time so maybe it's related?
We caught that on some runs on WebPageReplay using Last Visual Change:
Sun, Oct 13
I've tested it some more and could reproduce it (added my info in T235092). We could have seen this by looking at Last Visual Change, but we have no alerts on that. Let me look into that more tomorrow. After that we can close this as a duplicate.
I could reproduce this on my Mac on Safari (13.0.2) on https://en.m.wikipedia.org/wiki/Barack_Obama and also on Firefox 70.0b14. Attaching a video of what it looks like.
Fri, Oct 11
It works now. I've enabled network traffic alerts, so if there's no traffic going out from the machine during two hours, the alert will fire.
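A minimal sketch of the rule behind that alert, just to make the logic explicit. This is not the actual alert configuration: the function name, the sample format and the two-hour constant as written here are assumptions for illustration.

```python
from datetime import datetime, timedelta

# Assumed window: the alert fires if nothing left the machine for two hours.
QUIET_WINDOW = timedelta(hours=2)

def should_fire(samples, now, window=QUIET_WINDOW):
    """Return True when no outgoing traffic was seen inside the window.

    samples is a hypothetical list of (timestamp, bytes_out) tuples.
    A window with no samples at all also counts as "no traffic".
    """
    cutoff = now - window
    recent = [bytes_out for ts, bytes_out in samples if cutoff <= ts <= now]
    return sum(recent) == 0
```

Treating missing samples the same as zero traffic is a choice: a machine that stopped reporting is as interesting as one that stopped sending.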
Yes thanks, it's the same as the generator tag, right? I've added that but haven't pushed it yet.
Now it works again:
Tests come through but no data:
Right after I emptied the queue we got 200 new jobs. That seems like a lot, and I don't know how they got there. I could see that a job went through, though. I emptied the queue again.
Couldn't see any change in the commits for WPT around that time (and one agent works). The queue:
Thu, Oct 10
I've started on this again, trying to make a saner script that works. I can go through the full scenario on beta and we need to find a way to get the important metric(s).
Mon, Oct 7
Wanted to add one concern that I raised in the upstream issue for Chromium: there's a risk that the lite pages give our users worse performance and lock them into the Google universe. Let me explain.
Fri, Oct 4
The Speed Index is back to normal and there's no grey flash in the screenshots.
It looks good now, thank you @Jdlrobson for fixing it so fast!
FYI: I could reproduce this in both Safari and Chrome. Using devtools in Chrome I could record a timeline and then see the grey box in the screenshots in the trace.
Thu, Oct 3
Implemented in T234414
This is done for the new setup.
Thank you @CDanis. I need to open up traffic for the Graphite server security group on AWS, which IP should I open it for? Is that enough security or should I add something more?
Wed, Oct 2
@CDanis I want to try adding an external Graphite data source but I don't have sufficient privileges in Grafana admin to add data sources. I'd like to try it that way so I can see that it works (I need to fine tune the security group for that instance so traffic can come through). Can you give me access, or what's the correct way to do it?
Mon, Sep 30
I couldn't see anything on the server side when I looked earlier today. I'm trying to finish off the move to the new setup and then it will work. Hopefully I can finish that this week and then start changing server by server.
I've deployed a small instance earlier today and it seems to work fine.
It didn't make our WebPageReplay metrics more stable, rather it introduced higher standard deviation (the blue vertical line is when I did the change):
Sun, Sep 29
I've been tracking JSHeap for a while (but I haven't looked at it) and it looks like it will be hard to use to see any changes because the metric is going up and down. Here are two examples, one for mobile and one for desktop. I guess if we make a change it needs to be big for us to see it.
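To put a rough number on "it needs to be big", here is a small sketch that compares the shift in the median against how much the metric normally jumps around. It is one possible way to reason about it, not how our alerts work today. The function name, the factor of 3 and the sample numbers in the usage are all made up for illustration.

```python
from statistics import median

def detectable_shift(before, after, factor=3.0):
    """Check if the median moved more than the usual run-to-run spread.

    before/after are lists of metric samples (for example JSHeap sizes).
    The spread is the median absolute deviation (MAD) of the 'before' runs.
    factor is an assumed threshold, not a tuned value.
    """
    base = median(before)
    mad = median([abs(x - base) for x in before])
    shift = median(after) - base
    return abs(shift) > factor * mad, shift, mad

# Made-up numbers: a shift of 5 drowns in a spread of 10, a shift of 45 does not.
noisy = [100, 120, 90, 110, 95, 115, 105]
print(detectable_shift(noisy, [110, 125, 100, 115, 105])[0])
print(detectable_shift(noisy, [150, 160, 140, 155, 145])[0])
```

Using the median and MAD instead of mean and standard deviation keeps a single outlier run from deciding the result.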
Sat, Sep 28
I think this is a problem with how we send the data through statsv. When I did the update, we started to send more metrics because that pull enabled "hero" metrics again.
Hmm, that matches when I updated the server the last time (running git pull), let me look into it.
No, the throttling was not the cause, we still get the same after the change:
I've added these settings for now:
Fri, Sep 27
Thu, Sep 26
I think it would be really interesting to try this on one wiki for a short while and measure what kind of win it will make in number of bytes. If we can show the numbers, it would be easier to talk to Chrome to put prio on the solution for printing?
I ordered an Android Go phone today and can start running some tests early next week.
Wed, Sep 25
It's there now for both Browsertime and WebPageTest.
So our old instance was also missing out on them. I checked the new config (you can turn them on now) and it was there. I've updated the WebPageTest server that hasn't been updated in a while, let's see if that fixes things.
This only happens on the mobile tests, not desktop. We set the same latency, so I guess it shouldn't be tc that's the problem. Maybe not even WebPageReplay. One thing that is different is that we run Chrome's "CPU Throttle" at rate 5 for mobile, which we don't do for desktop. That's the only difference I can come up with.