Investigate difference in variance in metrics between Linux/Windows
Closed, ResolvedPublic

Description

Moving to Linux seems to give us less stable values. This is what start render looks like on Windows (Sweden):

Screen Shot 2018-02-06 at 8.39.19 AM.png (624×1 px, 108 KB)

The same page on Linux:

Screen Shot 2018-02-06 at 8.40.31 AM.png (608×1 px, 82 KB)

On Windows the difference is 200 ms (13%), but on Linux it is 500 ms (70%!).
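To make figures like these reproducible, here is a minimal sketch of one way to compute the spread of a metric across runs. It assumes the percentage is the spread (slowest minus fastest) relative to the fastest value; the task does not say which baseline was used, and the sample values are hypothetical, picked only to land near the figures quoted above.

```python
def relative_spread(values_ms):
    """Return (spread in ms, spread as a percentage of the fastest value).

    Assumption: the percentage is measured against the fastest run.
    The task description does not state which baseline was used.
    """
    if not values_ms:
        raise ValueError("need at least one value")
    fastest = min(values_ms)
    slowest = max(values_ms)
    spread = slowest - fastest
    return spread, 100.0 * spread / fastest


# Hypothetical start render values in ms, not taken from the graphs.
windows_spread, windows_pct = relative_spread([1500, 1600, 1700])
linux_spread, linux_pct = relative_spread([700, 900, 1200])
print(f"Windows: {windows_spread} ms ({windows_pct:.0f}%)")
print(f"Linux: {linux_spread} ms ({linux_pct:.0f}%)")
```

Using the median as the baseline instead would give slightly different percentages, so the baseline should be stated whenever the numbers are compared.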

SpeedIndex for the same page looks like this (Windows):

Screen Shot 2018-02-06 at 10.11.34 AM.png (614×1 px, 106 KB)

And Linux:

Screen Shot 2018-02-06 at 10.11.54 AM.png (608×1 px, 86 KB)

We need to look into this before we make the change. Do we need a larger instance? One option is to pick the median run instead of the fastest. We could also try increasing the number of runs.

Right now picking the fastest run is hardcoded in the WPT wrapper, but we can move that out and configure it in Jenkins for Linux.
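As a rough illustration of the two strategies, here is a minimal sketch of choosing which run to report. It is not the WPT wrapper's actual code: the `pick_run` function, the `strategy` parameter, and the shape of the run dictionaries are assumptions made for this example.

```python
def pick_run(runs, metric="SpeedIndex", strategy="median"):
    """Return the run to report, chosen by the given metric.

    `runs` is a list of dicts such as {"run": 1, "SpeedIndex": 900}.
    This shape is hypothetical and only used for illustration.
    """
    if not runs:
        raise ValueError("no runs to choose from")
    ordered = sorted(runs, key=lambda run: run[metric])
    if strategy == "fastest":
        return ordered[0]
    if strategy == "median":
        # With an odd number of runs this is the true median run.
        # With an even number it is the lower of the two middle runs.
        return ordered[(len(ordered) - 1) // 2]
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical values: the fastest run is an outlier here.
runs = [
    {"run": 1, "SpeedIndex": 900},
    {"run": 2, "SpeedIndex": 1400},
    {"run": 3, "SpeedIndex": 1000},
]
print(pick_run(runs, strategy="fastest")["run"])
print(pick_run(runs, strategy="median")["run"])
```

Returning an actual run, rather than averaging the values, keeps the reported metrics tied to one waterfall and video that can be inspected. An odd number of runs avoids having to choose between two middle runs.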

Event Timeline

Peter renamed this task from Pick median run instead of fastest to Investigate difference in variance in metrics between Linux/Windows. Feb 6 2018, 9:12 AM
Peter triaged this task as Medium priority.
Peter updated the task description.

Also adding the Obama page for reference. Windows:

Screen Shot 2018-02-06 at 10.20.53 AM.png (618×2 px, 177 KB)

Linux:

Screen Shot 2018-02-06 at 10.20.33 AM.png (632×2 px, 156 KB)

Not as big a difference, but still not as good as before.

I created an upstream issue https://github.com/WPO-Foundation/wptagent/issues/79 and will also switch to picking the median run instead of the fastest.

Change 408804 had a related patch set uploaded (by Phedenskog; owner: Phedenskog):
[performance/WebPageTest@master] Add two extra runs where we pick the median run

https://gerrit.wikimedia.org/r/408804

Change 408804 merged by jenkins-bot:
[performance/WebPageTest@master] Add two extra runs where we pick the median run

https://gerrit.wikimedia.org/r/408804

Here's an example of where we have a big difference (500 ms) between different runs (1st and 3rd):

http://wpt.wmftest.org/video/compare.php?tests=180213_4D_BK-r%3A1%2C180213_4D_BK-r%3A3&thumbSize=200&ival=100&end=visual#

The only other thing that differs is the number of bytes, which is strange.

The same here (another URL, same difference between runs):
http://wpt.wmftest.org/video/compare.php?tests=180213_YQ_16Y-r:1,180213_YQ_16Y-r:3

But when I download the HAR file, there is no difference in the amount of data, so maybe something else is going on.
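One way to cross-check the byte counts is to sum the response sizes recorded in each run's HAR export. The sketch below assumes a standard HAR 1.2 layout (`log.entries[].response.bodySize` and `headersSize`); the function names and file names are hypothetical.

```python
import json


def transferred_bytes(har):
    """Sum response header and body sizes over all entries of a parsed HAR."""
    total = 0
    for entry in har["log"]["entries"]:
        response = entry["response"]
        # HAR uses -1 when a size is unknown, so clamp those values to 0.
        total += max(response.get("bodySize", -1), 0)
        total += max(response.get("headersSize", -1), 0)
    return total


def load_har(path):
    """Read a HAR file from disk and return it as a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Usage with hypothetical file names, one HAR export per run:
# first = transferred_bytes(load_har("run1.har"))
# third = transferred_bytes(load_har("run3.har"))
# print(first - third)
```

If the totals match while the comparison view reports different byte counts, the difference is more likely in how the bytes are counted than in what was actually transferred.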

@Gilles @Krinkle do you see something else?

Pat ran some tests yesterday where he got 200 ms of variance (much better): https://github.com/WPO-Foundation/wptagent/issues/79#issuecomment-365694467

I went through the numbers some more and the Firefox metrics are starting to look better, but I still think there's something wrong with the implementation for Chrome (and not on our side). When I checked the metrics for mobile there is a difference there too, but not as large as on desktop. I'll compare some more during the day.

Change 412885 had a related patch set uploaded (by Phedenskog; owner: Phedenskog):
[performance/WebPageTest@master] Test slower connection for more stable metrics on Linux

https://gerrit.wikimedia.org/r/412885

Change 412885 merged by jenkins-bot:
[performance/WebPageTest@master] Test slower connection for more stable metrics on Linux

https://gerrit.wikimedia.org/r/412885

I tried slowing down the connection, comparing cable vs 3gfast. For Obama we see the same difference; only the starting values shift:

Screen Shot 2018-02-22 at 8.56.26 AM.png (1×1 px, 190 KB)

Facebook is another story:

Screen Shot 2018-02-22 at 8.57.28 AM.png (1×1 px, 163 KB)

We got super stable metrics (instead of the other way around) with 3gfast. I would say that we hit a really bad spot going to Linux on that connectivity.

I think we should go ahead with the move to Linux and then decide which connectivity profiles to use for our tests. For alerting, I think we are ready to switch to WebPageReplay/Browsertime, and we can keep WebPageTest to show what the experience can look like for a user.

Change 413311 had a related patch set uploaded (by Phedenskog; owner: Phedenskog):
[performance/WebPageTest@master] Remove our tests for 3gfast on Linux.

https://gerrit.wikimedia.org/r/413311

Change 413311 merged by jenkins-bot:
[performance/WebPageTest@master] Remove our tests for 3gfast on Linux.

https://gerrit.wikimedia.org/r/413311

Summary: Yes, there is a difference. Linux is usually faster, but we also see larger variance. I think we can move on with Linux and then focus on WebPageReplay/Browsertime for alerts.

WebPageTest on Linux changed how it sets up traffic shaping (https://github.com/WPO-Foundation/wptagent/commit/f90afeef410cf522b7bafde8ea906901a150004f), and after that our metrics look better.

Interesting, it's worth testing that different tool for WebPageReplay.