
AWS instability in CPU performance
Closed, ResolvedPublic

Description

I've been looking into the instability of Firefox metrics in WebPageTest in T288451 and today I thought I had found the reason, but instead I think I found a problem that we have across the board for all tests we run on AWS.

We use a JavaScript CPU benchmark to measure how "fast" the CPU is in our synthetic tests (the same as we do for some of our real users). When I started digging into the Firefox results I could see that the CPU benchmark for most runs was 60 ms, but now that we do 11 runs, some of the runs show 90 ms. That is quite a high difference. Then I looked at the Chrome tests and could see the same difference there (Chrome metrics are more stable, but Chrome also gets more love in WebPageTest than Firefox). Both browsers show the same behavior, so maybe it has something to do with WebPageTest (the CPU benchmark running at the same time as something else?), or the specific AWS instance is broken?

Then I looked at our WebPageReplay tests (which use sitespeed.io with WebPageReplay). Here are five different runs:

Screenshot 2021-08-23 at 19.29.45.png (150×2 px, 78 KB)

Aha, the same pattern here! Independent of the tool, we see the same thing. So it is probably AWS?

We don't have a bare metal server where we run our tests today, but I have my own tests that run on a Mac mini M1 (a dedicated device). The numbers there are much closer:

safarai-mac-mini.png (118×2 px, 104 KB)

However, at the moment I only run Safari tests on the Mac mini, so that's a factor. Let me turn on Chrome tests to make 100% sure that the metrics are more stable.

When Gilles and I started with the WebPageReplay tests many years ago and tried different providers, we didn't think about running CPU benchmark tests over time, so we missed this.

Does the difference matter for our other metrics? I'm not sure, but I think we should dig deeper and try to run the tests in a more isolated environment.

Event Timeline

Peter claimed this task.

So it probably depends on what runs at the same time on the physical machine. I've been going through the tests again and now it's much more stable:

Screenshot 2021-10-18 at 09.13.00.png (226×1 px, 39 KB)

The stddev is 1 ms over 11 runs. Let me add an alert for high stddev and then let's see if I can spot a pattern. Let's close this for now; if it happens again we can think about what we can do.
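The stddev alert could look something like this — a hedged sketch, where the function names and the 5 ms threshold are assumptions, not what we actually ship:

```javascript
// Hypothetical sketch of the stddev alert: compute the standard
// deviation of the CPU benchmark scores across the 11 runs and flag
// the batch if it exceeds a threshold.
function stddev(values) {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance =
    values.reduce((acc, v) => acc + (v - mean) ** 2, 0) / values.length;
  return Math.sqrt(variance);
}

// Example scores in ms from 11 runs (made-up numbers for illustration).
const cpuBenchmarkRuns = [60, 61, 60, 59, 61, 60, 60, 61, 60, 59, 60];
const limitMs = 5; // alert threshold, an assumption

const sd = stddev(cpuBenchmarkRuns);
if (sd > limitMs) {
  console.log('ALERT: unstable CPU benchmark, stddev ' + sd.toFixed(1) + ' ms');
} else {
  console.log('CPU benchmark stable, stddev ' + sd.toFixed(1) + ' ms');
}
```

With a batch where one or two runs jump from 60 ms to 90 ms, the stddev would easily cross such a threshold, so this kind of check should catch the AWS contention pattern described above.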