
Document TTFB stability in synthetic testing
Closed, Resolved, Public

Description

One of the key challenges for us with the synthetic testing is to get a stable TTFB between test runs. If we have a stable TTFB it will be easier to find front-end regressions. This is extra important for our mobile testing, since we have had problems getting stable metrics there.

Let me start by documenting what kind of variation we have today in our different setups; maybe we can turn it into a blog post in the future.

Event Timeline

I'll go through the different tests we run, focusing on the Barack Obama page since we test that page on all the different setups. Where we have 7 days of data I'll use that and look at the min/median/max TTFB during that time and also the standard deviation.

Desktop simulate cable connection on AWS

We run a couple of tests using browsertime/sitespeed.io testing the first view of Wikipedia. So far I have mainly been using them to verify that the WebPageReplay tests are OK. We run the tests inside Docker and throttle the connection with Throttle.
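For reference, a run of this kind can be started roughly like this. This is a sketch, not our exact job configuration: the image tag, iteration count and connectivity flags are assumptions, and the exact flag names can differ between sitespeed.io versions.

```
# Sketch: first-view test of the Barack Obama page from Docker, throttled to a
# cable-like profile. NET_ADMIN is needed so tc can shape traffic inside the container.
docker run --cap-add=NET_ADMIN --shm-size=1g --rm -v "$(pwd)":/sitespeed.io \
  sitespeedio/sitespeed.io:latest \
  -c cable -n 5 --browsertime.connectivity.engine throttle \
  https://en.wikipedia.org/wiki/Barack_Obama
```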

stdev-ttfb-barack-obama-aws-cable.png (702×2 px, 533 KB)

Screenshot 2021-02-18 at 12.34.54.png (686×2 px, 533 KB)

The median TTFB over the 7 days is 175-184 ms. I would rank that as good, but we can also see that the max value and the standard deviation sometimes spike.

Emulated mobile simulate 3g connection on AWS

We run tests where we set Chrome in emulation mode and try to emulate running on an Android phone. We run the tests inside Docker and throttle the connection with Throttle.
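The invocation is similar to the desktop one; a sketch of the differing flags (again an assumption about the exact setup, not our real configuration):

```
# Sketch: same as the desktop run, but with Chrome emulating a mobile device
# and a 3g connectivity profile instead of cable.
docker run --cap-add=NET_ADMIN --shm-size=1g --rm -v "$(pwd)":/sitespeed.io \
  sitespeedio/sitespeed.io:latest \
  --mobile -c 3g -n 5 --browsertime.connectivity.engine throttle \
  https://en.wikipedia.org/wiki/Barack_Obama
```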

Screenshot 2021-02-18 at 12.45.20.png (692×2 px, 118 KB)

Screenshot 2021-02-18 at 12.45.47.png (682×2 px, 119 KB)

Desktop simulate cable connection on a Mac mini M1

I've also been running desktop tests on a Mac mini at MacStadium. There we do not run inside Docker, but we still use Throttle to throttle the connection.

Screenshot 2021-02-18 at 12.47.59.png (1×2 px, 602 KB)

Screenshot 2021-02-18 at 12.47.46.png (1×2 px, 643 KB)

Simulate iOS mobile 3g connection on a Mac mini M1

I also use that instance to test with the iOS simulator, running the tests as the latest iPhone.

Screenshot 2021-02-18 at 12.51.33.png (688×2 px, 384 KB)

Screenshot 2021-02-18 at 12.51.23.png (692×2 px, 540 KB)

Desktop WebPageReplay on AWS

We also run tests using WebPageReplay as a replay proxy. When we do that we add 100 ms latency on localhost.
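Adding latency on localhost is the kind of thing a single netem rule can do; a sketch of that technique (not necessarily the exact command our setup runs):

```
# Sketch: add 100 ms of latency to the loopback interface so requests to the
# local WebPageReplay server do not return unrealistically fast.
sudo tc qdisc add dev lo root netem delay 100ms
# inspect and remove the rule again
tc qdisc show dev lo
sudo tc qdisc del dev lo root
```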

Screenshot 2021-02-18 at 12.55.55.png (698×2 px, 551 KB)

Screenshot 2021-02-18 at 12.56.06.png (692×2 px, 577 KB)

Emulated mobile WebPageReplay on AWS

Same setup as above, but with Chrome in mobile emulation mode: WebPageReplay as the replay proxy and 100 ms of added latency on localhost.

Screenshot 2021-02-18 at 12.57.42.png (686×2 px, 569 KB)

Screenshot 2021-02-18 at 12.57.55.png (702×2 px, 624 KB)

WebPageTest desktop on AWS using cable

WebPageTest does its own throttling, but it's based on the same idea. Today we only have the median TTFB; however, it's not the same as in the other tests, since for WebPageTest we pick the run with the median Speed Index and take the TTFB from that run, so we cannot compare it with the rest. The standard deviation is measured the same way though:

Screenshot 2021-02-18 at 13.00.21.png (1×2 px, 715 KB)

Screenshot 2021-02-18 at 13.04.33.png (978×2 px, 753 KB)

WebPageTest emulated mobile on AWS using 3gfast

We also run tests where we emulate mobile Chrome and use the 3gfast setting. It's the same thing here with the median: it is picked from the run with the median Speed Index, so we cannot compare it directly with the rest of our tools.

Screenshot 2021-02-18 at 13.05.41.png (976×2 px, 776 KB)

Screenshot 2021-02-18 at 13.06.42.png (1×2 px, 1 MB)

Android tests using "3g"-wifi

We also run tests on Android phones. Today we use Kobiton as the host and we have had some problems with the connectivity setup.

Screenshot 2021-02-18 at 13.16.30.png (690×2 px, 565 KB)

Screenshot 2021-02-18 at 13.16.22.png (686×2 px, 596 KB)

Android tests using WebPageReplay and 100 ms latency

We also run tests on an Android phone against a WebPageReplay replay server.

Screenshot 2021-02-18 at 13.18.43.png (688×2 px, 541 KB)

Screenshot 2021-02-18 at 13.18.36.png (692×2 px, 565 KB)

Surprisingly (at least for me) we have had really high TTFB; it looks like something is broken.

I'm going to add some data from Bitbar too, but I want to give them a chance to tweak their settings first. What's also missing is testing with gnirehtet, and I have a hard time keeping a device up and running at home for 7 days.

I went through and calculated the difference in TTFB (%) over time using the formula (max - min) / min (see the worked example after the table):

Type | Difference
Desktop simulate cable connection on AWS | 5.1%
Emulated mobile simulate 3g connection on AWS | 25%
Desktop simulate cable connection on a Mac mini M1 | 24%
Simulate iOS mobile 3g connection on a Mac mini M1 | 1.6%
Desktop WebPageReplay on AWS | 0.7%
Emulated mobile WebPageReplay on AWS | 1.0%
WebPageTest desktop on AWS using cable | 318%
WebPageTest emulated mobile on AWS using 3gfast | 6.3%
Android tests using "3g"-wifi Kobiton | 193%
Android tests using WebPageReplay and 100 ms latency | 1.25%
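As a worked example using the desktop cable numbers above (median TTFB between 175 and 184 ms over the week): (184 - 175) / 175 ≈ 0.051, i.e. roughly 5.1%.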

Something is seriously broken in the WebPageReplay tests on Android. I think something is going on with the hosting; I'll get back to Kobiton about that.

The WebPageTest tests on desktop also stand out; I wonder if we have a bad instance and should just replace it. It would be interesting to compare with what we get when we move the server in-house.

@dpifke I'm interested in your feedback on this. It would also be great to have your eyes on the commands that actually do the throttling in https://github.com/sitespeedio/throttle - the actual tc/pfctl commands that run - since you started to implement that for the wifi throttling. I haven't had eyes on that for a while and maybe there is room for improvements that could give us more stable metrics.
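For context when reviewing: on Linux this kind of shaping boils down to netem for delay plus a rate limiter on the interface; on macOS the equivalent is done with pfctl/dnctl pipes. A minimal sketch of the Linux pattern (illustrative numbers, not Throttle's exact commands, which live in the repo above):

```
# Sketch: cable-like shaping with tc - delay via netem, bandwidth via tbf.
sudo tc qdisc add dev eth0 root handle 1:0 netem delay 14ms
sudo tc qdisc add dev eth0 parent 1:1 handle 10: tbf rate 5mbit burst 32kbit latency 400ms
# tear the shaping down again between runs
sudo tc qdisc del dev eth0 root
```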

There was something strange going on on the Mac mini/Kobiton side when we collected the metrics for the Android/WebPageReplay setup. Checking a new timespan, it looks like this.

Android tests using WebPageReplay and 100 ms latency.

Screenshot 2021-02-22 at 08.56.11.png (1×2 px, 1 MB)

Screenshot 2021-02-22 at 08.55.59.png (1×2 px, 1 MB)

The difference in TTFB is then 1.25%. That almost matches the emulated mobile tests running on AWS, which is pretty cool.

I'm adding the Bitbar numbers here from the first test:

Android tests using "3g"-wifi Bitbar (first go)

Screenshot 2021-02-22 at 13.45.41.png (694×2 px, 572 KB)

Screenshot 2021-02-22 at 13.45.30.png (752×2 px, 711 KB)

And median variation: 35%.

Android tests using Bitbar using gnirehtet

Screenshot 2021-02-22 at 13.48.35.png (696×2 px, 545 KB)

Screenshot 2021-02-22 at 13.48.24.png (708×2 px, 547 KB)

And median variation: 22%.
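For reference, gnirehtet reverse tethering is driven from the host roughly like this (a sketch based on the gnirehtet README; device serials and extra options omitted):

```
# Sketch: reverse tethering so the phone's traffic goes through the host,
# where it can then be shaped with tc.
gnirehtet install   # install the client apk on the connected device (once)
gnirehtet run       # start the relay server and reverse tethering over adb
```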

The Mac mini has been moved and the phones now use a Netropy device for wifi. It's been set up with 9000 kb/s up and down + 40 ms delay in both directions. Also, now that the mini has been moved, the metrics look much better for WebPageReplay.

Android tests using WebPageReplay and 100 ms latency

Screenshot 2021-05-19 at 13.40.35.png (756×2 px, 661 KB)

Screenshot 2021-05-19 at 13.40.44.png (696×2 px, 616 KB)

The difference is 0.1% for median TTFB (really good).

Android tests using Netropy 9000 kb/s up and down + 40 ms delay in both directions (testing Banksy)

Screenshot 2021-05-19 at 13.45.32.png (750×2 px, 592 KB)

Screenshot 2021-05-19 at 13.45.24.png (706×2 px, 609 KB)

The difference is 25%.

Let me do a new summary and then close the task.

This will do without a new summary.