Try out running tests on Bitbar
Closed, Resolved (Public)

Assigned To
Authored By
Peter
Feb 9 2021, 8:53 AM
Referenced Files
F34107468: Screenshot 2021-02-16 at 15.26.41.png
Feb 16 2021, 2:33 PM
F34107466: Screenshot 2021-02-16 at 15.27.19.png
Feb 16 2021, 2:33 PM
F34107470: Screenshot 2021-02-16 at 15.28.24.png
Feb 16 2021, 2:33 PM
F34107285: Screenshot 2021-02-16 at 13.17.36.png
Feb 16 2021, 12:25 PM
F34107289: Screenshot 2021-02-16 at 13.18.47.png
Feb 16 2021, 12:25 PM
F34107294: Screenshot 2021-02-16 at 13.19.29.png
Feb 16 2021, 12:25 PM
F34107287: Screenshot 2021-02-16 at 13.17.51.png
Feb 16 2021, 12:25 PM
F34107291: Screenshot 2021-02-16 at 13.18.58.png
Feb 16 2021, 12:25 PM

Description

Today I got everything I need to run Android tests on Bitbar, so we can get a feeling for how much work it is to set up and what kind of metrics we get. I'm going to use this task to document the setup.

Event Timeline

The GUI does not support Safari, but hopefully we won't need to use the GUI much:

Screenshot 2021-02-09 at 09.55.12.png (126×1 px, 91 KB)

I've been testing the simple cloud version. The idea is that you upload a bash script that does your testing. I've tested with a super simple version:

#!/bin/bash
adb --version
node --version
npm install browsertime -g
echo "Start tests ..."
browsertime --android -n 1 https://en.m.wikipedia.org/wiki/Barack_Obama

It works fine, but the ChromeDriver and Chrome versions do not match ("This version of ChromeDriver only supports Chrome version 88. Current browser version is 83.0.4103.106").
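A mismatch like this could be caught before the test starts. The following is only a minimal sketch, not part of the script I uploaded: the function names are made up for illustration, and the two version strings are copied from the error message rather than read from a device.

```shell
#!/bin/bash
# Print the major version, i.e. everything before the first dot.
major_version() {
  printf '%s\n' "${1%%.*}"
}

# Succeed (exit status 0) when both version strings share the same major version.
versions_match() {
  [ "$(major_version "$1")" = "$(major_version "$2")" ]
}

# Values taken from the error message. On a real device the installed
# Chrome version could be read with something like:
#   adb shell dumpsys package com.android.chrome | grep versionName
driver_version="88"
chrome_version="83.0.4103.106"

if versions_match "$driver_version" "$chrome_version"; then
  echo "OK: ChromeDriver and Chrome major versions match"
else
  echo "Mismatch: ChromeDriver $(major_version "$driver_version") vs Chrome $(major_version "$chrome_version")"
fi
```

With the values above this prints the mismatch line, so a wrapper script could stop early instead of starting a run that cannot create a browser session.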

I think that to really evaluate this we need a dedicated instance plus network throttling. I'll ask about that.

One more thing to test is whether we can reach our Graphite/S3 directly from the machine running the tests; that would make the setup much easier. Otherwise we need to use the API, download the result zip file and push the data ourselves.
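If we end up pushing the data ourselves, Graphite's plaintext protocol takes one line per data point: metric path, value and Unix timestamp. Here is a rough sketch of formatting such a line. The metric path, the value and the host name in the comment are placeholders, not our real setup.

```shell
#!/bin/bash
# Format one data point for Graphite's plaintext protocol:
#   <metric path> <value> <unix timestamp>
# The timestamp defaults to the current time when it is not given.
graphite_line() {
  local path="$1" value="$2" timestamp="${3:-$(date +%s)}"
  printf '%s %s %s\n' "$path" "$value" "$timestamp"
}

# Placeholder metric path, value (ms) and timestamp, for illustration only.
graphite_line "bitbar.example.ttfb.median" 500 1613035428

# Sending would then be a matter of piping the line to the Graphite
# plaintext port (2003 by default), for example:
#   graphite_line "bitbar.example.ttfb.median" 500 | nc graphite.example.org 2003
```

The formatting is kept separate from the sending so the same function can be used for a dry run that just prints the lines.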

I got help to run on the latest Chrome on a Moto G5 device with throttled 3g. We can also send metrics to Graphite/S3, so I will try that tonight or tomorrow.

In the first run the wifi doesn't look so good (I think others use it too). Check out the variation in TTFB:

[2021-02-11 09:19:49] INFO: Run tests on Moto G (5) [ZY322RX6D6] using Android version 7.0
[2021-02-11 09:19:49] INFO: Running tests using Chrome - 11 iteration(s)
[2021-02-11 09:19:57] INFO: Testing url https://en.m.wikipedia.org/wiki/Barack_Obama iteration 1
[2021-02-11 09:20:11] INFO: https://en.m.wikipedia.org/wiki/Barack_Obama TTFB: 1.06s DOMContentLoaded: 4.02s firstPaint: 4.19s FCP: 4.19s LCP: 4.19s Load: 5.76s TBT: 710ms CLS:0.0681
[2021-02-11 09:20:18] INFO: Testing url https://en.m.wikipedia.org/wiki/Barack_Obama iteration 2
[2021-02-11 09:20:33] INFO: https://en.m.wikipedia.org/wiki/Barack_Obama TTFB: 594ms DOMContentLoaded: 4.32s firstPaint: 4.46s FCP: 4.46s LCP: 4.46s Load: 6.64s TBT: 669ms CLS:0.0681
[2021-02-11 09:20:41] INFO: Testing url https://en.m.wikipedia.org/wiki/Barack_Obama iteration 3
[2021-02-11 09:20:57] INFO: https://en.m.wikipedia.org/wiki/Barack_Obama TTFB: 1.04s DOMContentLoaded: 5.30s firstPaint: 5.17s FCP: 5.17s LCP: 5.17s Load: 8.77s TBT: 669ms CLS:0.0681
[2021-02-11 09:21:05] INFO: Testing url https://en.m.wikipedia.org/wiki/Barack_Obama iteration 4
[2021-02-11 09:21:18] INFO: https://en.m.wikipedia.org/wiki/Barack_Obama TTFB: 562ms DOMContentLoaded: 4.10s firstPaint: 3.79s FCP: 3.79s LCP: 3.79s Load: 5.74s TBT: 701ms CLS:0.0681
[2021-02-11 09:21:26] INFO: Testing url https://en.m.wikipedia.org/wiki/Barack_Obama iteration 5
[2021-02-11 09:21:39] INFO: https://en.m.wikipedia.org/wiki/Barack_Obama TTFB: 554ms DOMContentLoaded: 3.96s firstPaint: 3.83s FCP: 3.83s LCP: 3.83s Load: 5.75s TBT: 704ms CLS:0.0681
[2021-02-11 09:21:47] INFO: Testing url https://en.m.wikipedia.org/wiki/Barack_Obama iteration 6
[2021-02-11 09:22:00] INFO: https://en.m.wikipedia.org/wiki/Barack_Obama TTFB: 569ms DOMContentLoaded: 3.98s firstPaint: 3.86s FCP: 3.86s LCP: 3.86s Load: 5.73s TBT: 695ms CLS:0.0681
[2021-02-11 09:22:08] INFO: Testing url https://en.m.wikipedia.org/wiki/Barack_Obama iteration 7
[2021-02-11 09:22:21] INFO: https://en.m.wikipedia.org/wiki/Barack_Obama TTFB: 532ms DOMContentLoaded: 4.07s firstPaint: 3.95s FCP: 3.95s LCP: 3.95s Load: 5.57s TBT: 716ms CLS:0.0681
[2021-02-11 09:22:29] INFO: Testing url https://en.m.wikipedia.org/wiki/Barack_Obama iteration 8
[2021-02-11 09:22:42] INFO: https://en.m.wikipedia.org/wiki/Barack_Obama TTFB: 558ms DOMContentLoaded: 3.90s firstPaint: 3.75s FCP: 3.75s LCP: 3.75s Load: 5.56s TBT: 709ms CLS:0.0681
[2021-02-11 09:22:49] INFO: Testing url https://en.m.wikipedia.org/wiki/Barack_Obama iteration 9
[2021-02-11 09:23:06] INFO: https://en.m.wikipedia.org/wiki/Barack_Obama TTFB: 741ms DOMContentLoaded: 5.37s firstPaint: 5.20s FCP: 5.20s LCP: 5.20s Load: 8.10s TBT: 681ms CLS:0.0681
[2021-02-11 09:23:13] INFO: Testing url https://en.m.wikipedia.org/wiki/Barack_Obama iteration 10
[2021-02-11 09:23:27] INFO: https://en.m.wikipedia.org/wiki/Barack_Obama TTFB: 745ms DOMContentLoaded: 3.93s firstPaint: 3.79s FCP: 3.79s LCP: 3.79s Load: 5.71s TBT: 707ms CLS:0.0681
[2021-02-11 09:23:34] INFO: Testing url https://en.m.wikipedia.org/wiki/Barack_Obama iteration 11
[2021-02-11 09:23:48] INFO: https://en.m.wikipedia.org/wiki/Barack_Obama TTFB: 565ms DOMContentLoaded: 4.12s firstPaint: 3.89s FCP: 3.89s LCP: 3.89s Load: 5.76s TBT: 690ms CLS:0.0681
[2021-02-11 09:23:48] INFO: https://en.m.wikipedia.org/wiki/Barack_Obama 28 requests, TTFB: 684ms (±56.46ms), firstPaint: 4.17s (±156.38ms), FCP: 4.17s (±156.38ms), DOMContentLoaded: 4.28s (±153.39ms), LCP: 4.17s (±156.38ms), CLS: 0.0681 (±0.00), TBT: 696ms (±4.72ms), Load: 6.28s (±320.17ms) (11 runs)
[2021-02-11 09:23:48] INFO: Wrote data to browsertime-results/en.m.wikipedia.org-wiki-Barack_Obama/2021-02-11T091948+0000
Done ...
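To make the spread in that run easier to see, here is a small sketch that takes the 11 per-iteration TTFB values from the log above, converts them to milliseconds and prints min/median/mean/max. The function names are made up for this example and it is not part of browsertime.

```shell
#!/bin/bash
# Convert values such as "1.06s" or "594ms" (one per line on stdin) to milliseconds.
to_ms() {
  awk '{
    v = $1
    if (v ~ /ms$/) { sub(/ms$/, "", v); print v + 0 }
    else           { sub(/s$/, "", v);  print v * 1000 }
  }'
}

# Read numbers (one per line on stdin) and print min, median, mean and max.
summarize() {
  sort -n | awk '
    { v[NR] = $1; sum += $1 }
    END {
      if (NR == 0) exit 1
      median = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
      printf "min=%g median=%g mean=%.0f max=%g\n", v[1], median, sum / NR, v[NR]
    }'
}

# TTFB for iterations 1 to 11, copied from the log above.
printf '%s\n' 1.06s 594ms 1.04s 562ms 554ms 569ms 532ms 558ms 741ms 745ms 565ms \
  | to_ms | summarize
# prints: min=532 median=569 mean=684 max=1060
```

The mean matches the 684ms in the browsertime summary line, while the median is more than 100 ms lower, which shows how much the two slow iterations pull the average up.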

There are also other ways to set up a throttled connection; I will talk with Bitbar about it.

That test was using the 3g wifi; the 4g wifi looks better. But there's no RTT set on those, so let me check if they can add that.

The only thing missing now is triggering runs using their API. I'll dig into it.

I've been able to verify that it works to send to Graphite and S3.

I've been running tests over the weekend on the 3g and 4g wifi:

Here's the TTFB on the 3g.

Screenshot 2021-02-15 at 08.36.51.png (1×2 px, 1 MB)

And median First Visual Change during the same period:

Screenshot 2021-02-15 at 08.37.51.png (1×2 px, 995 KB)

And 4g TTFB:

Screenshot 2021-02-15 at 08.37.10.png (1×2 px, 874 KB)

And 4g First Visual Change:

Screenshot 2021-02-15 at 08.37.30.png (1×2 px, 995 KB)

Today I switched to using gnirehtet instead of the wifi, and I've set up a 3g and a 4g run. When I first tried 4g I didn't get the same stability as on my own computer. Let's see: I will keep the tests running for a couple of days to see what we get.

I've been running gnirehtet for a while now with the following results:
First Visual Change 3g

Screenshot 2021-02-16 at 13.17.36.png (1×2 px, 1008 KB)

First Visual Change 4g

Screenshot 2021-02-16 at 13.17.51.png (1×2 px, 1014 KB)

TTFB 4g

Screenshot 2021-02-16 at 13.18.47.png (1×2 px, 1001 KB)

TTFB 3g

Screenshot 2021-02-16 at 13.18.58.png (1×2 px, 1013 KB)

It looks like we get much higher variance between runs on Bitbar than when running the same setup at home with my Mac. I've set up a test to run the same setup every 10 minutes at home to see what kind of numbers we get. If we look at the spread between 11 runs, it looks like this for 4g:

Min/Median/Average/Max

Screenshot 2021-02-16 at 13.19.29.png (152×2 px, 73 KB)

That seems too high, so something isn't working as it should. I'm going to collect more data.

Okay, I've been running the same setup at home for almost 6 hours, running tests every 12 minutes, and it looks like this (4g setup on my Mac):

Screenshot 2021-02-16 at 15.27.19.png (1×3 px, 292 KB)

Screenshot 2021-02-16 at 15.26.41.png (1×3 px, 253 KB)

Min/Median/Average/Max

Screenshot 2021-02-16 at 15.28.24.png (84×2 px, 19 KB)

The TTFB median looks much better and you can also see the variance between runs. It looks like something isn't right at Bitbar.

I added tests so we can compare Bitbar vs Kobiton: I test the static Banksy page, 11 runs each, and look at min/median/max and stdev:
https://grafana.wikimedia.org/d/Tbeh-peWk/test-kobiton?orgId=1&from=now-7d&to=now

I've also enabled tests using gnirehtet against the static Banksy test to compare with testing over wifi.