
Test out AWS device farm
Closed, ResolvedPublic


Could the AWS device farm be a way for us to test performance on different devices?

We need to dig into the docs and find out:

  • Is it possible to mount a device on an AWS server (instead of just driving it from your desktop)?
  • Can we acquire the exact same device, or at least run multiple tests on the same device?
  • If we cannot run from an AWS server, would it be possible to hook in something to be able to measure the difference between URLs? Maybe we could have one machine that handles the testing.

Event Timeline

Peter created this task.Jan 10 2018, 5:19 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJan 10 2018, 5:19 AM
Peter added a comment.Jan 11 2018, 1:44 PM

Had a quick look through the docs and it seems like you can only access the devices from your local machine and not from an AWS server. It could still be usable, but not as cool :)

Krinkle triaged this task as Medium priority.Jan 16 2018, 2:49 PM
Gilles claimed this task.Apr 25 2018, 11:33 AM

I've managed to write a test that drives Chrome on Android (or Safari on iOS) and has the Chrome instance report firstPaint after page load in the console, which is captured by AWS:

Basically, in an automated test context you have to use Appium, a Selenium wrapper, and essentially write a small Selenium script. This is what my code looks like:

import logging
from selenium import webdriver

class Test(object):
    logger = logging.getLogger('Test')

    def test_foo(self):
        self.logger.info('Starting test')

        # Set up the web driver and launch the webview app.
        capabilities = {'browserName': 'Chrome'}
        driver = webdriver.Remote('http://localhost:4723/wd/hub', capabilities)

        # Navigate to the page.
        driver.get(url)  # URL elided in the original

        # Read firstPaint from the Paint Timing API and echo it to the
        # browser console, which AWS device farm captures.
        firstPaint = driver.execute_script(
            "return performance.getEntriesByType('paint')"
            ".filter(entry => entry.name === 'first-paint')[0].startTime;")
        driver.execute_script(
            "console.log('firstPaint: ' + arguments[0]);", firstPaint)
        self.logger.info('Page loaded, firstPaint: {}'.format(firstPaint))

        # Close the driver.
        driver.quit()

They provide an API:

And glancing at it, it looks like we could programmatically schedule such test runs. Since the whole thing is a bit of a black box, I think the easiest approach is to do as in my firstPaint example and just inject some JS that would report RUM metrics to wherever we want (e.g. statsv).
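As a sketch of that JS-injection idea, the snippet passed to driver.execute_script() could be built like this. Note that the endpoint URL, the metric name, and the helper itself are illustrative assumptions, not production values:

```python
def rum_beacon_script(statsv_url, metric='devicefarm.firstPaint'):
    """Build a JS snippet that reads first-paint from the Paint Timing API
    and beacons it to a statsv-style endpoint (value in whole ms).
    Both the endpoint URL and the metric name are illustrative."""
    return (
        "var fp = performance.getEntriesByType('paint')"
        ".filter(function (e) { return e.name === 'first-paint'; })[0];"
        "if (fp) { navigator.sendBeacon('"
        + statsv_url + "?" + metric
        + "=' + Math.round(fp.startTime) + 'ms'); }"
    )

# In the test, after driver.get():
#   driver.execute_script(rum_beacon_script('https://example.org/beacon/statsv'))
```

The beacon fires from inside the page, so nothing needs to be retrieved from the device farm afterwards; the metric lands wherever the endpoint points.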

As for visual metrics, the video is of the whole screen, so it might be possible if the video is retrievable via the API, by cross-referencing the video time with the run's timestamps to know when the browser is actually asked to start loading the page:

However, the video seems to be recorded at an underwhelming 5fps.

I will dig into the API documentation next to get a sense of what's possible.

Peter added a comment.Apr 26 2018, 6:41 AM

Ah cool. The video, I guess you got that automatically, or can we use ADB to record it ourselves (e.g. set the bit rate)? I think it is variable fps (with a maximum of 60 fps). The video looks OK though; it's the same as when I do it locally with an Android phone.

A couple of things:

  1. Can we somehow set connectivity, or how does that work? The dream scenario would be to use the WPR proxy, but that is probably only doable if we host the phones ourselves (we need to root them and install the certificate); for this service, as long as we can set limited connectivity, maybe that is enough?
  2. Can the tests be started from another AWS instance, or do you need to do it from your desktop?
  3. Would be cool to also test getting the Chrome trace log and then make sure you can retrieve it.
  4. But maybe the first step is: can we do what we want programmatically, and are the metrics stable?

AWS device farm gives that video automatically. This is in the context of tests that can be API-driven. I'm not the one driving the test directly, so I don't think I can use ADB.

As for passing extra parameters to the browser to get the trace log, I don't know if that's possible. It's driven by Selenium/Appium; I would have to look into what that's capable of. And I don't know whether their environment has access to the disk to retrieve the logs, etc.

We can control connectivity through some tools they provide, which is good. We can create custom connectivity profiles. I'm going to play with that now and see how stable it is between runs. Getting dedicated phones is something you need to pay for, so for now the runs just happen on their general device pool. I wonder if I can get some ID of the device through code, to see if I get lucky and hit the same phone multiple times? Some of the models are very specific and they might not have dozens of them. If I can confirm a few runs on the same phone, it would give us a sense of whether or not it's worth paying for the dedicated device option.

Actually, I can't find a way to tie the network profile to my run. It might be something only for dedicated devices, or only available programmatically. I might have to write code driving AWS device farm to find out.
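For driving this from code, the Device Farm API exposes network profiles via boto3's devicefarm client. A minimal sketch of building the request (the 3G-like bandwidth/latency numbers are illustrative assumptions, not values from this task; the actual call is left commented out since it needs AWS credentials):

```python
def build_network_profile_kwargs(project_arn, name='3g-like'):
    """Kwargs for devicefarm.create_network_profile(). Bandwidth is in
    bits per second, delays in milliseconds; the values below are
    illustrative 3G-like assumptions."""
    return {
        'projectArn': project_arn,
        'name': name,
        'type': 'PRIVATE',
        'uplinkBandwidthBits': 768 * 1000,
        'downlinkBandwidthBits': 1600 * 1000,
        'uplinkDelayMs': 150,
        'downlinkDelayMs': 150,
        'uplinkLossPercent': 0,
        'downlinkLossPercent': 0,
    }

# Actually creating the profile requires boto3 and AWS credentials:
#   import boto3
#   client = boto3.client('devicefarm', region_name='us-west-2')
#   profile = client.create_network_profile(
#       **build_network_profile_kwargs(project_arn))
```

Whether a profile created this way can then be attached to a run on the general device pool is exactly the open question above.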

One thing I'll note right away is that it's horribly slow to start a test. It might be because I'm waiting in line for the device. But even once I get the device, as you can see in the video, there's an awful amount of setup time before the test actually runs. Looking at the billed minutes in their UI, it looks like a test that just loads one page can take 4+ billed minutes!

The iPhone 6 I picked doesn't seem to have Chrome installed... which means only being able to test Safari on iOS. Possibly something we wouldn't be constrained by with dedicated devices, though. I'll focus on Android for now for my experiments.

A few runs targeting a specific type of Android device, with the default connectivity profile (10MB up/down, no latency), FirstPaint: 4046.215, 2772.145, 3923.1, 3153.21. Not very encouraging in terms of stability... but then again, maybe I was hitting different devices.
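To quantify that instability, one can compute the coefficient of variation of those four firstPaint samples with a quick stdlib sketch:

```python
from statistics import mean, stdev

def coefficient_of_variation(samples):
    """Sample standard deviation as a percentage of the mean."""
    return 100 * stdev(samples) / mean(samples)

first_paint_ms = [4046.215, 2772.145, 3923.1, 3153.21]
cov = coefficient_of_variation(first_paint_ms)  # roughly 17.6%
```

A spread of roughly 18% around a ~3.5 s mean would drown out the size of regression this kind of testing would typically need to catch.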

I can't extract the UDID. The Appium API that normally returns it doesn't have the field.

I tried asking for the Chrome performance log, but I'm not getting anything back from Selenium, as if it's ignoring my request to get that.

Gilles added a comment.EditedApr 27 2018, 12:14 PM

Took me a while to come up with the CLI syntax to schedule a test run, where I can specify the network conditions (which isn't possible in the GUI):

aws devicefarm schedule-run --project-arn arn:aws:devicefarm:us-west-2:113698225543:project:41db4f39-5a1d-4a60-a4ab-fc8cb129c0a9 --device-pool-arn arn:aws:devicefarm:us-west-2:113698225543:devicepool:41db4f39-5a1d-4a60-a4ab-fc8cb129c0a9/d290eeee-2d4d-4a1c-8afa-0ee77f6fcb29 --name NewTest --test '{"type":"APPIUM_WEB_PYTHON","testPackageArn":"arn:aws:devicefarm:us-west-2:113698225543:upload:41db4f39-5a1d-4a60-a4ab-fc8cb129c0a9/f1807086-6f59-4d63-a7b2-bdadca460e17","parameters":{"appium_version":"1.7.2","video_recording":"true", "app_performance_monitoring":"false"}}' --configuration '{"billingMethod":"METERED","networkProfileArn":"arn:aws:devicefarm:us-west-2::networkprofile:public3"}' --execution-configuration '{"jobTimeoutMinutes":15,"skipAppResign":false}' --region us-west-2

The above is using a bunch of values from things I've already created (project, device pool, zip file containing the Python code).
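The same run can be scheduled from Python, since boto3's devicefarm client mirrors the CLI. A sketch of building the schedule_run() arguments from the same values as the CLI invocation above (the actual call is left commented out because it needs boto3 and AWS credentials):

```python
def build_schedule_run_kwargs(project_arn, device_pool_arn, test_package_arn,
                              network_profile_arn, name='NewTest'):
    """Kwargs for devicefarm.schedule_run(), mirroring the CLI invocation."""
    return {
        'projectArn': project_arn,
        'devicePoolArn': device_pool_arn,
        'name': name,
        'test': {
            'type': 'APPIUM_WEB_PYTHON',
            'testPackageArn': test_package_arn,
            'parameters': {
                'appium_version': '1.7.2',
                'video_recording': 'true',
                'app_performance_monitoring': 'false',
            },
        },
        'configuration': {
            'billingMethod': 'METERED',
            'networkProfileArn': network_profile_arn,
        },
        'executionConfiguration': {
            'jobTimeoutMinutes': 15,
            'skipAppResign': False,
        },
    }

# To schedule for real (boto3 + credentials required):
#   import boto3
#   client = boto3.client('devicefarm', region_name='us-west-2')
#   run = client.schedule_run(**build_schedule_run_kwargs(
#       project_arn, device_pool_arn, test_package_arn, network_profile_arn))
```

This is what would make periodic, automated scheduling of runs possible from another machine.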

Unfortunately, the API seems to ignore my request for that particular network profile:

Using network profile Default network profile

Which might suggest that this is a feature only available to dedicated devices and would explain why the option isn't in the GUI?

At this point I would say that the general purpose device pool is useless for performance regression testing. I don't know if it's worth signing up for a $200+/month plan to get dedicated devices to find out if that would do what we need.

As discussed during our meeting, I tried a few consecutive runs on the same device.

However, I was unable to clear the cache, as this isn't something Selenium can do. People advise re-instantiating the driver to achieve that, but doing so in the context of AWS device farm would mean that we could end up on a different device.

I figured comparing consecutive warm cache runs could be interesting as well.

Here are the results, for firstPaint: 4240.515 (cold run), 304.30, 410.79, 529.76, 1211.97

Even in that case the variations seem to make the whole thing unusable.

Gilles closed this task as Resolved.May 22 2018, 2:08 PM

Trying this again on a Galaxy S6:

771.405 (cold run), 297.84, 321.995, 225.845, 221.965

Significant percentage variations still, even with a warm cache.

It seems like SauceLabs is a much better 3rd-party contender, especially given that we have more control there.

238482n375 removed Gilles as the assignee of this task.Jun 15 2018, 8:03 AM
238482n375 lowered the priority of this task from Medium to Lowest.
238482n375 moved this task from Next Up to In Code Review on the Analytics-Kanban board.
238482n375 edited subscribers, added: Gilles, 238482n375; removed: Aklapper.


238482n375 set Security to Software security bug.Jun 15 2018, 8:07 AM
238482n375 changed the visibility from "Public (No Login Required)" to "Custom Policy".


Restricted Application added a project: acl*security. · View Herald TranscriptJun 15 2018, 2:05 PM
Aklapper assigned this task to Gilles.Jun 15 2018, 2:08 PM
Aklapper changed the visibility from "Custom Policy" to "Public (No Login Required)".