[Investigate] Measure time spent as a logged in user from different locations
Closed, Resolved, Public

Authored By: Peter, Feb 7 2019, 9:28 AM
Referenced Files
F28251233: scripting.png (Feb 20 2019, 1:11 PM)
F28144065: Screen Shot 2019-02-07 at 9.59.32 AM.png (Feb 7 2019, 9:28 AM)
F28144072: Screen Shot 2019-02-07 at 10.00.42 AM.png (Feb 7 2019, 9:28 AM)
F28144093: Screen Shot 2019-02-07 at 10.05.42 AM.png (Feb 7 2019, 9:28 AM)
F28144102: Screen Shot 2019-02-07 at 10.07.36 AM.png (Feb 7 2019, 9:28 AM)
F28144096: Screen Shot 2019-02-07 at 10.06.40 AM.png (Feb 7 2019, 9:28 AM)
F28144060: Screen Shot 2019-02-07 at 9.57.57 AM.png (Feb 7 2019, 9:28 AM)

Description

A couple of weeks ago I added a test where we hit a couple of pages as a logged-in user. The server is located in NYC in DigitalOcean's data center, and the tests look like this:

https://en.wikipedia.org/wiki/Main_Page -> https://en.wikipedia.org/wiki/Barack_Obama -> https://en.wikipedia.org/wiki/Democratic_Party_(United_States)

vs. logging in the user (returning to Main_Page) -> https://en.wikipedia.org/wiki/Barack_Obama -> https://en.wikipedia.org/wiki/President_of_the_United_States

First, I'll change that so we test exactly the same pages. For now, though, the Obama page is already interesting: the first visual change is mostly the same or faster for the logged-in user.

Hitting the Obama page without any browser cache

Screen Shot 2019-02-07 at 9.57.57 AM.png

Screen Shot 2019-02-07 at 10.05.42 AM.png

Hitting the Obama page with cache from the Main_Page

Screen Shot 2019-02-07 at 9.59.32 AM.png

Screen Shot 2019-02-07 at 10.06.40 AM.png

Hitting the Obama page as a logged-in user

Screen Shot 2019-02-07 at 10.00.42 AM.png

Screen Shot 2019-02-07 at 10.07.36 AM.png

Summary

The metrics are much less stable when items are already in the browser cache (we do 5 runs for every test). I'm running these tests on DigitalOcean (DO), so it would be better to also test on AWS. There is not much difference in First Visual Change, but there is a large difference in the time spent in the frontend vs. the backend (frontEnd = loadEventStart - responseEnd and backEnd = responseStart, both from the Navigation Timing API).
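A minimal sketch of the frontEnd/backEnd split, plus one way to put a number on run-to-run instability. It assumes each run is a dict of Navigation Timing Level 2 values in milliseconds relative to the start of navigation (so `responseStart` is already a duration). The function names and the sample numbers are hypothetical, not taken from these tests.

```python
from statistics import mean, median, pstdev


def split_timing(nav: dict) -> dict:
    """Split one page load into backend and frontend time in ms.

    Assumes Navigation Timing Level 2 timestamps, which are relative to the
    start of navigation. With Level 1 (performance.timing) you would first
    subtract navigationStart from every value.
    """
    return {
        "backEnd": nav["responseStart"],
        "frontEnd": nav["loadEventStart"] - nav["responseEnd"],
    }


def summarize(runs: list, metric: str) -> dict:
    """Median and coefficient of variation (pstdev / mean) over several runs."""
    values = [split_timing(run)[metric] for run in runs]
    avg = mean(values)
    return {"median": median(values), "cv": pstdev(values) / avg if avg else 0.0}


# Hypothetical values for three runs, not real measurements.
runs = [
    {"responseStart": 120, "responseEnd": 180, "loadEventStart": 1180},
    {"responseStart": 130, "responseEnd": 190, "loadEventStart": 1390},
    {"responseStart": 125, "responseEnd": 185, "loadEventStart": 1585},
]
print(summarize(runs, "frontEnd"))
```

Comparing `cv` for warm-cache runs against cold-cache runs of the same URL is one simple way to check whether the primed-cache metrics really are noisier.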

Next step

I would like to test the same URLs (without cache, one after another with a warm cache, and as a logged-in user), run the tests for a week from three different locations outside the US, and collect the metrics.

What about testing:

  • São Paulo
  • Mumbai
  • Stockholm

Then we can collect the metrics and write a report.

Event Timeline

I started today, adding tests that run from Sweden and Mumbai as a start. My plan: if everything looks OK, I'll document the setup tomorrow, and on Thursday I'll shut down the tests, collect the data, and write a summary of the results. Let's do a full write-up on Wikitech, so we have the data for the future, and then a more lightweight blog post about it.

@aaron, @Krinkle and @Gilles, I need your input on this before I start collecting metrics. Am I doing it correctly? Is there anything I should change?

At the moment the tests look like this. Running from an AWS instance in Mumbai, we test the following URLs (on desktop) using the cable connectivity profile:
https://en.wikipedia.org/wiki/Main_Page
https://en.wikipedia.org/wiki/List_of_Bollywood_films_of_2018
https://en.wikipedia.org/wiki/Goods_and_Services_Tax_(India)
https://en.wikipedia.org/wiki/List_of_Chief_Ministers_of_Uttar_Pradesh
https://en.wikipedia.org/wiki/India

First, we run one test where we hit each URL without any browser cache, clearing the cache between pages. Then we run another where we do not clear the cache and hit the URLs one after another, waiting only a couple of seconds between them. Finally, we log in a user (the login redirects to the main page) and then hit each URL without clearing the cache.
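The three scenarios can be written down as an ordered list of steps, which makes it easy to check that each one clears (or keeps) the cache where intended. This is a hypothetical, tool-agnostic sketch: the `Step` type, the action names and the `wait_s` default are illustrative, not the actual test scripts. It assumes the pause between URLs applies to every scenario; the thread mentions a couple of seconds, later raised to 20 s.

```python
from dataclasses import dataclass

SCENARIOS = ("cold", "warm", "logged_in")


@dataclass(frozen=True)
class Step:
    action: str           # "login", "clear_cache", "wait" or "measure"
    target: str = ""      # URL for "login" and "measure"
    seconds: float = 0.0  # pause length for "wait"


def plan(scenario: str, urls: list, login_url: str = "", wait_s: float = 2.0) -> list:
    """Return the ordered steps for one scenario (hypothetical helper)."""
    if scenario not in SCENARIOS:
        raise ValueError(f"unknown scenario: {scenario}")
    steps = []
    if scenario == "logged_in":
        # Log in first; the login redirects to the main page and is not measured.
        steps.append(Step("login", login_url))
    for i, url in enumerate(urls):
        if i > 0:
            steps.append(Step("wait", seconds=wait_s))
        if scenario == "cold":
            # Only the cold scenario empties the browser cache before each page.
            steps.append(Step("clear_cache"))
        steps.append(Step("measure", url))
    return steps


for step in plan("warm", ["https://en.wikipedia.org/wiki/Main_Page",
                          "https://en.wikipedia.org/wiki/India"], wait_s=20):
    print(step)
```

Keeping the plan as plain data separates "what the experiment does" from the browser tool that executes it, so the same plan can be reused for desktop and emulated mobile.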

Then we do the same with emulated mobile on an emulated 3G connection, using the mobile versions of the same URLs.

We also do exactly the same thing from an AWS instance in Sweden:
https://sv.wikipedia.org/wiki/Portal:Huvudsida
https://sv.wikipedia.org/wiki/Astrid_Lindgren
https://sv.wikipedia.org/wiki/Sm%C3%A5land
https://sv.wikipedia.org/wiki/Stockholm
https://sv.wikipedia.org/wiki/Huvudstad
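Taken together, the setup is a matrix of location × device × cache scenario. A hypothetical sketch of that matrix, using the URL lists above and assuming the mobile URLs are derived by switching to the `<lang>.m.wikipedia.org` host; the dictionary keys, profile names and helpers are illustrative, not the real job configuration.

```python
from itertools import product
from urllib.parse import urlsplit, urlunsplit

# URL lists per test location, copied from the thread.
URLS = {
    "Mumbai": [
        "https://en.wikipedia.org/wiki/Main_Page",
        "https://en.wikipedia.org/wiki/List_of_Bollywood_films_of_2018",
        "https://en.wikipedia.org/wiki/Goods_and_Services_Tax_(India)",
        "https://en.wikipedia.org/wiki/List_of_Chief_Ministers_of_Uttar_Pradesh",
        "https://en.wikipedia.org/wiki/India",
    ],
    "Sweden": [
        "https://sv.wikipedia.org/wiki/Portal:Huvudsida",
        "https://sv.wikipedia.org/wiki/Astrid_Lindgren",
        "https://sv.wikipedia.org/wiki/Sm%C3%A5land",
        "https://sv.wikipedia.org/wiki/Stockholm",
        "https://sv.wikipedia.org/wiki/Huvudstad",
    ],
}
# Device -> connectivity profile, as described in the thread.
DEVICES = {"desktop": "cable", "emulated_mobile": "3g"}
SCENARIOS = ("cold", "warm", "logged_in")


def to_mobile(url: str) -> str:
    """Rewrite e.g. en.wikipedia.org to en.m.wikipedia.org (assumed mobile host)."""
    parts = urlsplit(url)
    lang, rest = parts.netloc.split(".", 1)
    return urlunsplit(parts._replace(netloc=f"{lang}.m.{rest}"))


def jobs():
    """Yield one job per location x device x scenario combination."""
    for location, device, scenario in product(URLS, DEVICES, SCENARIOS):
        urls = URLS[location]
        if device == "emulated_mobile":
            urls = [to_mobile(u) for u in urls]
        yield {
            "location": location,
            "device": device,
            "connectivity": DEVICES[device],
            "scenario": scenario,
            "urls": urls,
        }


print(sum(1 for _ in jobs()))  # 2 locations x 2 devices x 3 scenarios = 12
```

Adding São Paulo would only mean another entry in `URLS`, which keeps the comparison between locations symmetric.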

One thing I want to try is adding more time between URLs to get slightly more realistic behavior. Anything else?

I've added a 20 s wait between each tested URL.

The only change I could see was that the time spent in scripting increased after I added the extra wait time.

scripting.png

I've started documenting at https://wikitech.wikimedia.org/wiki/Performance/Synthetic_Measurement_Experiment_2019. By the end of the day I'll copy the full scripts there too and then shut down the instances. I'll start adding graphs and a summary when I'm back at work next Friday.

To be clear, do you follow/render the redirect or just toss the response once you see the cookie headers? If it's the former, is the browser cache cleared after the main page or is it also not cleared?

We follow the redirect, and nothing is cleared. The flow is:

  1. Go to https://en.wikipedia.org/w/index.php?title=Special:UserLogin&returnto=Main+Page
  2. Enter login info
  3. Click the login button
  4. Wait for the redirect page to finish loading.
  5. Go to the next page as a logged-in user and measure it.
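The URL in step 1 is a plain index.php request with a `returnto` parameter, so it can be built for each wiki being tested. A small standard-library sketch; the helper name is hypothetical, and `safe=":"` only keeps the colon in `Special:UserLogin` unescaped so the result matches the URL above. Steps 2 to 5 need a browser automation tool and are not shown.

```python
from urllib.parse import urlencode


def login_url(host: str, return_to: str = "Main Page") -> str:
    """Build the Special:UserLogin URL that redirects to return_to after login."""
    query = urlencode({"title": "Special:UserLogin", "returnto": return_to}, safe=":")
    return f"https://{host}/w/index.php?{query}"


print(login_url("en.wikipedia.org"))
# https://en.wikipedia.org/w/index.php?title=Special:UserLogin&returnto=Main+Page
```

For the Swedish tests the same helper would be called with `"sv.wikipedia.org"` and `"Portal:Huvudsida"`.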