
Demonstrate impact of loading just lead sections of a group of static pages
Closed, Resolved · Public · 5 Estimated Story Points

Description

Create static pages for the mobile site for the following pages (top 20 pages in Sept [1]):
https://en.wikipedia.org/wiki/Pablo_Escobar
https://en.wikipedia.org/wiki/Eazy-E
https://en.wikipedia.org/wiki/Hannah_Montana
https://en.wikipedia.org/wiki/Serena_Williams
https://en.wikipedia.org/wiki/Ice_Cube
https://en.wikipedia.org/wiki/Dr._Dre
https://en.wikipedia.org/wiki/Labor_Day
https://en.wikipedia.org/wiki/Narcos
https://en.wikipedia.org/wiki/Straight_Outta_Compton_(2015_film)
https://en.wikipedia.org/wiki/N.W.A
https://en.wikipedia.org/wiki/Venus_Williams
https://en.wikipedia.org/wiki/Welcome_Back_(film)
https://en.wikipedia.org/wiki/Coca-Cola_formula
https://en.wikipedia.org/wiki/Syrian_Civil_War
https://en.wikipedia.org/wiki/Suge_Knight
https://en.wikipedia.org/wiki/The_Visit_(2015_film)
https://en.wikipedia.org/wiki/Whitey_Bulger
https://en.wikipedia.org/wiki/Overview_effect
https://en.wikipedia.org/wiki/Metal_Gear_Solid_V:_The_Phantom_Pain
https://en.wikipedia.org/wiki/Aadesh_Shrivastava

(use wget to do this)
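For reference, the fetch step could be scripted along these lines. This is only a sketch: the `snapshots` output directory and the `en.m` mobile host are assumptions, and it prints the wget invocations rather than executing them (wget must be installed to actually run them).

```python
# Sketch: build `wget` invocations to snapshot the mobile variants of the
# top-20 pages for static hosting. Output directory name is an assumption.
URLS = [
    "https://en.wikipedia.org/wiki/Pablo_Escobar",
    "https://en.wikipedia.org/wiki/Eazy-E",
    # ... remaining pages from the list above
]

def build_wget_cmd(url, out_dir="snapshots"):
    """Mirror one page together with its requisites (images, CSS)."""
    mobile_url = url.replace("en.wikipedia.org", "en.m.wikipedia.org")
    return [
        "wget",
        "--page-requisites",      # also fetch images/CSS the page needs
        "--convert-links",        # rewrite links to work locally
        "--adjust-extension",     # save pages with an .html extension
        "--directory-prefix", out_dir,
        mobile_url,
    ]

if __name__ == "__main__":
    for url in URLS:
        print(" ".join(build_wget_cmd(url)))
```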

For each page, create two clones. In clone 1, remove all content after the lead section (i.e. when you encounter the first heading, delete all subsequent HTML in #bodyContent); in clone 2, scrub the src and srcset attributes of all images (a page with the full content after the lead section, but without images).
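A rough sketch of the two clone transforms, using regular expressions for brevity (a real run should use an HTML parser; the assumption here is that the first section heading is an h2–h6 that appears after the lead):

```python
import re

def strip_after_lead(html):
    """Clone 1: drop everything from the first section heading onwards.
    Regex sketch only -- a real run should use an HTML parser."""
    m = re.search(r"<h[2-6][\s>]", html)
    if m:
        # Keep the lead, then close the containers we cut open.
        return html[: m.start()] + "</div></body></html>"
    return html

def scrub_images(html):
    """Clone 2: blank out src/srcset so no image bytes are fetched,
    while keeping all of the article content."""
    return re.sub(r'\b(src|srcset)="[^"]*"', r'\1=""', html)
```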

  • Upload the sets of two pages to a server that is accessible from webpagetest.org
  • Run tests against each set with the following conditions:
    • 2G
    • Dulles
    • Chrome
    • 9 test runs
    • first load only
  • Generate a table with columns: article name, start render, document complete time, speed index, isJustLeadSection (true/false)
  • Post results here.
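The requested table could be assembled as CSV along these lines (a sketch; the column names simply mirror the bullet above):

```python
import csv
import io

# Column names are illustrative, mirroring the columns requested above.
COLUMNS = ["article", "start_render_ms", "doc_complete_ms",
           "speed_index", "is_just_lead_section"]

def results_table(rows):
    """rows: list of dicts keyed by COLUMNS; returns CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```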

[1] List of pages is generated from https://gist.github.com/jobar/53db2d87461cf3137d03


Event Timeline

Jdlrobson raised the priority of this task from to Needs Triage.
Jdlrobson updated the task description.
Jdlrobson subscribed.
Jdlrobson set Security to None.
Jdlrobson added a subscriber: Peter.

@Peter does this make sense as a good first step?

Looks cool! I think we can use our own instance, http://wpt.wmftest.org/, because then we can start the tests from the command line: we can just fire away all 20 (or however many we start with) and then collect the metrics, so it will not be a problem. And since it's kind of automated, we can also test 3G or 3G Fast (when the pages are up and running). We will have one script file with all the URLs that we run, and get the data in CSV format, so we can use Excel or whatever we want. We can then also test more than 9 times per URL, which is good.

I would love to help set up the runs and collect the data. With just a small fix we can use the current setup we have.

If it is too much work to fix all 20 articles, I would start with ten; we can test them, look at the results, and if we feel we need more data, take 10 more.

About the columns: I think we should change document complete time to fully loaded (that's what we use today, and etsy.com uses it in their report too). I also want to include TTFB, since that could differ between the live site and our test server.

Let me know when you can start and I will fix the current script so we can easily use it to collect all the data.

Just to clarify, we're free to grab the pages and process/host 'em how we see fit just so long as they're accessible by the server(s) running http://wpt.wmftest.org?

Yup. They need to be accessible on the web and should be run on the same web host at the same time, to avoid discrepancies in things like first byte impacting results.

I wrote a script to generate the pages:
Output is here: https://github.com/jdlrobson/WikiArticleSampleGenerator
Someone should feel free to pick up where I left off and run them against webpage test. @bmansurov @phuedx ?

phuedx reassigned this task from phuedx to Jdlrobson.


@Peter: Is this a job for you? If not, then what's the magical incantation that we need to get our instance of WPT to do its thing?

So I want to run this for the following pages but have no idea how to use the script. @Peter, I see you set something up for Thursday, but is there any chance you could help me out async so I can get this done tomorrow (otherwise I will be forced to collate manually!) :-):
https://jonrobson.me.uk/T113649/0-a.html
https://jonrobson.me.uk/T113649/0-b.html
https://jonrobson.me.uk/T113649/0-c.html
https://jonrobson.me.uk/T113649/1-a.html
https://jonrobson.me.uk/T113649/1-b.html
https://jonrobson.me.uk/T113649/1-c.html
https://jonrobson.me.uk/T113649/2-a.html
https://jonrobson.me.uk/T113649/2-b.html
https://jonrobson.me.uk/T113649/2-c.html
https://jonrobson.me.uk/T113649/3-a.html
https://jonrobson.me.uk/T113649/3-b.html
https://jonrobson.me.uk/T113649/3-c.html
https://jonrobson.me.uk/T113649/4-a.html
https://jonrobson.me.uk/T113649/4-b.html
https://jonrobson.me.uk/T113649/4-c.html
https://jonrobson.me.uk/T113649/5-a.html
https://jonrobson.me.uk/T113649/5-b.html
https://jonrobson.me.uk/T113649/5-c.html
https://jonrobson.me.uk/T113649/6-a.html
https://jonrobson.me.uk/T113649/6-b.html
https://jonrobson.me.uk/T113649/6-c.html
https://jonrobson.me.uk/T113649/7-a.html
https://jonrobson.me.uk/T113649/7-b.html
https://jonrobson.me.uk/T113649/7-c.html
https://jonrobson.me.uk/T113649/8-a.html
https://jonrobson.me.uk/T113649/8-b.html
https://jonrobson.me.uk/T113649/8-c.html
https://jonrobson.me.uk/T113649/9-a.html
https://jonrobson.me.uk/T113649/9-b.html
https://jonrobson.me.uk/T113649/9-c.html
https://jonrobson.me.uk/T113649/10-a.html
https://jonrobson.me.uk/T113649/10-b.html
https://jonrobson.me.uk/T113649/10-c.html
https://jonrobson.me.uk/T113649/11-a.html
https://jonrobson.me.uk/T113649/11-b.html
https://jonrobson.me.uk/T113649/11-c.html
https://jonrobson.me.uk/T113649/12-a.html
https://jonrobson.me.uk/T113649/12-b.html
https://jonrobson.me.uk/T113649/12-c.html
https://jonrobson.me.uk/T113649/13-a.html
https://jonrobson.me.uk/T113649/13-b.html
https://jonrobson.me.uk/T113649/13-c.html
https://jonrobson.me.uk/T113649/14-a.html
https://jonrobson.me.uk/T113649/14-b.html
https://jonrobson.me.uk/T113649/14-c.html
https://jonrobson.me.uk/T113649/15-a.html
https://jonrobson.me.uk/T113649/15-b.html
https://jonrobson.me.uk/T113649/15-c.html
https://jonrobson.me.uk/T113649/16-a.html
https://jonrobson.me.uk/T113649/16-b.html
https://jonrobson.me.uk/T113649/16-c.html
https://jonrobson.me.uk/T113649/17-a.html
https://jonrobson.me.uk/T113649/17-b.html
https://jonrobson.me.uk/T113649/17-c.html
https://jonrobson.me.uk/T113649/18-a.html
https://jonrobson.me.uk/T113649/18-b.html
https://jonrobson.me.uk/T113649/18-c.html
https://jonrobson.me.uk/T113649/19-a.html
https://jonrobson.me.uk/T113649/19-b.html
https://jonrobson.me.uk/T113649/19-c.html

I thought using https://github.com/marcelduran/webpagetest-api would be easy enough, but it doesn't seem to support the 2G connection speed and will require me to write some code to automate this... I'm hopeful @Peter's script does what we need... but I currently can't for the life of me work out how to use it.
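For anyone retrying this later: the WebPageTest HTTP API can also be driven directly, without the node wrapper. A sketch of building the runtest.php request for the conditions in the task description (the API key is a placeholder, and whether a given instance accepts the 2G connectivity profile is exactly the open question above):

```python
from urllib.parse import urlencode

def build_runtest_url(page_url, api_key="YOUR_KEY"):
    """Build a WebPageTest runtest.php request for the task's conditions."""
    params = {
        "url": page_url,
        "k": api_key,            # placeholder API key
        "location": "Dulles:Chrome",
        "connectivity": "2G",    # named profile; support varies by instance
        "runs": 9,
        "fvonly": 1,             # first view (load) only
        "f": "json",             # machine-readable response
    }
    return "https://www.webpagetest.org/runtest.php?" + urlencode(params)
```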

Okay. Method:
Cloned https://github.com/wikimedia/performance-WebPageTest
Cherry picked https://gerrit.wikimedia.org/r/#/c/243027/
Generated these two samples:


Set up environment variables

export WPT_CONNECTIVITY=3G
export WPT_RUNS=1

Ran node bin/index.js --batch scripts/batch/test.txt
Checked they ran: http://wpt.wmftest.org/testlog.php?days=1&filter=&all=on

export WPT_CONNECTIVITY=3G
export WPT_RUNS=9

I've started a test but it's very slow and I'm hoping my connection doesn't drop :)
It seems to do repeat views as well, and after talking to @Peter we discovered bugs that stop us from running the tests in Chrome and on 2G. I'll report back with any findings on Firefox 3G when I have them.

It's also worth noting we uncovered a limitation with using static files for these sorts of tests: the ResourceLoader startup module creates a script tag with a relative URL, meaning that JavaScript will not load on our static pages. These tests will therefore only give us an indication of the page's performance minus any asynchronous JavaScript.
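A possible (untested) workaround when generating the clones would be to rewrite the relative load.php references to absolute ones pointing at the live site. `/w/load.php` is the standard MediaWiki script path; whether this catches the URL the startup module constructs at runtime is an assumption.

```python
import re

def absolutize_load_php(html, host="https://en.wikipedia.org"):
    """Point relative MediaWiki load.php references at the live site so that
    ResourceLoader requests from a static clone do not 404. Rough sketch:
    rewrites quoted occurrences of the /w/load.php path."""
    return re.sub(r'(["\'])/w/load\.php', r"\1" + host + "/w/load.php", html)
```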

@Peter I came home to find, to my surprise, that it was still running (on 3G). When I terminated it, it was on page 16 of 20. I think we'll need to find a way to run these kinds of tests more quickly, and simultaneously rather than sequentially... apologies, I didn't realise it would run for so long.

Here are the results I collected for 16 of the 20 pages. I still need to analyse them:
https://docs.google.com/spreadsheets/d/1INQ59Tczrg8MDZdjAq_36yawT3x733DgsEGQNe3AGtw/edit?usp=sharing

So I ran these tests for Firefox on 3G using Peter's tool. https://docs.google.com/spreadsheets/d/1INQ59Tczrg8MDZdjAq_36yawT3x733DgsEGQNe3AGtw/edit?usp=sharing

I also ran a bunch of tests against webpagetest using a custom node script I wrote (I'll post that up soon)
http://www.webpagetest.org/testlog.php?days=14&filter=jonrobson.me.uk&all=on&nolimit=on

However, I'm not sure I can draw conclusions from this data, other than that I'm not seeing any obvious negative impact on performance.

Issues with using my server:

  • Seems to have https issues which skewed results
  • I also ran them using my script on Chrome 2G via webpagetest.org, but many of the tests were useless because they brought my server down by sending too many concurrent requests in a short space of time.
  • Running these tests sequentially played havoc with time to first byte, which varied dramatically, so a lot of the tests were not useful. They should be re-run on a server where first byte is consistent.

Issues with using https://github.com/wikimedia/performance-WebPageTest:

  • The CSV output by Peter's tool doesn't include a reference to the test job, which makes debugging difficult
  • Running sequentially is very slow (it took 8 hours to run 16×3 pages at 9 runs each on 3G)

Issues encountered when using webpagetest.org and my node script:

Lessons learnt:

  • Need to run these tests on a more stable server where time to first byte is more consistent
  • Need to serve the static pages from inside a MediaWiki instance so that ResourceLoader URLs do not 404 and the setup is closer to production.
  • Need to find a way to do these jobs more quickly

Given I've spent way too much time on this during this sprint, I'm thinking I should pull together the lessons learnt and we should re-attempt this in a future sprint. @Peter, any thoughts on how this could be improved the second time round? This methodology doesn't seem to be working, but there seems to be something hopeful in the data!

I think it would be good to start off with just a couple of pages to test, maybe three, so it isn't so much work to get it up and running next time. Then, if we see that it is promising, we can start testing at a larger scale. If you need a stable/fast first byte, I've been using DigitalOcean with nginx to serve static content; that is fast to get up and running. But I think it's important to make sure we test things realistically: is this how our final solution will look?

About testing taking a long time: yep, we only use one agent at the moment. We can change that going forward, but I don't think it's super important right now if the tests take a couple of hours (as long as we test the right things).

Let me see how we can get cleaner results in the CSV files; I'll make some changes.

https://www.mediawiki.org/wiki/Reading/Web#Performance

  • There seemed to be a few bugs in the CSV generated for the most recent report I did, for just 3 variants of 2 pages
  • I have proposed we do this again next sprint in T115073 for 5 pages (just lead section vs full article)