
Demonstrate impact of loading just lead sections of a group of static pages
Closed, Resolved · Public · 5 Estimated Story Points

Description

Create static pages for the mobile site for the following pages (top 20 pages in Sept [1]):
https://en.wikipedia.org/wiki/Pablo_Escobar
https://en.wikipedia.org/wiki/Eazy-E
https://en.wikipedia.org/wiki/Hannah_Montana
https://en.wikipedia.org/wiki/Serena_Williams
https://en.wikipedia.org/wiki/Ice_Cube
https://en.wikipedia.org/wiki/Dr._Dre
https://en.wikipedia.org/wiki/Labor_Day
https://en.wikipedia.org/wiki/Narcos
https://en.wikipedia.org/wiki/Straight_Outta_Compton_(2015_film)
https://en.wikipedia.org/wiki/N.W.A
https://en.wikipedia.org/wiki/Venus_Williams
https://en.wikipedia.org/wiki/Welcome_Back_(film)
https://en.wikipedia.org/wiki/Coca-Cola_formula
https://en.wikipedia.org/wiki/Syrian_Civil_War
https://en.wikipedia.org/wiki/Suge_Knight
https://en.wikipedia.org/wiki/The_Visit_(2015_film)
https://en.wikipedia.org/wiki/Whitey_Bulger
https://en.wikipedia.org/wiki/Overview_effect
https://en.wikipedia.org/wiki/Metal_Gear_Solid_V:_The_Phantom_Pain
https://en.wikipedia.org/wiki/Aadesh_Shrivastava

(use wget to do this)
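For reference, the fetch step could be scripted along these lines. This is only a sketch: the `snapshots` output directory and the `en.m` mobile host are assumptions, and it prints the wget invocations rather than executing them (wget must be installed to actually run them).

```python
# Sketch: build `wget` invocations to snapshot the mobile variants of the
# top-20 pages for static hosting. Output directory name is an assumption.
URLS = [
    "https://en.wikipedia.org/wiki/Pablo_Escobar",
    "https://en.wikipedia.org/wiki/Eazy-E",
    # ... remaining pages from the list above
]

def build_wget_cmd(url, out_dir="snapshots"):
    """Mirror one page together with its requisites (images, CSS)."""
    mobile_url = url.replace("en.wikipedia.org", "en.m.wikipedia.org")
    return [
        "wget",
        "--page-requisites",      # also fetch images/CSS the page needs
        "--convert-links",        # rewrite links to work locally
        "--adjust-extension",     # save pages with an .html extension
        "--directory-prefix", out_dir,
        mobile_url,
    ]

if __name__ == "__main__":
    for url in URLS:
        print(" ".join(build_wget_cmd(url)))
```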

For each page, create two clones. In clone 1, remove all content after the lead section (i.e. when you encounter the first heading, delete all subsequent HTML in #bodyContent); in clone 2, scrub the src and srcset attributes of all images (a page with the full content after the lead section, but without images).
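A rough sketch of the two clone transforms, using regular expressions for brevity (a real run should use an HTML parser; the assumption here is that the first section heading is an h2–h6 that appears after the lead):

```python
import re

def strip_after_lead(html):
    """Clone 1: drop everything from the first section heading onwards.
    Regex sketch only -- a real run should use an HTML parser."""
    m = re.search(r"<h[2-6][\s>]", html)
    if m:
        # Keep the lead, then close the containers we cut open.
        return html[: m.start()] + "</div></body></html>"
    return html

def scrub_images(html):
    """Clone 2: blank out src/srcset so no image bytes are fetched,
    while keeping all of the article content."""
    return re.sub(r'\b(src|srcset)="[^"]*"', r'\1=""', html)
```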

  • Upload the sets of two pages to a server that is accessible from webpagetest.org
  • Run tests against each set with the following conditions:
    • 2G
    • Dulles
    • Chrome
    • 9 test runs
    • first load only
  • Generate a table with columns: article name, start render, document complete time, speed index, isJustLeadSection (true/false)
  • Post results here.
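The requested table could be assembled as CSV along these lines (a sketch; the column names simply mirror the bullet above):

```python
import csv
import io

# Column names are illustrative, mirroring the columns requested above.
COLUMNS = ["article", "start_render_ms", "doc_complete_ms",
           "speed_index", "is_just_lead_section"]

def results_table(rows):
    """rows: list of dicts keyed by COLUMNS; returns CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```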

[1] List of pages is generated from https://gist.github.com/jobar/53db2d87461cf3137d03


Event Timeline

Jdlrobson raised the priority of this task from to Needs Triage.
Jdlrobson updated the task description.
Jdlrobson subscribed.
Jdlrobson set Security to None.
Jdlrobson added a subscriber: Peter.

@Peter does this make sense as a good first step?

Looks cool! I think we can use our own instance, http://wpt.wmftest.org/, because then we can start the tests from the command line: we can just fire away all 20 (or however many we start with) and then collect the metrics, so it will not be a problem. And since it's kind of automated, we can also test 3G or 3G Fast (when the pages are up and running). We will have one script file with all the URLs that we run, and get the data in CSV format, so we can use Excel or whatever we want. We can then also test more than 9 times per URL, which is good.

I would love to help set up the runs and collect the data. With just a small fix we can use the current setup we have.

If it is too much work to fix all 20 articles, I would start with ten; we can test them, look at the results, and if we feel we need more data, take 10 more.

About the columns: I think we should change document complete time to fully loaded (that's what we use today, and etsy.com uses it in their report too). I also want to include TTFB, since that could differ between the live site and our test server.

Let me know when you can start and I will fix the current script so we can easily use it to collect all the data.

Just to clarify, we're free to grab the pages and process/host 'em how we see fit just so long as they're accessible by the server(s) running http://wpt.wmftest.org?

Yup. They need to be accessible on the web and should be run on the same web host at the same time, to avoid discrepancies in things like first byte impacting results.

I wrote a script to generate the pages:
Output is here: https://github.com/jdlrobson/WikiArticleSampleGenerator
Someone should feel free to pick up where I left off and run them against webpage test. @bmansurov @phuedx ?

phuedx reassigned this task from phuedx to Jdlrobson.


@Peter: Is this a job for you? If not, then what's the magical incantation that we need to get our instance of WPT to do its thing?

So I want to run this for the following pages but have no idea how to use the script. @Peter, I see you set something up for Thursday, but is there any chance you could help me out async so I can get this done tomorrow (otherwise I will be forced to collate manually!) :-):
https://jonrobson.me.uk/T113649/0-a.html
https://jonrobson.me.uk/T113649/0-b.html
https://jonrobson.me.uk/T113649/0-c.html
https://jonrobson.me.uk/T113649/1-a.html
https://jonrobson.me.uk/T113649/1-b.html
https://jonrobson.me.uk/T113649/1-c.html
https://jonrobson.me.uk/T113649/2-a.html
https://jonrobson.me.uk/T113649/2-b.html
https://jonrobson.me.uk/T113649/2-c.html
https://jonrobson.me.uk/T113649/3-a.html
https://jonrobson.me.uk/T113649/3-b.html
https://jonrobson.me.uk/T113649/3-c.html
https://jonrobson.me.uk/T113649/4-a.html
https://jonrobson.me.uk/T113649/4-b.html
https://jonrobson.me.uk/T113649/4-c.html
https://jonrobson.me.uk/T113649/5-a.html
https://jonrobson.me.uk/T113649/5-b.html
https://jonrobson.me.uk/T113649/5-c.html
https://jonrobson.me.uk/T113649/6-a.html
https://jonrobson.me.uk/T113649/6-b.html
https://jonrobson.me.uk/T113649/6-c.html
https://jonrobson.me.uk/T113649/7-a.html
https://jonrobson.me.uk/T113649/7-b.html
https://jonrobson.me.uk/T113649/7-c.html
https://jonrobson.me.uk/T113649/8-a.html
https://jonrobson.me.uk/T113649/8-b.html
https://jonrobson.me.uk/T113649/8-c.html
https://jonrobson.me.uk/T113649/9-a.html
https://jonrobson.me.uk/T113649/9-b.html
https://jonrobson.me.uk/T113649/9-c.html
https://jonrobson.me.uk/T113649/10-a.html
https://jonrobson.me.uk/T113649/10-b.html
https://jonrobson.me.uk/T113649/10-c.html
https://jonrobson.me.uk/T113649/11-a.html
https://jonrobson.me.uk/T113649/11-b.html
https://jonrobson.me.uk/T113649/11-c.html
https://jonrobson.me.uk/T113649/12-a.html
https://jonrobson.me.uk/T113649/12-b.html
https://jonrobson.me.uk/T113649/12-c.html
https://jonrobson.me.uk/T113649/13-a.html
https://jonrobson.me.uk/T113649/13-b.html
https://jonrobson.me.uk/T113649/13-c.html
https://jonrobson.me.uk/T113649/14-a.html
https://jonrobson.me.uk/T113649/14-b.html
https://jonrobson.me.uk/T113649/14-c.html
https://jonrobson.me.uk/T113649/15-a.html
https://jonrobson.me.uk/T113649/15-b.html
https://jonrobson.me.uk/T113649/15-c.html
https://jonrobson.me.uk/T113649/16-a.html
https://jonrobson.me.uk/T113649/16-b.html
https://jonrobson.me.uk/T113649/16-c.html
https://jonrobson.me.uk/T113649/17-a.html
https://jonrobson.me.uk/T113649/17-b.html
https://jonrobson.me.uk/T113649/17-c.html
https://jonrobson.me.uk/T113649/18-a.html
https://jonrobson.me.uk/T113649/18-b.html
https://jonrobson.me.uk/T113649/18-c.html
https://jonrobson.me.uk/T113649/19-a.html
https://jonrobson.me.uk/T113649/19-b.html
https://jonrobson.me.uk/T113649/19-c.html

I thought using https://github.com/marcelduran/webpagetest-api would be easy enough, but it doesn't seem to support the 2G connection speed and will require me to write some code to automate this... I'm hopeful @Peter's script does what we need... but I currently can't for the life of me work out how to use it.
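For anyone retrying this later: the WebPageTest HTTP API can also be driven directly, without the node wrapper. A sketch of building the runtest.php request for the conditions in the task description (the API key is a placeholder, and whether a given instance accepts the 2G connectivity profile is exactly the open question above):

```python
from urllib.parse import urlencode

def build_runtest_url(page_url, api_key="YOUR_KEY"):
    """Build a WebPageTest runtest.php request for the task's conditions."""
    params = {
        "url": page_url,
        "k": api_key,            # placeholder API key
        "location": "Dulles:Chrome",
        "connectivity": "2G",    # named profile; support varies by instance
        "runs": 9,
        "fvonly": 1,             # first view (load) only
        "f": "json",             # machine-readable response
    }
    return "https://www.webpagetest.org/runtest.php?" + urlencode(params)
```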

Okay. Method:
Cloned https://github.com/wikimedia/performance-WebPageTest
Cherry picked https://gerrit.wikimedia.org/r/#/c/243027/
Generated these two samples:


Set up environment variables

export WPT_CONNECTIVITY=3G
export WPT_RUNS=1

Ran node bin/index.js --batch scripts/batch/test.txt
Checked they ran: http://wpt.wmftest.org/testlog.php?days=1&filter=&all=on

export WPT_CONNECTIVITY=3G
export WPT_RUNS=9

I've started a test but it's very slow and I'm hoping my connection doesn't drop :)
It seems to do repeat views as well, and after talking to @Peter we discovered bugs that stop us from running the tests in Chrome and on 2G. I'll report back with any findings on Firefox 3G when I have them.

It's also worth noting we uncovered a limitation with using static files for these sorts of tests: the ResourceLoader startup module creates a script tag with a relative URL, meaning that JavaScript will not load on our static pages. These tests will therefore only give us an indication of the page's performance minus any asynchronous JavaScript.
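A possible (untested) workaround when generating the clones would be to rewrite the relative load.php references to absolute ones pointing at the live site. `/w/load.php` is the standard MediaWiki script path; whether this catches the URL the startup module constructs at runtime is an assumption.

```python
import re

def absolutize_load_php(html, host="https://en.wikipedia.org"):
    """Point relative MediaWiki load.php references at the live site so that
    ResourceLoader requests from a static clone do not 404. Rough sketch:
    rewrites quoted occurrences of the /w/load.php path."""
    return re.sub(r'(["\'])/w/load\.php', r"\1" + host + "/w/load.php", html)
```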

@Peter I came home to find, to my surprise, that it was still running (on 3G). When I terminated it, it was on page 16 of 20. I think we'll need to find a way to run these kinds of tests more quickly, and simultaneously rather than sequentially... apologies, I didn't realise it would run for so long.

Here are the results I collected for 16 of the 20 pages. I still need to analyse them:
https://docs.google.com/spreadsheets/d/1INQ59Tczrg8MDZdjAq_36yawT3x733DgsEGQNe3AGtw/edit?usp=sharing

So I ran these tests for Firefox on 3G using Peter's tool. https://docs.google.com/spreadsheets/d/1INQ59Tczrg8MDZdjAq_36yawT3x733DgsEGQNe3AGtw/edit?usp=sharing

I also ran a bunch of tests against webpagetest using a custom node script I wrote (I'll post that up soon)
http://www.webpagetest.org/testlog.php?days=14&filter=jonrobson.me.uk&all=on&nolimit=on

However, I'm not sure I can draw conclusions from this data, other than that I'm not seeing any obvious negative impact on performance.

Issues with using my server:

  • Seems to have https issues which skewed results
  • I also ran them using my script on Chrome 2G via webpagetest.org, but many of the tests were useless because they brought my server down by sending too many concurrent requests in a short space of time.
  • Running these tests sequentially played havoc with time to first byte, which varied dramatically, so a lot of the tests were not useful. They should be re-run on a server where first byte is consistent.

Issues with using https://github.com/wikimedia/performance-WebPageTest:

  • The CSV output by Peter's tool doesn't include a reference to the test job, which makes debugging difficult
  • Running sequentially is very slow (it took 8 hours to run 16×3 pages at 9 runs each on 3G)

Issues encountered when using webpagetest.org and my node script:

Lessons learnt:

  • Need to run these tests on a more stable server where time to first byte is more consistent
  • Need to serve the static pages from inside a MediaWiki instance so that ResourceLoader URLs do not 404 and the setup is closer to production.
  • Need to find a way to do these jobs more quickly

Given I've spent way too much time on this during this sprint, I'm thinking I should pull together the lessons learnt and we should re-attempt this in a future sprint. @Peter, any thoughts on how this could be improved the second time round? This methodology doesn't seem to be working, but there seems to be something hopeful in the data!

I think it would be good to start off with just a couple of pages to test, maybe three, so it isn't so much work to get it up and running next time. Then, if we see that it is promising, we can start testing at a larger scale. If you need a stable/fast first byte, I've been using DigitalOcean with nginx to serve static content; that is fast to get up and running. But I think it's important to make sure we test things realistically: is this how our final solution will look?

About testing taking a long time: yep, we only use one agent at the moment. We can change that going forward, but I don't think it's super important right now if the tests take a couple of hours (as long as we test the right things).

Let me see how we can get cleaner results in the CSV files; I'll make some changes.

https://www.mediawiki.org/wiki/Reading/Web#Performance

  • There seemed to be a few bugs in the CSV generated for the most recent report I did, for just 3 variants of 2 pages
  • I have proposed we do this again next sprint in T115073 for 5 pages (just lead section vs full article)