
[EPIC] Only load the lead section on page load
Closed, Declined · Public

Description

For casual users who visit a Wikipedia article only to identify an image of the food they are looking up or to get a brief overview of a subject, any content after the lead section is not useful. Let's defer loading it until the user scrolls down the page.
This should dramatically decrease the time it takes for the user to load a fully functional page.

To quote @Gilles on T97570:
"The Barack Obama article might be a bit of an extreme example due to its length, but in that case the API data needed for section 0's text + the list of sections is almost 30 times smaller than the data needed for all sections' text (5.9kb gzipped versus 173.8kb gzipped)."

We need to find a non-JS fallback for this.
Ideas:

  • lite.wikipedia.org
  • link to desktop
  • mobileaction=fullsite - a mode which shows the full content without JS

For completeness, such a change would require a method to render content on the client: https://gerrit.wikimedia.org/r/#/c/219489
and would also need a new API method for retrieving all sections other than the lead section: https://gerrit.wikimedia.org/r/#/c/219490 (copying Python's array slice notation might be a good idea for the parameters as this evolves).

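The slice-notation idea for that API parameter could work something like this (a hypothetical sketch of the parsing only; the actual parameter format in the patch may differ):

```python
def parse_sections_param(spec: str, total: int) -> list[int]:
    """Parse a Python-style slice spec such as "0", "1:", or "1:4"
    into a list of section indices.

    Hypothetical sketch: the real API's parameter names and syntax
    were never finalised on this task.
    """
    if ":" not in spec:
        return [int(spec)]
    start_s, _, stop_s = spec.partition(":")
    start = int(start_s) if start_s else 0
    stop = int(stop_s) if stop_s else total
    return list(range(start, min(stop, total)))
```

So `"0"` would fetch just the lead, and `"1:"` everything after it.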

Event Timeline

Jdlrobson raised the priority of this task from to Needs Triage.
Jdlrobson updated the task description.

Change 212945 had a related patch set uploaded (by Jdlrobson):
Alpha: Serve only lead section on page load

https://gerrit.wikimedia.org/r/212945

@Gilles what are your current thoughts on this? Should I get it to a mergeable state and should we experiment with it for a week in beta to see what benefits it brings?

Yeah, getting this all the way to beta would be great!

Patches are fixed up. Just need some review.

Change 212945 abandoned by Jdlrobson:
Beta: Serve only lead section on page load

Reason:
Abandoning for time being given the e-mail with the clever title ;-) "Notes from performance meeting: PERFecting the reading experience"

We can revisit this later.

https://gerrit.wikimedia.org/r/212945

phuedx changed the task status from Open to Stalled. Jul 7 2015, 3:42 PM
phuedx subscribed.

@Jdlrobson It's totally possible to get the rest of the sections from the mobileview api, with parameters like this: https://github.com/joakin/webkipedia/blob/master/src/webkipedia/api/mobileview_article.cljs#L20-L29
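For reference, the request that ClojureScript code builds boils down to parameters along these lines (a Python sketch; the prop/sections values are my reading of the linked code and may not match it exactly):

```python
from urllib.parse import urlencode


def mobileview_url(title: str, sections: str = "1-") -> str:
    """Build a mobileview API request for sections after the lead.

    "1-" is intended to mean sections 1 through the end; the exact
    parameter values here are an approximation of what the linked
    webkipedia code does, not a verified request.
    """
    params = {
        "action": "mobileview",
        "format": "json",
        "page": title,
        "sections": sections,
        "prop": "text|sections",
    }
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)
```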

We could consider doing this on beta before the quarter ends as an experiment, measuring load time and seeing how it affects user behavior. Would that be interesting?

I think we'd learn from it, and the code is ready to ship: https://gerrit.wikimedia.org/r/212945
Whether we can accurately test it is a separate concern though - there's currently no way that I know of to use NavigationTiming to report on beta traffic only.

We shouldn't do this until we can measure how useful it is and if it is having any impact on engagement (good or bad), as much as I'd like to see it.

I'm not very sure how I would phrase the blocking task though. It could make sense to differentiate metrics based on platform mode (stable, beta), but it would be even better if we could differentiate based on flags, so that we could measure variants regardless of mode. Imagine running the normal page view and the lead-section-only page view at the same time on stable with bucketed users; we should be able to see the metrics for those two.

I guess it should also involve #event-logging, since alongside the performance metrics we'd want to look at the event-logged stats to see the impact of the test on other metrics (like engagement, page views, etc.).

Help? Do we have ways of doing this? How do we ping?

@Jhernandez: Perhaps @kevinator would be able to answer your questions.

*nudge* :)

Based on recent perf audit [1], this change should have dramatic performance benefits for users on slower connections... Anything we can do to help unblock this?

[1] https://docs.google.com/document/d/1qWIYN-0ZVpSDiM5RChYKp_5n_fGZkHEtAeZ0ufUTQAk/edit

Help? Do we have ways of doing this? How do we ping?

I'd suggest a very simple approach. A/B test this by measuring time spent on the article once it appears (i.e. time loading stuff while the view is blank shouldn't count). Report figures to statsd via statsv. Hook it up to a grafana dashboard.
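Reporting via statsv amounts to hitting a beacon URL with name=valueUnit query pairs; a minimal sketch (the metric name is made up, and the query format is my understanding of statsv's conventions):

```python
from urllib.parse import urlencode


def statsv_beacon(metric: str, ms: int) -> str:
    """Build a statsv beacon URL for one timing measurement.

    statsv forwards query pairs of the form name=valueUnit to statsd,
    from which a Grafana dashboard can graph them. The metric name
    used below in examples is hypothetical, not an existing metric.
    """
    return "https://www.wikimedia.org/beacon/statsv?" + urlencode(
        {metric: f"{ms}ms"}
    )
```

The client would fire this once the article becomes visible, e.g. `statsv_beacon("mobile.leadOnly.timeToVisible", 1250)`.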

It would only need to be applied to a small amount of requests and compared to a control group of the same size.
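Sticky, deterministic bucketing at a small ratio could be sketched like this (hypothetical throughout: the 1% ratio, the bucket names, and hashing a session token are all illustrative choices, not the team's actual A/B framework):

```python
import hashlib


def bucket(session_token: str, sample_ratio: float = 0.01) -> str:
    """Deterministically assign a session to an experiment bucket.

    Hashing the session token means the same session always lands in
    the same bucket, and the treatment and control groups come out
    the same size, as suggested above. All names are illustrative.
    """
    h = int(hashlib.sha256(session_token.encode()).hexdigest(), 16)
    x = (h % 10_000) / 10_000  # roughly uniform in [0, 1)
    if x < sample_ratio:
        return "treatment"  # lead-section-only page load
    if x < 2 * sample_ratio:
        return "control"    # normal page load, but measured
    return "off"            # not enrolled in the experiment
```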

If that shows promise, the next steps depend on the level of caution you want to apply. The Design-Research team is there to help with qualitative UX testing; I'd recommend reaching out to them to see if they have free bandwidth for this.

As for quantitative tests, I think that expanding the A/B testing code to cover things across pageviews (like making the flag sticky for a session and counting pageviews) starts to get quite complicated for what it's worth. And while you shouldn't be too afraid to do it live on small wikis known to like change, experience has shown us that they are a very poor indicator of community reception on enwiki/dewiki. Design research, however, has been a good indicator, imho.

Once fully ready for the big launch, given how easily that feature can be ramped up and down to a ratio of users, it should be released that way to enwiki/dewiki. Doing so also has the advantage of making any potential community criticism more manageable, i.e. only a subset of users would be affected, instead of potentially upsetting everyone at the same time.

Firstly thanks @igrigorik for poking this :) I've been shouting about doing this for 3 years now at the top of my lungs so I think I have a good overview of what's blocking this.

  • No suitable non-js fallback for users who want all content / users with slow connections / older browsers.
  • There could be damage to SEO [citation needed].
  • We need to be able to accurately measure the impact of such a change so we can communicate to our community why we are doing it and minimise backlash (the thumbs-down on this task indicates not everyone favours it - we've started discussions around this in T112588). It's not clear where we should run the tests, under what conditions, and on what pages. Implementing this as a feature-flagged modification on production is not a trivial change to make.
  • Ensuring that links to fragments in content that hasn't been loaded yet trigger expansion of the right section (our API doesn't make this easy)
  • More reliance on our legacy MobileFormatter code [1] which was originally just intended as a short term hack. The APIs @GWicke is building might be better to work off of.
  • We'd need to expose references via the API and rewrite how they work.
  • @JakeA suggested chunked encoding would be a better solution https://twitter.com/jaffathecake/status/606507456416649217 - it would be good to compare the two

A cheap first step might be to explore using a service worker, since that avoids us having to worry about a lot of these problems, albeit it won't help the first load. But I want us to be 100% sure we have a good way to measure it.

[1] http://git.wikimedia.org/blob/mediawiki%2Fextensions%2FMobileFrontend.git/2cf17f2ce7861c7d2a5b4824f08833f7b91bdd31/includes%2FMobileFormatter.php

@Gilles @Jdlrobson some thoughts and followup questions based on the above..

Is there any existing analytics on whether and how often sections are expanded by visitors? If not, this seems like the place to start. My guess is that there is a significant number of visitors who never expand most (any?) of the sections beyond the lead. These users incur a delayed load and a significant (and unnecessary) transfer cost for data that they never see or use -- hundreds of KBs on a slow and costly mobile connection is a big deal, and that describes most mobile users in many (rapidly growing) markets.

Re, fallbacks: there are many ways to implement this. You don't have to rely on JS or fetching chunks. It may be completely reasonable to show the lead and offer the full page as a separate "click here for more" experience.

Re, chunking: this is orthogonal. First off, you're already "chunking" the HTML response (you're on HTTPS and have SPDY enabled, so all responses are already streamed in chunks; every spdy/h2 stream is chunked). Second, chunking doesn't address the fact that the client may be fetching a lot of data it may not need (the user never expands the section), and the browser has to do a lot more work to process and display this data (most of which is display:none'd). If the argument is that you should be optimizing for progressive rendering... then, yeah, sure, except you still have the same issue of a large HTML resource competing for bandwidth with CSS and JS.. so we're back to reducing the size of the HTML resource.

FWIW, here's a real-world example of potential wins to be had here:

compare.png (626×1 px, 199 KB)

The above filmstrip compares the current mobile site vs. a transcoded version [1,2] for slow 2G users (see preview). In effect, the transcoded version does exactly what we're discussing here (and a bit more): it aims to deliver all of the critical assets in the first ~1-2 RTTs by limiting the amount of HTML and inlining CSS and JS, and it loads other sections on demand as the user scrolls down the page...

Our experiments show that optimized pages load four times faster than the original page and use 80% fewer bytes. As our users’ overall experience became faster, we saw a 50% increase in traffic to these optimized pages.

Above results are consistent with the numbers we see in Chrome for many "emerging" markets, where many users abandon pages (!) before they even get to first-paint because of all the heavy and blocking assets... plus, the associated data costs.

[1] http://googlewebmastercentral.blogspot.com/2015/04/faster-and-lighter-mobile-web-pages-for.html
[2] https://support.google.com/webmasters/answer/6211428?hl=en

Firstly no need to convince me, I know this will make a huge impact and it gets me excited to see momentum growing around this idea :).

@Gilles @Jdlrobson some thoughts and followup questions based on the above..

Is there any existing analytics on whether and how often sections are expanded by visitors? If not, this seems like the place to start. My guess is that there is a significant number of visitors who never expand most (any?) of the sections beyond the lead. These users incur a delayed load and a significant (and unnecessary) transfer cost for data that they never see or use -- hundreds of KBs on a slow and costly mobile connection is a big deal, and that describes most mobile users in many (rapidly growing) markets.

There is some very old data from 2012 [1], but it wasn't complete, as our analytics infrastructure was in its infancy then. All it suggested to us was that the further down the page a section is, the less likely it is to be read; it didn't take into account visits where no section was opened. Yes, we should definitely test this again - now that our analytics systems are more mature, we should run it on a sample of users to get an idea of what percentage of page views open a section, to cover our backs.
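If we rerun it, the aggregation itself is cheap once events are flowing; a sketch against a made-up event shape (the real schema would come from EventLogging and would differ):

```python
from collections import defaultdict


def toggle_rate(events: list[dict]) -> float:
    """Share of pageviews in which the user opened at least one section.

    The event shape here (a 'pageview_token' tying events to one
    pageview, and an 'action' field) is invented for illustration;
    it is not the real EventLogging schema.
    """
    opened: defaultdict[str, bool] = defaultdict(bool)
    for e in events:
        opened[e["pageview_token"]] |= (e["action"] == "open")
    return sum(opened.values()) / len(opened) if opened else 0.0
```

This is the number we'd want before shipping: what fraction of pageviews ever touch content below the lead.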

Re, fallbacks: there are many ways to implement this. You don't have to rely on JS or fetching chunks. It may be completely reasonable to show the lead and offer the full page as a separate "click here for more" experience.

Yup. I'm aware there are lots of options; I'm just flagging it as something we'll have to think about, in particular how it impacts our parser cache.

Our experiments show that optimized pages load four times faster than the original page and use 80% fewer bytes. As our users’ overall experience became faster, we saw a 50% increase in traffic to these optimized pages.

Although this seems pretty obvious, can you share these experiments? Are they public? They would be good to point at when justifying such a change to an angry editor who thinks we are not giving their content the prominence it deserves.

Above results are consistent with the numbers we see in Chrome for many "emerging" markets, where many users abandon pages (!) before they even get to first-paint because of all the heavy and blocking assets... plus, the associated data costs.

Yeah, I'm not surprised.

[1] https://www.mediawiki.org/w/index.php?title=Event_logging/Mobile&oldid=619140#Section_toggles

Change 212945 restored by Jdlrobson:
Beta: Serve only lead section on page load

https://gerrit.wikimedia.org/r/212945

Change 239508 had a related patch set uploaded (by Jdlrobson):
WIP: Beta: Serve only lead section on page load

https://gerrit.wikimedia.org/r/239508

Change 212945 abandoned by Jdlrobson:
Beta: Serve only lead section on page load

https://gerrit.wikimedia.org/r/212945

Our experiments show that optimized pages load four times faster than the original page and use 80% fewer bytes. As our users’ overall experience became faster, we saw a 50% increase in traffic to these optimized pages.

Although this seems pretty obvious, can you share these experiments? Are they public? They would be good to point at when justifying such a change to an angry editor who thinks we are not giving their content the prominence it deserves.

I'm quoting results from the blog post here: http://googlewebmastercentral.blogspot.com/2015/04/faster-and-lighter-mobile-web-pages-for.html - sadly, can't share much more, as I wasn't the one running them. That said, the results are based on experiments with real-world traffic + users.

Thanks for the link. We're slowly coming up with a plan for the next 3-6 months, so you may be interested in helping us spec out T113066. I suspect we're going to have to throw away a lot of legacy cruft in favour of better services to get there, though (hence the timeline).

We'd need to expose references via the API and rewrite how they work.

We could also inline the necessary metadata for the reference, which would allow you to present it instantly without doing a round trip to fetch all the references (of which there can be hundreds in some cases).

Change 243353 had a related patch set uploaded (by Jdlrobson):
WIP: Beta: Serve only lead section on page load

https://gerrit.wikimedia.org/r/243353

Change 239508 abandoned by Jdlrobson:
WIP: Beta: Serve only lead section on page load

Reason:
See https://gerrit.wikimedia.org/r/#/c/243353/

https://gerrit.wikimedia.org/r/239508

Change 243353 abandoned by Jdlrobson:
WIP: Beta: Serve only lead section on page load

Reason:
Laterz

https://gerrit.wikimedia.org/r/243353

From our meeting during Q3 planning:
We'll have to be especially careful around deep linking (hash fragments) if we go down this route. The way I see it, we have a few options:

  1. Load only the lead section first, then immediately load the rest of the page and scroll to section (imo bad).
  2. Load the whole page at once and scroll; current behavior on production (imo pretty good).
  3. Load only the lead section and the section from the hash fragment (imo probably bad, but could use metrics).

I'm not certain about the performance and/or technical feasibility of any of these options, but these were what popped into my head during our brainstorming.
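Option 3 from the list above could be sketched like this (hypothetical: the section-list shape with index/anchor keys, and the helper itself, are assumed for illustration, not real API output):

```python
def sections_to_load(fragment: str, sections: list[dict]) -> list[int]:
    """Pick which sections to request on first load: the lead (0),
    plus the section whose anchor matches the URL hash fragment.

    'sections' is a hypothetical list of {"index": int, "anchor": str}
    entries, loosely modelled on what a section-listing API returns.
    """
    wanted = [0]
    for s in sections:
        if s["anchor"] == fragment:
            if s["index"] != 0:
                wanted.append(s["index"])
            break
    return wanted
```

A fragment that matches nothing falls back to loading just the lead, which degrades to option 1's behaviour once the rest of the page arrives.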

Jdlrobson moved this task from 2017-18 Q1 to 2016-17 Q1 on the Web-Team-Backlog board.
Jdlrobson moved this task from 2016-17 Q1 to 2015-16 Q4 on the Web-Team-Backlog board.
Jdlrobson moved this task from 2015-16 Q4 to Epics/Goals on the Web-Team-Backlog board.
Jdlrobson renamed this task from Only load the lead section on page load to [EPIC] Only load the lead section on page load. May 21 2017, 11:43 AM
Jdlrobson closed this task as Declined.

Reflecting reality. We can revisit later.