
Explore the improvement on first paint / load time on connections if we load images via JS
Closed, Resolved (Public)

Description

Run a test for yourself using the settings:

setCookie https://en.m.wikipedia.org disableImages=1
navigate https://en.m.wikipedia.org/wiki/Barack_Obama

Update 22nd December

Originally T110615 suggested removing images would halve first paint for 2G connections.

However, I've run a new test to get some fresh results:
http://www.webpagetest.org/result/151222_FS_15H3/ vs http://www.webpagetest.org/result/151222_Y3_15EH/, which show similar results.

I enabled the cookie with values 0 and 1 so that both runs are subject to similar caching rules.

With images disabled via cookie:
First paint: 68.973s
Bytes: 385 KB

With images enabled via cookie:
First paint: 63.379s
Bytes: 970 KB

August 2015

On 2G the Barack Obama page hits first paint at 46.131s with images and 19.794s without images [2]. With images the page doesn't fully load; without them it does (albeit sans images). This is pretty huge.
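
To put the two sets of first-paint figures quoted in this description side by side, here is a small arithmetic sketch. The numbers are the ones reported above; the `compare` helper is made up for illustration and is not part of any tooling used for the tests.

```python
def compare(with_images, without_images):
    """Return (amount saved, percent saved) when images are disabled.

    A negative result means the run without images was worse.
    """
    saved = with_images - without_images
    percent = 100.0 * saved / with_images
    return saved, percent


# First paint in seconds, as quoted in the description.
august = compare(with_images=46.131, without_images=19.794)
december = compare(with_images=63.379, without_images=68.973)

# Transfer size in KB from the December runs.
december_bytes = compare(with_images=970, without_images=385)

print("August first paint:   saved %.3fs (%.1f%%)" % august)
print("December first paint: saved %.3fs (%.1f%%)" % december)
print("December bytes:       saved %d KB (%.1f%%)" % december_bytes)
```

On these figures the August run cut first paint by more than half, while the December run was slightly slower without images despite transferring far fewer bytes.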

Daring proposal: We currently allow the disabling of images via the enwikidisableImages cookie - why not do it for all page views? (hear me out...)
Currently the disable images mode replaces images with a link to the file page for the image (mw-mf-image-replacement class). If we were to also decorate this with a data attribute pointing to the source of the image, we could load these images via JavaScript at the bottom of the page.
A more complete solution would only do this for images beyond the lead section (given that the first image is likely to be the only one the user needs to see to feel like the page is responsive).

Given the images would then be loaded after the startup module, things could get ridiculously faster.
Proposal: Let's try this out and test it and report back findings.
Let's avoid a debate about whether this is considered breaking the web at this point - let's just work out what it buys us.

Changes required:

  • With a flag enabled (e.g. a query string parameter) AND the disableImages cookie enabled, any disabled images are pulled in via JS (when scrolled to - see getBoundingClientRect / debouncing)
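
The scroll-triggered loading described in the bullet above boils down to two pieces of logic: a viewport check on the rectangle that getBoundingClientRect would return, and a debounce so the check does not run on every scroll event. The sketch below models that logic in plain Python so it can be run standalone; it is not the MobileFrontend implementation. All function names, the `margin` parameter, the placeholder dictionaries, and the file names are hypothetical.

```python
def is_near_viewport(rect_top, rect_bottom, viewport_height, margin=0):
    """Viewport test on values shaped like getBoundingClientRect() output.

    rect_top and rect_bottom are relative to the top of the viewport.
    margin (an assumption) lets loading start slightly before the image
    is actually visible.
    """
    return rect_bottom >= -margin and rect_top <= viewport_height + margin


def debounced_fire_times(event_times_ms, wait_ms):
    """Model a trailing-edge debounce over sorted scroll event timestamps.

    The handler fires once, wait_ms after the last event of each burst.
    """
    fires = []
    for i, t in enumerate(event_times_ms):
        is_last = i == len(event_times_ms) - 1
        if is_last or event_times_ms[i + 1] - t >= wait_ms:
            fires.append(t + wait_ms)
    return fires


def sources_to_load(placeholders, scroll_y, viewport_height, margin=0):
    """Return the data-attribute sources of unloaded placeholders in view."""
    due = []
    for p in placeholders:
        top = p["top"] - scroll_y  # document offset -> viewport offset
        bottom = top + p["height"]
        if not p["loaded"] and is_near_viewport(top, bottom, viewport_height, margin):
            due.append(p["data_src"])
    return due


# Hypothetical page: one image near the top, one far down the article.
placeholders = [
    {"data_src": "lead.jpg", "top": 100, "height": 200, "loaded": False},
    {"data_src": "far.jpg", "top": 3000, "height": 200, "loaded": False},
]
print(sources_to_load(placeholders, scroll_y=0, viewport_height=600))
print(sources_to_load(placeholders, scroll_y=2800, viewport_height=600))
print(debounced_fire_times([0, 50, 100, 400], wait_ms=100))
```

In a browser the same check would run inside a debounced scroll handler, and each returned source would be copied from the data attribute onto a newly created img element.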

We'll get this deployed, run some tests across a collection of pages in both modes, and report back what we find.

[1] http://www.webpagetest.org/result/150825_9V_1AYZ/
[2] http://www.webpagetest.org/result/150827_ZB_1A46/

Event Timeline

Jdlrobson raised the priority of this task from to High.
Jdlrobson updated the task description.

Do we know why the first-paint is faster without images?

Is there a layout problem (eg, images missing width/height, causing delayed layout until the images are loaded), or is loading of images simultaneous with the multi-megabyte HTML slowing down download/parsing/layout of the base document?

(We really shouldn't ship multiple megs of HTML over 2G when most of the time only a small portion will be read, but that's another debate.)

@Jdlrobson I think this could be very interesting to explore next sprint. I'm moving it there, feel free to reprioritize if it doesn't fit.

@brion I think, from the performance audit (T105365, see bottom comments), that the loading of the images is filling up the browser's parallel requests quota, effectively blocking the loading of styles and/or JS, and thus delaying the first paint.

It would be great to clarify exactly why this is happening if what I've said ^ is not why this is happening.

On 2G, in addition to hitting the parallelization limit (if there is one), images are competing for scarce bandwidth at the stage of the pageload where JS and CSS are trying to load. The browser preloader picks up src attributes from img tags and we have no control over how it prioritizes assets.
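
As a rough back-of-the-envelope illustration of that bandwidth competition, the sketch below assumes the link is split equally between all concurrent downloads and that the competing images keep downloading for at least as long as the critical asset. Real TCP behaviour is messier than this. The function name and every number in it are invented for illustration, not measured values.

```python
def critical_asset_time(asset_kb, link_kb_per_s, competing_downloads):
    """Seconds to fetch a blocking asset that shares the link equally
    with `competing_downloads` other transfers (a simplifying assumption).
    """
    share = link_kb_per_s / (competing_downloads + 1)
    return asset_kb / share


# Illustrative values only: a 20 KB stylesheet on a 10 KB/s link.
alone = critical_asset_time(asset_kb=20, link_kb_per_s=10, competing_downloads=0)
shared = critical_asset_time(asset_kb=20, link_kb_per_s=10, competing_downloads=4)

print("stylesheet alone:         %.1fs" % alone)
print("stylesheet with 4 images: %.1fs" % shared)
```

Under these assumptions the time to fetch the stylesheet grows linearly with the number of images downloading alongside it, which is the effect deferring image loads is meant to avoid.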

What @Gilles said.
So yeh I'm super keen to explore the impact of a new version of images disabled which disables images until they are scrolled into view. I think this would be huge for 2G connections whilst we still use ResourceLoader.

Do we have an upstream bug report with the browser vendor? Images are non-blocking resources and shouldn't be competing with blocking JS/CSS...

Images are non-blocking, but when downloaded alongside a script tag they slow it down.

Well we use ResourceLoader... We use this to only load script tags conditionally when we support the browser. As a result the browser has no idea of the script tags that it will have to download in future.

Thus this is not a browser vendor issue; it's a problem with our stack...

Aren't asynchronous scripts non-blocking by definition? How can they be blocking paint?

Sorry I wrote that in a hurry and I'm mixing concerns here. Sorry for confusion.

The first paint is slowed down as the link tag is downloaded alongside multiple img tags:
http://www.webpagetest.org/video/compare.php?tests=150825_9V_1AYZ-r:1-c:0
I guess this could be called an upstream bug... but most of the time this is what you want (it's only image heavy pages that get impacted)

... From the same graphic you'll see that we don't even get past the loading of the startup script as the script tag is downloaded in parallel to all the images. As a result we do not load further scripts/styles via ResourceLoader so we are also delaying the full load time / interactive time. It's worth noting client-nojs class gets removed in this script which may trigger a repaint/first paint depending on styles in the stylesheet.

So if we were instead to get the startup script out of the way and _then_ start loading images or load images as required we reduce the amount of HTTP going on so it could potentially be better for everyone. This is the hypothesis anyway..

cc @Peter who may be interested in this conversation

Checked out the WPT links. One is tested using a Nexus 5 on 2G and the other using emulated mobile for Chrome on edge. It's kind of evil to test on different devices and speeds and then compare them :)

To get some better numbers I think we need to test more pages and more than 9 times per URL. For example, I've now tested emulated mobile using Chrome on 3gfast: my median start render with images is 0.490 s (http://www.webpagetest.org/result/150911_9M_C2N/) and removing images gives me 2.895 s (http://www.webpagetest.org/result/150911_Z7_BW4/). I've just scripted navigation on both to make sure that isn't an issue. However, I think the numbers reported by WebPageTest are overly optimistic for start render with images. If you compare the different render images, it looks like start render with images happens much later than 0.5 s, but still a little earlier than with images turned off. You can also check http://www.webpagetest.org/video/compare.php?tests=150911_9M_C2N-r:4-c:0 and the Visual Progress graph, where it's more visible when the real content appears on screen.

One more thing: I saw that the test timed out when running on an Android device. You can increase the timeout by adding ?timeout=300 to the start page (where you add the settings), which lets you run the test for 5 minutes. However, I think there's a limit of 3 minutes or so on recording videos on Android devices, so we will not get any metrics calculated from video.

To summarize: let's set up a task where we define the methodology for testing, and then let's do it like that!
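
As one possible starting point for such a methodology, here is a minimal sketch of summarizing repeated runs per variant by median and range rather than relying on a single run. The `summarize` helper is hypothetical, and the sample values are invented purely for illustration; they are not measurements from any of the tests linked here.

```python
import statistics


def summarize(samples_s):
    """Summarize start-render samples (seconds) from repeated runs of one URL."""
    ordered = sorted(samples_s)
    return {
        "runs": len(ordered),
        "median": statistics.median(ordered),
        "min": ordered[0],
        "max": ordered[-1],
    }


# Invented sample values, NOT real results. Note the single outlier run.
with_images = [2.1, 2.4, 2.2, 9.8, 2.3]
without_images = [1.9, 2.0, 2.6, 2.1, 2.2]

for label, samples in (("with images", with_images), ("without images", without_images)):
    s = summarize(samples)
    print("%-15s runs=%d median=%.1fs range=%.1f-%.1fs"
          % (label, s["runs"], s["median"], s["min"], s["max"]))
```

The median is used because a single slow run, like the outlier above, would otherwise dominate an average; reporting the range alongside it shows how noisy the runs were.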

Whoops yeh you are right... I do have tests on the same device that back up these findings [1] however.
Agreed we need to test more than 9 times per URL.

It's worth noting that in the 3G fast test, first byte is 0.677s with images vs 1.260s without. The cookie will force the page to not come from a cache (I forget the exact details, but we vary the cache on this cookie, and it will also invoke some additional server-side parsing before sending the page back to the user).

That said would it be accurate to rephrase this bug as impacting 2G connections only? It's clear on 3G we don't have such a considerable problem.
Definitely up for running some kind of test and coming up with a methodology. How do we go about that?

[1] https://phabricator.wikimedia.org/T111198#1622778

In T110615#1589736, @brion wrote:

Do we have an upstream bug report with the browser vendor? Images are non-blocking resources and shouldn't be competing with blocking JS/CSS...

This is not a browser bug. The fact that images are "non-blocking" doesn't change the fact that they compete for scarce bandwidth, thus stretching out the time it takes to download the critical blocking assets (HTML, CSS, JS).

IF the images were loaded from the same origin, then the browser could do a better job here... We'd use the same SPDY/H2 connection and indicate that images are a lower priority than CSS/JS/HTML, allowing the server to sequence bytes accordingly. However, because the images are served off different origins, we end up with two competing TCP connections, both of which compete for the same scarce bandwidth without the ability to prioritize between them (broken prioritization and congestion control.. yay sharding).

FWIW, deferring image loads would definitely help with improving the initial loading sequence. Beyond that, I'd love to see more discussion on whether they should be loaded at all (see T100258 for related discussion for on-demand strategies).

Pretty good Facebook engineering post on how they managed to squeeze blurry previews into 200 bytes: https://code.facebook.com/posts/991252547593574

With ArrayBuffer support in ServiceWorker, this could be an interesting option for progressive loading:

  1. In parallel with the HTML load, use ServiceWorker to fetch a set (bundle?) of really compact previews, and stash them in the cache.
  2. Intercept the first round of regular image requests from the browser, and serve those low-resolution thumbs (using ArrayBuffer to build up the binary response from the JPEG chunks).
  3. In the DOM, progressively load higher-quality versions of those images in the page as they come into view / if the network bandwidth seems sufficient.

Pinterest has an even lower-bandwidth idea of setting the image background to the dominant image color: http://blog.embed.ly/post/51071740487/pinterest-colored-background-placeholders
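
For illustration, one simple way to pick a dominant color is to bucket pixels coarsely and take the most common bucket. This is only an assumed approach, not necessarily what Pinterest does, and the `dominant_color` helper and its `bucket` parameter are hypothetical.

```python
from collections import Counter


def dominant_color(pixels, bucket=32):
    """Return a CSS hex color for the most common coarse color bucket.

    pixels: non-empty iterable of (r, g, b) tuples with values 0-255.
    bucket: width of each quantization bucket per channel (an assumption).
    """
    counts = Counter((r // bucket, g // bucket, b // bucket) for r, g, b in pixels)
    (rb, gb, bb), _ = counts.most_common(1)[0]
    half = bucket // 2  # use the centre of the winning bucket
    return "#%02x%02x%02x" % (rb * bucket + half, gb * bucket + half, bb * bucket + half)


# A tiny made-up "image": mostly red pixels with a couple of blue ones.
pixels = [(250, 10, 10)] * 5 + [(10, 10, 250)] * 2
print(dominant_color(pixels))
```

The resulting value could be set as the placeholder's background color server-side, costing only a few bytes per image.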

It's pretty clear we should do this - more than anything in my opinion right now - we just need to agree on how but we should do something.
As a next step @ori has suggested T119797 and I've suggested a further iteration: T120875

Jdlrobson claimed this task.