
[RFC] Performance: Inline above-fold CSS in HTML response to unblock render and reduce time to first paint
Open, Normal, Public

Description

Most web performance tools and resources recommend prioritizing above-fold CSS delivery to unblock streaming rendering as early as possible. Linked CSS suffers from a late start (only after the HTML head is loaded), and contention with the parallel HTML load. Inlined CSS avoids this, immediately unblocking the browser to progressively render HTML as it arrives.

Our own results from T113066#1893866 corroborate the huge influence on first paint, especially on slow connections:

A surprising (but tangential) result is that Chrome already seems to defer loading of below-fold images, at least if CSS is available to determine above / below fold status. Time to a rendered & interactive first screen is almost unaffected by image loading if CSS is inlined or generally loaded before above-fold images start loading.

On a Galaxy Note 3 using a wifi connection, Chrome renders the first screen of Obama with images and inline styles after about a second. The full page load takes about six seconds. CPU does not seem to be a bottleneck for first paint on this ~2 year old device. Scrolling is smooth all the way through the rendering phase.

Optimizations: Only inline above-fold CSS

There is a good variety of tools available that automatically separate above-fold from below-fold styles. One of them is Google's PageSpeed module.

A likely issue with these dynamic solutions is going to be performance. It might make sense to use them as a starting point for a static split of above-fold vs. below-fold CSS instead.
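A static split could look roughly like the following sketch. It assumes the styles are already available as (selector, declarations) pairs and that the list of above-fold selectors is maintained by hand; both `ABOVE_FOLD_SELECTORS` and `split_rules` are hypothetical names for illustration, not part of ResourceLoader or PageSpeed.

```python
# Hypothetical static split of CSS rules into above-fold and below-fold parts.
# ABOVE_FOLD_SELECTORS is an assumed, hand-maintained list, e.g. seeded from
# the output of one of the dynamic tools and then reviewed manually.
ABOVE_FOLD_SELECTORS = {"body", "#content", "h1", ".header"}


def split_rules(rules, above_fold=ABOVE_FOLD_SELECTORS):
    """Partition (selector, declarations) pairs into two CSS strings."""
    above, below = [], []
    for selector, declarations in rules:
        target = above if selector in above_fold else below
        target.append(f"{selector}{{{declarations}}}")
    return "".join(above), "".join(below)


rules = [
    ("body", "margin:0"),
    ("h1", "font-size:2em"),
    (".navbox", "border:1px solid #aaa"),
]
above_css, below_css = split_rules(rules)
```

Here `above_css` would be inlined and `below_css` would stay in the linked stylesheet. A real split would have to handle media queries, selector lists and cascade order, which this sketch ignores.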

Alternative approaches for early CSS delivery

HTTP/2 push

There are some early cache-aware HTTP/2 push implementations, but implementing this in our current infrastructure does not seem straightforward. Nginx does not yet support push directly, which means that we'd need to use another HTTP/2 frontend like nghttp2. It seems likely that nginx will gain support for HTTP/2 push in the medium term as well.

ServiceWorker CSS caching / injection

Repeat requests can be sped up by persistently caching & quickly delivering CSS from a ServiceWorker. However, this won't address the large percentage of occasional visits or clients without ServiceWorker support, so it can only be seen as a complementary optimization to inlining or HTTP/2 push.

Proposal

Given the fairly low complexity of a minimal implementation & the very significant performance gains, I think we should look closely at applying inlined above-fold styles across the board.
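As an illustration of how small a minimal implementation can be, the sketch below swaps the first stylesheet link in an HTML string for an inline style block. The function name `inline_css`, the regular expression and the sample markup are assumptions made for this example; they do not describe how MediaWiki actually emits or would inline its styles.

```python
import re

# Matches a <link ... rel="stylesheet" ...> tag. This is a simplification:
# it assumes double-quoted attributes and no '>' inside attribute values.
LINK_RE = re.compile(r'<link\b[^>]*rel="stylesheet"[^>]*>', re.IGNORECASE)


def inline_css(html, css):
    """Replace the first stylesheet <link> with an inline <style> block."""
    style = f"<style>{css}</style>"
    # A function replacement avoids backslash-escape handling in the CSS.
    return LINK_RE.sub(lambda match: style, html, count=1)


page = '<head><link rel="stylesheet" href="/styles.css"></head><body>Hi</body>'
inlined_page = inline_css(page, "body{margin:0}")
```

With the sample input, the render-blocking request for `/styles.css` (a placeholder URL) disappears and the styles arrive with the first bytes of the HTML.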

For a production deploy, we should investigate how much size we can save with a static above-fold / below-fold RL module split. While even simple inlining is a big gain on 2G, there is a chance that the currently ~16 KB extra compressed response size of full inlined RL styles would slightly reduce performance on repeat requests, where the RL response would normally be cached in the client.
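One way to quantify that trade-off is to compare the compressed size of a response with and without the inlined styles. The sketch below uses gzip from the Python standard library; `compressed_size` and `inlining_overhead` are hypothetical helper names, and the sample HTML and CSS are placeholders rather than real RL output.

```python
import gzip


def compressed_size(data):
    """Size in bytes of the gzip-compressed UTF-8 encoding of data."""
    return len(gzip.compress(data.encode("utf-8")))


def inlining_overhead(html, css):
    """Extra compressed bytes per response when css is inlined into html."""
    inlined = html.replace("</head>", f"<style>{css}</style></head>", 1)
    return compressed_size(inlined) - compressed_size(html)


sample_html = "<html><head></head><body>" + "text " * 100 + "</body></html>"
sample_css = ".a{color:red}.b{margin:0}"
overhead = inlining_overhead(sample_html, sample_css)
```

Running the same comparison on the full styles and on an above-fold subset would show how much of the extra response size a static split could recover.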

Possible issue: Cache invalidation for CSS

A simple implementation without ESI or similar would couple the cache life time of HTML and CSS. In many cases this is useful, as there is naturally a strong coupling between the two, but there might be dynamic features or general changes that we would prefer to apply more consistently and quickly.

For example, a font change can currently be applied fairly quickly in a way that applies to both old & new cached HTML. In a simple implementation without ESI, the font change would instead take up to a month to apply to all pages, as the lifetime of the CSS is coupled to the HTML.
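The coupling can be stated as a small calculation. In this sketch the 30-day HTML lifetime matches the cache cap discussed on this task, while the 5-minute lifetime for linked CSS is purely an assumed value for illustration; `worst_case_css_staleness` is a hypothetical helper.

```python
from datetime import timedelta

HTML_TTL = timedelta(days=30)          # maximum cache lifetime of a page
LINKED_CSS_TTL = timedelta(minutes=5)  # assumed value, for illustration only


def worst_case_css_staleness(html_ttl, css_ttl, inlined):
    """Longest time a style change can take to reach every cached page."""
    # Inlined CSS lives and dies with the HTML that carries it; linked CSS
    # is refreshed on its own schedule, independent of the HTML.
    return html_ttl if inlined else css_ttl


linked = worst_case_css_staleness(HTML_TTL, LINKED_CSS_TTL, inlined=False)
inline = worst_case_css_staleness(HTML_TTL, LINKED_CSS_TTL, inlined=True)
```

Under these assumptions a style change reaches all pages within minutes when the CSS is linked, but only after the full HTML lifetime when it is inlined without ESI or a similar mechanism.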

Event Timeline

GWicke raised the priority of this task from to Normal.
GWicke updated the task description.
GWicke added subscribers: GWicke, ori, Jdlrobson and 3 others.
Restricted Application added a subscriber: Aklapper. · Jan 27 2016, 8:30 PM
GWicke updated the task description. · Jan 27 2016, 8:37 PM
GWicke set Security to None.
GWicke updated the task description. · Jan 27 2016, 8:43 PM
Peter added a subscriber: Peter. · Jan 27 2016, 8:59 PM

Chatted to @ori and @GWicke about this today.

The main issues with this seem to be around caching (that aside, doing this would be a no-brainer):

Editor expectations that edits to MediaWiki:Common.css are instantly visible

In the current setup with no edge caching for authenticated users, editors would actually see current styles by default. Anons would not, but many tweaks in common.css tend to be specific to features only authenticated users see / use, so this might not be prohibitive. Common.css vandalism could be an issue, but this problem does not seem to be very different from template vandalism. Both common.css and popular templates are protected, and both take a long time to update.

The biggest gains from inline styles are on clients with poor network connectivity. At the same time, I think expectations around instant common.css updates tend to be lower for the mobile site. It might be a good strategy to pioneer inline styles for mobile first, and then evaluate if the delayed style updates are an issue in practice before considering expanding to the desktop site.

GWicke updated the task description. · Jan 28 2016, 6:42 PM
GWicke updated the task description.
GWicke updated the task description. · Jan 29 2016, 4:09 PM

I heard a rumour that @BBlack is looking at the typical maximum shelf life of a page in cache (currently assumed to be 30 days) to see if it is any lower. Am I correct? This would be interesting to know. I'm personally fine with this, given that Minerva's design is pretty stable now, but I'd suggest we discuss how this impacts any noticeable UI changes we might make in future.

I'm keen for us to explore this near the end of the quarter. CSS is 7.36 KB on mobile and currently excludes site CSS (MediaWiki:Mobile.css is loaded via JS at the moment), which means only CSS inside the Minerva skin would be cached up to the limit. I'm happy for the Minerva skin to be a guinea pig.

@Jdlrobson - The typical shelf life of a page is already lower than 30 days. We cap the maximum life at exactly 30 days in our caches, and we're looking at dropping that cap downwards ( T124954 ). Note also that as of a few hours ago, we've temporarily disabled SPDY to test perf impact on both slow and fast SPDY-capable clients, which might impact your measurements (or be useful to re-test against): T125979

Change 270434 had a related patch set uploaded (by Jdlrobson):
PoC: Inline top loaded CSS when MinervaInlineCSS is true

https://gerrit.wikimedia.org/r/270434

Jdlrobson moved this task from Backlog to Parking lot on the MobileFrontend board. · Feb 18 2016, 6:29 PM

Change 270434 abandoned by Jdlrobson:
PoC: Inline top loaded CSS when MinervaInlineCSS is true

https://gerrit.wikimedia.org/r/270434

brion added a subscriber: brion. · Apr 25 2016, 7:12 AM

One option for resolving the conflict between long-term caching and the need to update inlined CSS in a timely manner is to use ServiceWorkers to compose the page. This composition is fairly cheap (hundreds to thousands of dynamic compositions per second per node when run server-side via node-serviceworker-proxy), which makes it feasible to drop Varnish TTLs significantly further without a significant increase in cost.
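The idea of composing the page at request time can be sketched independently of the ServiceWorker API. The Python generator below is only an illustration of the ordering, not the prototype's code: `compose_page` is a hypothetical name, and the CSS and content chunks stand in for separately cached resources.

```python
def compose_page(css, content_chunks):
    """Yield a response piece by piece: inlined CSS first, then content."""
    # The head goes out immediately, so rendering is unblocked before the
    # (possibly slower) content has finished arriving.
    yield f"<!DOCTYPE html><html><head><style>{css}</style></head><body>"
    for chunk in content_chunks:
        yield chunk
    yield "</body></html>"


parts = list(compose_page("body{margin:0}", ["<h1>Foobar</h1>", "<p>Text</p>"]))
```

Because the CSS is looked up at composition time, a style change takes effect on the next request even if the content chunks themselves come from a long-lived cache.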

I prototyped this in https://github.com/gwicke/streaming-serviceworker-playground, and benchmarked this both on the client side & server side via https://swproxy.wmflabs.org/wiki/Foobar and https://swproxy-mobile.wmflabs.org/wiki/Foobar.

Results:

  • On the client side with a ServiceWorker installed, inlining CSS does not make a major difference. The full CSS is cached in the ServiceWorker anyway, and will be fully fetched and parsed significantly before the actual content starts streaming in. In the case of a content cache hit, inline CSS can still improve performance slightly, but the differences are in the single-digit millisecond range, and very close to the noise level.
  • Requests without a client-side ServiceWorker see a significant (~25-30%) improvement in first paint times. This is true even when comparing a labs proxy against fully cached Varnish responses. Streaming responses in the ServiceWorker help to keep the time to first byte down, at a level similar to that of the Varnish cache response.
Jdlrobson moved this task from Backlog to Later on the Readers-Web-Backlog (Tracking) board.