Fit above-the-fold layout and first paragraph HTML in 14 KB
Closed, Resolved (Public)

Description

Objective

Give the browser enough HTML to render everything above the fold after only a single network roundtrip.

Specifically, by making the following less than 14 KB in total: HTTP headers, HTML <head>, basic skin layout (as visible above the fold), and first bit of the article content.

Background

We don't control the speed (roundtrip latency) with which users can connect to our servers (e.g. via cellular radios and cable), and we also don't control their bandwidth (e.g. how many KB/s). However, it is important to remember that bandwidth is not speed. Transferring 100KB of data with 10 ms latency takes essentially the same amount of time on a 400 KB/s connection as on a 100 MB/s connection. The bandwidth controls how large individual packets can be (at most). It does not control how fast they travel.

The size of each chunk of data is determined automatically by TCP congestion control and related algorithms. However, for historical reasons, the default for most browsers and operating systems is to send/accept up to 10 packets of 1432 bytes at once. After that initial burst, the algorithms take over and grow the chunk size with each roundtrip until they find that they cannot send more. This fluctuates over time. It is why 14 KB is a common target: that's roughly the amount of data all users receive at first, regardless of how high or low their bandwidth and latency are (2G, 4G, WiFi, etc.).
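
To make the 14 KB figure concrete, here is a minimal sketch of the arithmetic. It takes the 10 packets of 1432 bytes from the paragraph above and assumes, as a simplification, that the window doubles on every roundtrip and that no packets are lost. The function names are made up for illustration.

```shell
#!/bin/sh
# Figures from the paragraph above.
INIT_PACKETS=10
PACKET_BYTES=1432

# Bytes that fit in the very first burst.
initial_window_bytes() {
  echo $((INIT_PACKETS * PACKET_BYTES))
}

# roundtrips_needed PAYLOAD_BYTES
# ASSUMPTION: the window doubles after each roundtrip and nothing is lost.
roundtrips_needed() {
  remaining=$1
  window=$(initial_window_bytes)
  trips=0
  while [ "$remaining" -gt 0 ]; do
    trips=$((trips + 1))
    remaining=$((remaining - window))
    window=$((window * 2))
  done
  echo "$trips"
}

initial_window_bytes      # prints 14320
roundtrips_needed 14320   # prints 1
roundtrips_needed 14321   # prints 2
```

Under these assumptions a single byte over the initial window costs a whole extra roundtrip, which is why the target is treated as a hard threshold rather than a rough guideline.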

Further reading

Event Timeline

How to measure

Size of HTTP headers:

  • curl -i https://en.wikipedia.org/wiki/Stockholm > http-head
  • trim until line above <!DOCTYPE html>.
  • measure with wc -c.

Size of HTML until start of content:

  • curl -s https://en.wikipedia.org/wiki/Stockholm > html (without -i, so that the HTTP headers are not counted twice)
  • keep only until after <div class="mw-parser-output">
  • measure with cat html | gzip -7 -c | wc -c

Size of HTML until first paragraph:

  • curl -s https://en.wikipedia.org/wiki/Stockholm > html (without -i, so that the HTTP headers are not counted twice)
  • keep only until before <p><b>Stockholm</b> …
  • measure with cat html | gzip -7 -c | wc -c
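
The manual "keep only until" step above can be scripted. The following is a rough sketch, not the method used for the numbers in this task: html_cut and gzipped_size are hypothetical helper names, the input is assumed to be a page body already saved to a local file, and the marker is assumed to sit on a single line.

```shell
#!/bin/sh
# html_cut FILE MARKER MODE
#   MODE=before  : print everything before the first literal MARKER
#   MODE=through : print everything up to and including the MARKER
# ASSUMPTION: MARKER does not span multiple lines.
html_cut() {
  MARKER="$2" MODE="$3" awk '
    BEGIN {
      m = ENVIRON["MARKER"]
      extra = (ENVIRON["MODE"] == "through") ? length(m) : 0
    }
    {
      pos = index($0, m)
      if (pos > 0) {
        # Emit the part of this line that is kept, then stop reading.
        printf "%s", substr($0, 1, pos - 1 + extra)
        exit
      }
      print
    }
  ' "$1"
}

# Reads stdin and prints the gzip -7 compressed size in bytes.
gzipped_size() {
  gzip -7 -c | wc -c | tr -d ' '
}

# Example use on a saved page body called "html":
#   html_cut html '<div class="mw-parser-output">' through | gzipped_size
#   html_cut html '<p><b>Stockholm</b>' before | gzipped_size
```

The marker is passed through the environment rather than with awk -v so that backslashes in it are not interpreted as escape sequences.
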

Current state (mobile)

  • Size of HTTP headers (uncompressed): 1,242 bytes
  • Size of HTML until start of content (gzipped): 3,724 bytes
  • Total transfer size before content: 4.9 KB
  • Size of HTML until first paragraph (gzipped): 3,871 bytes

Current state (desktop)

  • Size of HTTP headers (uncompressed): 1,240 bytes
  • Size of HTML until start of content (gzipped): 3,445 bytes
  • Total transfer size before content: 4.7 KB
  • Size of HTML until first paragraph (gzipped): 7,027 bytes
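
As a quick sanity check of these figures against the budget, here is a small sketch. It assumes the budget is exactly 10 packets of 1432 bytes, as described in the Background, and that headers plus gzipped HTML are the only bytes in the first burst, which ignores any other protocol overhead. budget_margin is a made-up helper name.

```shell
#!/bin/sh
# Budget from the Background section: 10 packets of 1432 bytes each.
BUDGET_BYTES=$((10 * 1432))

# budget_margin HEADER_BYTES GZIPPED_HTML_BYTES
# Prints the bytes left in the budget (negative when over budget).
budget_margin() {
  echo $((BUDGET_BYTES - $1 - $2))
}

budget_margin 1242 3871   # mobile, up to first paragraph: prints 9207
budget_margin 1240 7027   # desktop, up to first paragraph: prints 6053
```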

Looks like we're already well under 14 KB on both. However, for desktop (Vector) this currently does not include the skin layout, because in Vector that is currently at the end of the HTML payload. This was done many years ago under the rationale of search-engine optimisation (SEO). It is the reason that, when navigating between long articles, the site header blinks in and out: the visual completion of the above-the-fold header is blocked by the downloading and parsing of all the below-the-fold HTML. We should fix this. Once done, this can be closed, so not such a big task after all :)

Krinkle triaged this task as Medium priority. Aug 25 2019, 4:25 PM

Change 532247 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] resourceloader: Compile documentElement.className server-side

https://gerrit.wikimedia.org/r/532247

Change 532247 merged by jenkins-bot:
[mediawiki/core@master] resourceloader: Compile documentElement.className server-side

https://gerrit.wikimedia.org/r/532247

Change 579658 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] OutputPage: Only export wgUserNewMsgRevisionId if non-null

https://gerrit.wikimedia.org/r/579658

Change 579658 merged by jenkins-bot:
[mediawiki/core@master] OutputPage: Only export wgUserNewMsgRevisionId if non-null

https://gerrit.wikimedia.org/r/579658

However, for desktop (Vector) this currently does not include the skin layout, because in Vector that is currently at the end of the HTML payload.

For more investigation related to this and Vector in particular, please see: T240489#6343447

Jdlrobson subscribed.

Is resolving this blocked on deploying Vector to English Wikipedia, or is this resolved now that these changes are live on fr.wikipedia.org?

If not, what tangible things remain?

Possibly neither. Moving the sidebar is just one of the possible steps toward this task, and I, at least, haven't looked at its impact on performance or the 14 KB threshold. If you have, then it might indeed be resolved.

Here's the HTML leading up to 1st paragraph for French Wikipedia

cat html | gzip -7 -c | wc -c yields 17829

Note that a large portion of those bytes (my estimate back in July was around 50% [1]) comes from the interlanguage links in the sidebar. We will be moving those out of the sidebar as part of the Desktop Improvements Project, although they might still be above the fold depending on where we decide to place them.

[1] https://phabricator.wikimedia.org/T240489#6343447

Languages will still be above the fold after that change. T104660 talks about this issue.

@Krinkle this looks done to me... at least as done as it could be.

The value fluctuates, but on small pages it is way under 15 KB, e.g. for https://fr.wikipedia.org/wiki/Vincenzo_Monti_(dessinateur) I am currently measuring 8766.
On bigger pages, like the example @nray gives, we're looking at more than 15 KB (T231168#6763465).

One way to improve that is by pulling out languages from the HTML and providing a different fallback per T104660 and loading them via JavaScript for JS users. I'll leave that with you to decide next steps, but I don't see anything right now that the team can do, other than changing the current DOM structure which we settled on for accessibility and performance reasons.

Vector currently, https://en.wikipedia.org/wiki/Northern_Europe (108 language links)

# HTTP headers
curl -sI 'https://en.wikipedia.org/wiki/Northern_Europe' | wc -c
1395
curl -s 'https://en.wikipedia.org/wiki/Northern_Europe' > enhtml
# From DOCTYPE until first paragraph words "<p><b>Northern Europe</b> is a loosely defined"
cat enhtml | tr '\n' ' ' | sed 's/^\(.*<p><b>Northern Europe<\/b> is a loosely defined\).*/\1/' | gzip -9 - | wc -c
3876

  • HTTP headers: 1,395 bytes
  • HTML transfer size until first paragraph words (gzip): 3,876 bytes
  • Total transfer size to first paragraph: 5.2 kB

An important caveat: while this is truly the only HTML required to render the article content in its final layout position and with the correct styling etc. (no more shifts), it does not include the sidebar, which renders later.


New Vector, https://fr.wikipedia.org/wiki/Europe_du_Nord (108 language links)

# HTTP headers
curl -sI 'https://fr.wikipedia.org/wiki/Europe_du_Nord' | wc -c
1395 (identical composition and size)
curl -s 'https://fr.wikipedia.org/wiki/Europe_du_Nord' > frhtml

# From DOCTYPE until first paragraph words "<p>Dans un sens restreint ..</p>"
cat frhtml | tr '\n' ' ' | sed 's/^\(.*<p>Dans un sens restreint\).*/\1/' | gzip -9 - | wc -c
13065

  • HTTP headers: 1,395 bytes
  • HTML transfer size until first paragraph words (gzip): 13,065 bytes
  • Total transfer size to first paragraph: 14.4 kB

While roughly three times the size, this includes the sidebar and is thus expected to result in an earlier "last visual change" metric, because there is no longer a final paint for the above-the-fold sidebar that is delayed until after all the below-the-fold content.

Of course, an actual apples-to-apples performance comparison remains to be seen in synthetic tests. But for this task, it was already resolved in prod, and it's good to know that it stays more or less within that budget for New Vector as well, although it does cut away all our margins.