
Refactor startup logic to bundle base modules and page modules into one unified request
Closed, Declined · Public

Description

With the current implementation, a lightweight JavaScript startup module (startup.js) is loaded first. It tests the browser's user agent and then triggers a download of MediaWiki and jQuery.
Once these have finished downloading, mw.loader.load is called, which downloads another JavaScript URL (the page's modules). As a result, we make 3 JavaScript requests.

I wondered what is blocking us from merging the last 2, e.g. by making the startup module do something like this:

script = document.createElement( 'script' );
script.src = document.documentElement.getAttribute( 'data-script' );

instead of

script = document.createElement( 'script' );
script.src = $VARS.baseModulesUri;

I am sure there are cache considerations and this will fragment the cache for some pages, but our time to first byte is costly and it looks like this would save our users up to 2s for each JavaScript request we avoid.
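A minimal sketch of the proposal as described, assuming the server renders a hypothetical `data-script` attribute on the `<html>` element that holds a single load.php URL. The fallback to the base modules URL is an added assumption, and `pickScriptUrl`, `loadStartupScript` and `fakeDocument` are illustrative names, not MediaWiki code. A stand-in document object is passed in so the sketch runs outside a browser.

```javascript
// Hypothetical sketch of the proposal, not MediaWiki code: prefer a URL the
// server rendered into <html data-script="...">, falling back (an assumption
// added here) to the base modules URL the startup module already knows.
function pickScriptUrl(doc, baseModulesUri) {
  const fromHtml = doc.documentElement.getAttribute('data-script');
  return fromHtml || baseModulesUri;
}

// Create the <script> element and start the request.
function loadStartupScript(doc, baseModulesUri) {
  const script = doc.createElement('script');
  script.src = pickScriptUrl(doc, baseModulesUri);
  doc.head.appendChild(script);
  return script;
}

// Minimal stand-in for `document` so the sketch runs outside a browser.
function fakeDocument(attrs) {
  const appended = [];
  return {
    appended,
    documentElement: {
      getAttribute: (name) => (name in attrs ? attrs[name] : null),
    },
    createElement: (tag) => ({ tagName: tag.toUpperCase(), src: '' }),
    head: { appendChild: (el) => appended.push(el) },
  };
}

const doc = fakeDocument({ 'data-script': '/w/load.php?modules=jquery%7Cmediawiki%7Cext.foo' });
loadStartupScript(doc, '/w/load.php?modules=jquery%7Cmediawiki').src;
// -> '/w/load.php?modules=jquery%7Cmediawiki%7Cext.foo'
```

Note that the URL is fixed at the moment the HTML is rendered; the replies below explain why that is the crux of the problem.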

On mobile with a cold cache there are currently 5 script requests (startup module, mediawiki/jquery, top-loaded scripts, bottom-loaded scripts and a CentralAuth script). What's stopping us from, at the very least, merging the top-loaded scripts with mediawiki/jquery?

Event Timeline

Krinkle renamed this task from "Bundle mediawiki/startup script with initial load scripts into one unified request" to "Refactor startup logic to bundle base modules and page modules into one unified request" (Apr 18 2016, 12:29 AM).

Rephrased summary since this is combining the base modules request (jquery+mediawiki) with that of the page modules request (mw.loader.load queue). If I understood correctly, this is not about combining the startup module with something else.

Current:

  1. startup module
  2. base modules (jquery+mediawiki)
  3. mw.loader.load queue for the current page.
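These three requests form a dependency chain: each one is only issued after the previous response has arrived and run. A minimal sketch under a deliberately simple additive model (no overlap, each hop costs one round trip plus some work) shows why merging two adjacent requests saves roughly one round trip. The stage names and millisecond values are hypothetical inputs, not measurements.

```javascript
// Simplified model (assumption): each request starts only after the previous
// one has arrived and executed, so costs add up with no overlap.
// Stage shape: { name, rttMs, workMs } where workMs lumps together server
// time, transfer and execution. All numbers are hypothetical inputs.
function serialWaterfallMs(stages) {
  return stages.reduce((total, s) => total + s.rttMs + s.workMs, 0);
}

// Merging two adjacent stages keeps all the work but removes one round trip.
function mergeStages(a, b) {
  return { name: `${a.name}+${b.name}`, rttMs: a.rttMs, workMs: a.workMs + b.workMs };
}

const startup = { name: 'startup', rttMs: 300, workMs: 50 };
const base = { name: 'base', rttMs: 300, workMs: 120 };
const page = { name: 'page', rttMs: 300, workMs: 80 };

serialWaterfallMs([startup, base, page]); // -> 1150
serialWaterfallMs([startup, mergeStages(base, page)]); // -> 850
```

The model ignores that a merged request may be less cacheable, which is exactly the trade-off discussed in the rest of this thread.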

Merging any of these requests would require a major redesign, one that is logically incompatible with the core design principles of ResourceLoader. I'm lowering the priority for now, since I don't think this is something we can achieve in the current or next quarter. But we can gather some data in the meantime and perhaps take this on later. Since our JavaScript pipeline is asynchronous, none of this blocks the critical path anymore. We currently have other, more impactful changes ahead that don't involve breaking changes.

@Jdlrobson wrote:

I wondered what is blocking us from merging the last 2, e.g. by making the startup module do something like this:

script = document.createElement( 'script' );
script.src = document.documentElement.getAttribute( 'data-script' );

instead of

script = document.createElement( 'script' );
script.src = $VARS.baseModulesUri;

I am sure there are cache considerations and this will fragment the cache for some pages, but our time to first byte is costly and it looks like this would save our users up to 2s for each JavaScript request we avoid.

Yes, I agree it would be nice to combine these requests. However, the current situation has little to do with cache fragmentation.

I assume that the proposed data-script attribute would contain a load.php url that requests both the base modules and page-associated modules.

The startup module's main purpose is to decide whether the current browser is capable of Grade A runtime (aka "cutting the mustard") and to transmit the module manifest with site metadata.

That manifest includes two pieces of information needed for mw.loader to be able to construct a load.php url: module dependencies and module version hashes.
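A simplified sketch of how those two pieces combine: resolve the page's module queue against the manifest, then put the module names and a version value into a load.php URL. It assumes a manifest shaped as `{ name: { version, dependencies } }` and a plain `modules`/`version` query string. The real mw.loader uses its own registry format, packs module names more compactly and computes a combined version hash, so `resolve`, `buildLoadUrl` and the URL shape are hypothetical.

```javascript
// Simplified, hypothetical sketch; not the real mw.loader implementation.
// Manifest shape (assumed): { name: { version, dependencies: [names] } }.

// Depth-first resolution: every dependency is listed before its dependents.
function resolve(manifest, queue) {
  const ordered = [];
  const done = new Set();
  function visit(name, stack) {
    if (done.has(name)) return;
    if (!(name in manifest)) throw new Error(`Unknown module: ${name}`);
    if (stack.includes(name)) throw new Error(`Circular dependency: ${name}`);
    for (const dep of manifest[name].dependencies) visit(dep, stack.concat(name));
    done.add(name);
    ordered.push(name);
  }
  queue.forEach((name) => visit(name, []));
  return ordered;
}

// Build a cache-busting load.php URL. Names are sorted so equal module sets
// produce equal URLs; the version value simply joins per-module versions
// (the real combined version is computed differently).
function buildLoadUrl(manifest, queue, base = '/w/load.php') {
  const modules = resolve(manifest, queue).sort();
  const version = modules.map((m) => manifest[m].version).join('.');
  const params = new URLSearchParams({ modules: modules.join('|'), version });
  return `${base}?${params}`;
}

const manifest = {
  jquery: { version: 'a1', dependencies: [] },
  mediawiki: { version: 'b2', dependencies: ['jquery'] },
  'ext.foo': { version: 'c3', dependencies: ['mediawiki'] },
};
resolve(manifest, ['ext.foo']); // -> ['jquery', 'mediawiki', 'ext.foo']
buildLoadUrl(manifest, ['ext.foo']);
// -> '/w/load.php?modules=ext.foo%7Cjquery%7Cmediawiki&version=c3.a1.b2'
```

Both inputs come from the manifest: without the dependency lists the URL would miss modules, and without the versions it could not be cache-busted.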

We cannot compute these server-side as part of the HTML response, because that would hardcode the flattened module list (each module plus its dependencies) into the page.

For example, if you then deploy a change to a module Foo that adds var x = mw.storage.get( 'x' ); and a dependency on mw.storage, every already-rendered page will fatal with "mw.storage is undefined", because its data-script URL still contains "Foo" without "mw.storage".

Currently, this case is handled by computing the script URL client-side with the latest metadata from the startup module. Since the startup module provides a consistent snapshot of server-side state, it eliminates race conditions with regard to dependency changes. Even if the startup module is 5 minutes old, it presents a consistent picture of that state, so there are basically no fatals.
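The failure mode can be sketched by comparing a module list frozen into cached HTML with one flattened from the current manifest. `Foo` and `mw.storage` come from the example above; the manifest shape and the `flatten`/`missingModules` helpers are hypothetical.

```javascript
// Hypothetical illustration of the stale-dependency problem; the manifest
// shape and helper names are assumptions, Foo and mw.storage come from the text.
function flatten(manifest, queue) {
  const out = new Set();
  const visit = (name) => {
    if (out.has(name)) return;
    out.add(name);
    const entry = manifest[name];
    (entry ? entry.dependencies : []).forEach(visit);
  };
  queue.forEach(visit);
  return [...out];
}

// Modules required by the current manifest that a given URL's list lacks.
function missingModules(urlModules, manifest, queue) {
  const have = new Set(urlModules);
  return flatten(manifest, queue).filter((m) => !have.has(m));
}

// Manifest when the page HTML was rendered and cached...
const oldManifest = { Foo: { version: '1', dependencies: [] } };
// ...and after deploying a change where Foo now calls mw.storage.get( 'x' ).
const newManifest = {
  Foo: { version: '2', dependencies: ['mw.storage'] },
  'mw.storage': { version: '1', dependencies: [] },
};

// Proposal: list flattened server-side, frozen into cached HTML.
const frozenInHtml = flatten(oldManifest, ['Foo']); // -> ['Foo']
// Current design: list flattened client-side from the latest startup manifest.
const fromStartup = flatten(newManifest, ['Foo']); // -> ['Foo', 'mw.storage']

missingModules(frozenInHtml, newManifest, ['Foo']); // -> ['mw.storage'] (fatal)
missingModules(fromStartup, newManifest, ['Foo']); // -> [] (consistent)
```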

Aside from the aforementioned incompatibility with dynamic dependencies and version changes, fragmentation does come into play later. We wouldn't want to load that script URL as-is; ideally we'd take out modules we already have in localStorage. Otherwise there'd be a major regression in cache fragmentation and bandwidth consumption on mobile (e.g. changing 1 module would mean everybody downloads everything again). That logic lives in the mediawiki library, which is itself one of the modules being loaded.
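A rough sketch of the filtering step described above: before requesting, split the module list into modules whose current version is already in a local store and modules that must be fetched. It assumes a store keyed by `name@version`; that key format and `partitionByCache` are illustrative, not the actual mw.loader store code. The filtering needs both the current versions and the local store, and neither is available when the HTML is rendered.

```javascript
// Hypothetical sketch: split a resolved module list into modules that can be
// served from a local store and modules that must be fetched.
// `store` maps "name@version" -> cached source (an assumed key format).
function partitionByCache(modules, versions, store) {
  const cached = [];
  const toFetch = [];
  for (const name of modules) {
    const key = `${name}@${versions[name]}`;
    (Object.prototype.hasOwnProperty.call(store, key) ? cached : toFetch).push(name);
  }
  return { cached, toFetch };
}

// Example: only 'ext.foo' changed since the last visit.
const versions = { jquery: 'a1', mediawiki: 'b2', 'ext.foo': 'c4' };
const store = {
  'jquery@a1': '/* jquery source */',
  'mediawiki@b2': '/* mediawiki source */',
  'ext.foo@c3': '/* stale ext.foo */',
};
partitionByCache(['jquery', 'mediawiki', 'ext.foo'], versions, store);
// -> { cached: ['jquery', 'mediawiki'], toFetch: ['ext.foo'] }
```

Only `toFetch` would go into the load.php URL, so a change to one module costs one module's worth of bandwidth rather than the whole bundle.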


On second thought, perhaps a more feasible option would be to combine the first two requests (startup and base modules). The only complication there is that we'd have to find a way to keep old browsers from choking on ES5 syntax, since the base modules would be downloaded in the same script response as the compatibility check.


On third thought, perhaps a more feasible option would be to inline the compatibility check into the HTML. And fire off the manifest + base modules request from there.
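A minimal sketch of that idea: a small inline check in the page HTML that only injects the combined manifest + base modules request when the browser passes. The feature tests and the URL are hypothetical placeholders, not ResourceLoader's actual criteria. The check is written in ES3-compatible syntax (`var`, plain function declarations) so that old browsers can at least parse it, which is the same constraint raised above for combining startup with the base modules. The window and document objects are passed in so the sketch can run outside a browser.

```javascript
// Hypothetical inline check, written in ES3-safe syntax on purpose (no
// let/const, arrow functions or template literals) so old browsers can at
// least parse it. The feature tests are placeholders, not RL's real criteria.
function isGradeA(win, doc) {
  return !!(
    'querySelector' in doc &&
    'addEventListener' in win &&
    'localStorage' in win &&
    win.JSON
  );
}

// Grade A: fire the combined manifest + base modules request right away.
// Grade C: request nothing, so no ES5 code reaches a browser that can't run it.
function maybeLoad(win, doc, url) {
  if (!isGradeA(win, doc)) {
    return null;
  }
  var script = doc.createElement('script');
  script.src = url;
  doc.head.appendChild(script);
  return script;
}
```

The versioning concerns raised in the next comments still apply: whatever URL the inline snippet requests would itself be frozen into the cached HTML.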

Krinkle moved this task from Inbox to Backlog on the MediaWiki-ResourceLoader board.
Jdlrobson raised the priority of this task from Low to Medium (Apr 27 2017, 6:53 PM).

It feels to me like we are punishing grade A browsers by introducing this additional JS query for grade C browsers.
I'm hoping we can get round to talking about this during the quarter. Inlining the compatibility check into HTML seems like a good starting point.

@Jdlrobson wrote:

It feels to me like we are punishing grade A browsers by introducing this additional JS query for grade C browsers.
I'm hoping we can get round to talking about this during the quarter. Inlining the compatibility check into HTML seems like a good starting point.

Sorry, but I don't think we should. I'm always open to new ideas, and also don't mind dropping other goals in favour of prioritising something new to work on together. Especially if it promises to improve performance. However, I don't believe this would work.

I assure you, page load performance is one of the most important (if not the most important) continuous goals for our team, and for me personally. Every quarter I aim to resolve at least one major (and a handful of minor) subtasks in that category. See T127328.

Above I've tried to explain that combining these requests is logically incompatible with our design requirements. The startup module is more than just a compatibility check. The only advantage of inlining the compatibility check would be applying the client-js class earlier (to avoid a FOUC).

That would make a nice improvement, except we already did that, in 2015, when d7905627fdc3b2 first introduced client-js. We set it without delay (and without compatibility check) from within the page HTML, and instead sometimes "punish" Grade C browsers with a brief flash from client-js back to client-nojs.
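The inline approach described here amounts to a class swap on the `<html>` element, which the startup module reverts in a Grade C browser. A sketch operating on a class string, assuming a regex-based replacement; `swapClass`, `markJsEnabled` and `markJsDisabled` are illustrative helpers, not the code from d7905627fdc3b2.

```javascript
// Swap one class for another on a className string, keeping other classes.
// `from` is used inside a RegExp, so it must not contain regex metacharacters.
function swapClass(className, from, to) {
  // Match `from` only as a whole class name (bounded by start/end or whitespace).
  var re = new RegExp('(^|\\s)' + from + '(\\s|$)');
  return className.replace(re, '$1' + to + '$2');
}

// Inline in the page HTML: assume JS until proven otherwise (no FOUC for Grade A).
function markJsEnabled(docEl) {
  docEl.className = swapClass(docEl.className, 'client-nojs', 'client-js');
}

// Later, in the startup module: Grade C browsers flip back (the brief "flash").
function markJsDisabled(docEl) {
  docEl.className = swapClass(docEl.className, 'client-js', 'client-nojs');
}

swapClass('client-nojs skin-vector', 'client-nojs', 'client-js');
// -> 'client-js skin-vector'
```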

The rest of the startup module is:

  • Module manifest: Too large. Embedding it would be a regression, since it would then block page rendering synchronously instead of loading asynchronously as it does now. It would also increase bandwidth by being re-downloaded on every page load. And because it would now be cached as part of the page HTML, it would break consistency and caching in numerous ways, which goes against our design requirements and breaks various guarantees.
  • Base module url and request: This too must be versioned and must not be in the page HTML. In addition, the request must not happen in Grade C browsers as it would cause an unrecoverable exception.

We should focus our efforts on improving first render time (using HTML and CSS as much as possible). For the most part, whether JavaScript loads fast, slow, or not at all shouldn't make much difference for the end user. This is further emphasised and enforced by all JS loading asynchronously: any JS that does try to change things above the fold would always cause a FOUC, which we consider a bug.

However, while JS loads asynchronously and (for the most part) causes no reflows, we find over and over again that shrinking or removing JS universally improves metrics like "First paint". This is surprising at first but makes sense. Reducing the JS payload, or deferring it slightly, doesn't influence the time between load start and load end. It merely rearranges the order of events so that the browser can first spend all its CPU and network resources on parsing and rendering the page, layout, and images.

Just in the last 2 weeks, for example, we got rid of an entire HTTP request (0ac6076b4c0848) and shrank the module queue (T159911), which improved things measurably (T161490). This was a surprise, since the JS removals should not have affected first paint, but as mentioned, it frees up browser resources for page rendering and improves various other heuristics that browsers use to avoid rendering areas that will likely change again.

There are various things we can do to speed up the JS pipeline, unrelated to this task:

  • medium-term: Consider using "preload" headers on the startup request so that the browser starts downloading the base modules before it has finished parsing and executing the startup module.
  • long-term: Server-side page composition (T106099) or Edge Side Includes (ESI) would allow us to embed the (versioned) base module request URI in the page HTML in a way that still invalidates consistently, at once, for all page views.
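For the medium-term item, a preload hint is normally expressed as an HTTP `Link` header with `rel=preload` and `as=script` (the same hint can also be written as a `<link rel="preload">` element). The sketch below only builds the header value; `preloadLinkHeader` is a hypothetical helper, and the URLs are assumed to be already percent-encoded.

```javascript
// Build a Link header value asking the browser to fetch scripts early.
// Format per entry: <URL>; rel=preload; as=script  (multiple entries comma-separated).
function preloadLinkHeader(urls) {
  return urls.map((u) => `<${u}>; rel=preload; as=script`).join(', ');
}

preloadLinkHeader(['/w/load.php?modules=jquery%7Cmediawiki']);
// -> '</w/load.php?modules=jquery%7Cmediawiki>; rel=preload; as=script'

// Hypothetical use when serving a response (Node-style API):
// res.setHeader('Link', preloadLinkHeader([baseModulesUri]));
```

The URL placed in the header would still need to be the versioned base modules URL, so it belongs on a response that is invalidated together with the startup module.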
Jdlrobson lowered the priority of this task from Medium to Low (May 2 2017, 5:38 PM).

Using preload would be pretty darn good. Is there a task tracking that?