
[Epic] ResourceLoader with no version registry (investigation)
Closed, Declined (Public)

Description

@ori brought up the idea of implementing an alternative client pipeline that would essentially work without the version manifest in the startup module. Instead, it would make requests to load.php without version parameters. The HTTP 304 Not Modified roundtrip (served with must-revalidate instead of Expires) would take the place of the startup manifest.

This model is aimed specifically at modern browsers that support SPDY/HTTP2. SPDY eliminates most of the overhead of additional requests by multiplexing them over a shared connection.

For older browsers we'd most likely want to retain the old model.

We haven't yet established that this alternative pipeline would actually be a net benefit. Researching that is part of this task.

If it does work, there are a number of blockers to making it happen, as our current infrastructure is built on the assumption that load.php URLs must not hit the backends frequently (since running the minifier and preprocessors is relatively slow).

Current pipeline

See also ResourceLoader documentation on mediawiki.org

There is essentially one <script src> in the HTML response. It points to module=startup, which is cached for a short duration (~5 minutes). The startup module contains a map of all module names, their dependencies, and their version hashes.

Individual pages then have one or two <script>mw.loader.load([ ...])</script> calls (in the head and/or at the bottom of the body). The loader client expands the list of modules to include their dependencies and de-duplicates them (because lazy-loading is supported, it also filters out modules that are already loading or loaded). For each module it then looks for a {name}@{version} cache entry (localStorage, IndexedDB, ServiceWorker Cache, ...).
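
The expansion and cache lookup described above can be sketched roughly as follows. This is a minimal Python model, not mw.loader's actual code: the registry shape (module name mapped to a version hash and a dependency list), the function names, and the plain dict standing in for localStorage are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical registry shape: name -> (version hash, dependency names).
Registry = Dict[str, Tuple[str, List[str]]]

def resolve(requested: List[str], registry: Registry, loaded: Set[str]) -> List[str]:
    """Expand dependencies depth-first, de-duplicate, and skip modules that
    are already loading or loaded. Dependencies come before dependents."""
    ordered: List[str] = []
    seen: Set[str] = set()

    def visit(name: str) -> None:
        if name in seen or name in loaded:
            return
        seen.add(name)  # mark before recursing so cycles cannot loop forever
        _version, deps = registry[name]
        for dep in deps:
            visit(dep)
        ordered.append(name)

    for name in requested:
        visit(name)
    return ordered

def split_by_cache(modules: List[str], registry: Registry,
                   store: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Look up '{name}@{version}' keys; return (cached bodies, missing names)."""
    cached: Dict[str, str] = {}
    missing: List[str] = []
    for name in modules:
        version, _deps = registry[name]
        key = f"{name}@{version}"
        if key in store:
            cached[name] = store[key]
        else:
            missing.append(name)
    return cached, missing
```

Because the cache key embeds the version, a deployed change produces a new key and the stale entry is simply never read again.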

Modules not found in the itemised cache are combined into a single load.php?modules=......&version=.. request, where version is a hash combining the hashes of the individual modules.
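
A batch URL with a combined version could be built along these lines. This is a sketch under assumptions: the `/w/load.php` path, the SHA-1 over "|"-joined per-module hashes, and the 7-character prefix are illustrative choices, not ResourceLoader's documented algorithm.

```python
import hashlib
from typing import Dict, Iterable, List
from urllib.parse import urlencode

BASE = "/w/load.php"  # hypothetical entry point path

def combined_version(versions: Iterable[str]) -> str:
    # Assumption: hash the per-module hashes joined by a separator
    # and keep a short prefix.
    joined = "|".join(versions)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()[:7]

def batch_url(modules: List[str], registry: Dict[str, str]) -> str:
    """registry maps module name -> version hash (hypothetical shape)."""
    names = sorted(modules)  # canonical order: same set, same URL
    version = combined_version([registry[n] for n in names])
    return BASE + "?" + urlencode({"modules": "|".join(names), "version": version})
```

Sorting the names makes the URL independent of request order, which limits cache fragmentation. Any change to one module's hash still yields a new URL for every batch that contains it.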

Because the version parameter makes each URL unique, these load.php responses are permanently cacheable and carry long Expires and Cache-Control headers. Returning visitors hit the local browser cache (no HTTP 304 roundtrip), and our frontend Varnish proxies cache the response, so it is generated by a backend only once.
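
The two cache lifetimes in the current pipeline can be modeled as a small header helper. The 30-day and 5-minute values are assumptions for illustration (only the ~5 minutes appears above), and the function name is hypothetical.

```python
import time
from email.utils import formatdate
from typing import Dict, Optional

# Hypothetical lifetimes, not the production configuration.
VERSIONED_MAX_AGE = 30 * 24 * 3600  # URL has a version: content never changes
UNVERSIONED_MAX_AGE = 5 * 60        # e.g. the startup module: ~5 minutes

def cache_headers(has_version_param: bool, now: Optional[float] = None) -> Dict[str, str]:
    """Pick Cache-Control and Expires based on whether the URL is versioned."""
    now = time.time() if now is None else now
    max_age = VERSIONED_MAX_AGE if has_version_param else UNVERSIONED_MAX_AGE
    return {
        "Cache-Control": f"public, max-age={max_age}",
        "Expires": formatdate(now + max_age, usegmt=True),
    }
```

Since a versioned URL can never refer to different content, a long lifetime is safe; only the unversioned startup response needs to expire quickly.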

Important features:

  • Page HTML is highly cacheable (it references only the startup module and the list of canonical module names associated with that page). Crucially, it does not hardcode module versions or dependencies.
  • Changes to module content and dependencies can be deployed globally to all users within 5 minutes.

Weak spot:

  • The startup module request has to recompute the contents of all modules every 5 minutes, and it has to do so within a single user-facing HTTP request. If nothing changed, it emits the same hash (HTTP 304) so that clients don't re-download it, but it still takes up precious time every 5 minutes. Task T98087 would kill the startup module and demands that we improve the content-version logic (e.g. omit some of the build steps, hash content in a more raw form) and/or improve the performance of said build steps (especially Lessc).
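
The difference between hashing built output and "hashing content in a more raw form" can be sketched as below. All names are hypothetical, and `build` stands in for whatever preprocessing and minification a module needs; this illustrates the idea, not an existing implementation.

```python
import hashlib
from typing import Callable, Dict, List

def _short_sha1(parts: List[str]) -> str:
    h = hashlib.sha1()
    for part in parts:
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator, so ["ab", "c"] and ["a", "bc"] differ
    return h.hexdigest()[:7]

def version_from_build(sources: List[str], build: Callable[[str], str]) -> str:
    """Hash the built output: every build step runs on every check."""
    return _short_sha1([build(src) for src in sources])

def version_from_raw(sources: List[str]) -> str:
    """Hash the raw inputs: no preprocessing or minification needed."""
    return _short_sha1(sources)

def startup_manifest(modules: Dict[str, List[str]],
                     version: Callable[[List[str]], str]) -> Dict[str, str]:
    """Version every registered module, as the startup module must."""
    return {name: version(sources) for name, sources in modules.items()}
```

The trade-off: raw hashing may change the version when the built output would not (say, an edited comment), which costs only an extra cache miss. It is only correct if every input is included, e.g. files pulled in through Less imports.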

Proposed pipeline

Omit the versions from the startup module. Then adapt the client to no longer include version query parameters in the load requests, and to no longer make batch requests for multiple modules. Instead, each module gets its own request. Over SPDY, the overhead of separate requests is negligible. This has the advantage of making modules more cacheable and less susceptible to cache fragmentation (especially server-side, as localStorage already prevents most fragmentation client-side). In addition, it allows for better concurrency.
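
The fragmentation argument can be made concrete with a toy count. The `/w/load.php` path and the function names are hypothetical; the sketch compares how many module bodies a shared cache ends up holding when pages request overlapping sets of modules.

```python
from typing import Iterable, List
from urllib.parse import urlencode

BASE = "/w/load.php"  # hypothetical entry point path

def per_module_urls(modules: Iterable[str]) -> List[str]:
    """Proposed pipeline: one unversioned request per module."""
    return [BASE + "?" + urlencode({"modules": name}) for name in sorted(set(modules))]

def stored_copies(pages: List[List[str]], batched: bool) -> int:
    """Count module bodies a shared cache holds after serving these pages."""
    if batched:
        # Each distinct batch is its own cache entry, so shared modules repeat.
        batches = {tuple(sorted(set(page))) for page in pages}
        return sum(len(batch) for batch in batches)
    # Unbatched: each module is cached once, whichever page asked for it.
    return len({name for page in pages for name in page})
```

For two pages that load `["jquery", "ext.a"]` and `["jquery", "ext.b"]`, batching stores jquery twice (4 bodies), while per-module URLs store it once (3 bodies).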

Benefits:

  • Greatly improve the response time of the startup module, which would merely output the module registry.
  • Reduce cache misses. When a module change is deployed, clients would only re-download the changed module, rather than both the whole startup manifest and the changed module.
  • Distribute version computation over multiple concurrent requests.

Potential pitfalls:

  • The module load requests can no longer be served from local cache without revalidation. Hopefully this will be offset by these requests mostly being cheap HTTP 304 roundtrips. Together, these roundtrips essentially replace what used to be the version manifest, except that the version manifest allowed public caching with max-age=300 (5 minutes) and ensured atomicity (all modules came from the same snapshot). In the new pipeline, we'd have to use max-age=0 and must-revalidate.
  • Any sort of caching (including ServiceWorker, localStorage and caching proxies) would endanger integrity, as it may cause incompatible versions of dependencies to arrive on the client.
  • Module build time (e.g. css preprocessing, minification) will become significantly more exposed to user requests.
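
A minimal model of the revalidation roundtrip for an unversioned module request, assuming a strong ETag derived from the response body. The function names are hypothetical, and weak validators (`W/`) and `If-None-Match: *` are deliberately not handled.

```python
import hashlib
from typing import Dict, Optional, Tuple

def etag_for(body: bytes) -> str:
    return '"' + hashlib.sha1(body).hexdigest()[:12] + '"'

def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Answer an unversioned module request in the proposed pipeline."""
    etag = etag_for(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=0, must-revalidate"}
    if if_none_match is not None:
        client_tags = {tag.strip() for tag in if_none_match.split(",")}
        if etag in client_tags:
            return 304, headers, b""  # client's copy is current: no body
    return 200, headers, body
```

Note that the server needs `body` (the built module) to compute the ETag here, which is exactly the build-time exposure described in the last pitfall. A version derived from raw sources could serve as the validator instead.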

Event Timeline

Krinkle raised the priority of this task to Needs Triage.
Krinkle updated the task description.
Krinkle added subscribers: Krinkle, ori.
Krinkle triaged this task as Medium priority. (Jul 8 2015, 2:02 PM)
Krinkle reopened this task as Open.
Krinkle claimed this task.

Could be revisited in the future, but doesn't seem particularly realistic or obviously desirable at the moment. It's something to think about and maybe consider.