
RFC: Page composition using service workers and server-side JS fall-back
Closed, Declined · Public

Assigned To
None
Authored By
GWicke
Jul 16 2015, 11:27 PM

Description

Preamble

We have been looking for ways to expand our cacheable content beyond anonymous requests for a long time. Once a user is logged in, a number of personalizations, primarily in the chrome around the content (user name, tabs, links), make it hard to reuse a cached copy of an entire page. Initial trials to perform those personalizations with ESI were done as early as 2004, and even recent testing with Varnish has shown performance and stability issues. Server-side composition technologies like ESI or SSI also introduce a second code path, which makes it harder to test and develop front-end code without intimate knowledge of a complex part of our stack.

An alternative is to use JavaScript for the composition. This opens up the possibility of running the same JS code

  • on the client, in service workers (essentially caching HTTP proxies running in the browser), or
  • on the server, behind edge caches, in a JS runtime like Node.js with an implementation of the ServiceWorkers API, processing both cache misses and authenticated views.
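
To make the dual deployment concrete: the exact same worker script would be registered in supporting browsers, while on the server a Node.js runtime implementing the ServiceWorker API evaluates it behind the edge caches. A minimal sketch (the script path and scope here are illustrative):

```js
// In the browser: register the shared worker script for page views.
if ('serviceWorker' in navigator) {
  navigator.serviceWorker.register('/sw.js', { scope: '/wiki/' });
}

// On the server (conceptually): a Node.js process implementing the
// ServiceWorker API loads the very same /sw.js behind the edge caches
// and dispatches a synthetic 'fetch' event for every cache miss and
// authenticated page view, so both environments share one code path.
```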

By using JavaScript, we get to use familiar and mature HTML templating systems that support pre-compilation. This simplifies the development and testing process. While Varnish performance drops significantly with each ESI include (we measured a 50% drop with five includes), pre-compiled JS templates can potentially perform fairly fine-grained customizations with moderate overhead.
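
As an illustration, a small personalized chrome fragment can be parsed once with mustache.js and then rendered cheaply per request; the template and view below are made up for this example:

```js
const Mustache = require('mustache');

// Personalized portion of the chrome; the cached article HTML stays untouched.
const chromeTemplate =
  '<div id="p-personal">' +
  '<a href="/wiki/User:{{userName}}">{{userName}}</a>' +
  '{{#notifications}}<span class="badge">{{count}}</span>{{/notifications}}' +
  '</div>';

// Pre-parse once, so per-request rendering is just string assembly.
Mustache.parse(chromeTemplate);

function renderChrome(user) {
  return Mustache.render(chromeTemplate, user);
}

// e.g. renderChrome({ userName: 'Example', notifications: { count: 3 } });
```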

In browsers that support it (like current Chrome, about 40% of the market), we can preload templates and styles for specific endpoints and improve performance by fetching only the raw content. By working as a proxy and producing an HTML string, we also avoid changes to the regular page JavaScript. In contrast to single-page applications, we don't incur routing complexity and heavy first-load penalties.

An interesting possibility is to prototype this in a Service Worker targeting regular page views (/wiki/{title}) only, while letting all other requests fall through to the regular request flow.
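
A minimal sketch of such a prototype worker, assuming a hypothetical cached chrome document at /page-chrome.html with a <!--CONTENT--> placeholder; anything outside /wiki/ simply falls through to the regular request flow:

```js
// sw.js -- prototype scope: regular page views only.
self.addEventListener('fetch', event => {
  const url = new URL(event.request.url);

  // Only intercept GET requests for /wiki/{title}; everything else
  // (APIs, special pages, images, styles) falls through untouched.
  if (event.request.method !== 'GET' || !url.pathname.startsWith('/wiki/')) {
    return; // no respondWith() => default request flow
  }

  const title = decodeURIComponent(url.pathname.slice('/wiki/'.length));
  event.respondWith(composePage(title));
});

// Hypothetical composition step: wrap the raw content HTML in a cached,
// pre-rendered chrome document containing a <!--CONTENT--> placeholder.
async function composePage(title) {
  const cache = await caches.open('chrome-v1');
  const [chrome, content] = await Promise.all([
    cache.match('/page-chrome.html'),
    fetch('/api/rest_v1/page/html/' + encodeURIComponent(title))
  ]);
  const body = (await chrome.text()).replace('<!--CONTENT-->', await content.text());
  return new Response(body, { headers: { 'Content-Type': 'text/html' } });
}
```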

Proposal

This task is about implementing a minimal service that essentially composes an HTTP response based on multiple other resources that are themselves more cacheable and less variable. Initial design requirements:

  • High-throughput. Suitable for handling traffic at the edge, essentially doing only HTTP and string manipulation.
  • Request routing.
  • HTML Templating. Fetch, precompile and cache templates - presumably Mustache.
  • Streamable. Must implement template handling so flushing starts early and continues progressively.
  • Export as ServiceWorker. Add an endpoint that exports a JavaScript program compatible with a ServiceWorker that contains all the utilities (elematch, mustache etc.), Router, RequestHandler, and the current install's router configuration and custom RequestHandlers.
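
To sketch how these requirements could fit together, the route table shared between the Node service and the exported sw.js bundle might look roughly like this (the handler names and option keys are assumptions, not a settled interface):

```js
// Illustrative route table; the GET /sw.js endpoint would bundle this
// together with the Router, RequestHandlers and utility libraries
// (elematch, mustache, ...) into a single ServiceWorker script.
module.exports = {
  routes: [
    {
      pattern: '/wiki/{title}',              // regular page views
      handler: 'compose',                    // precompiled Mustache skin + content
      options: {
        template: '/templates/skin.mustache',
        content: '/api/rest_v1/page/html/{title}'
      }
    },
    {
      pattern: '/{+path}',                   // everything else
      handler: 'passthrough'                 // forward to the origin unchanged
    }
  ]
};
```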

Related:

The Node.js service itself should probably use wikimedia/service-runner and not be specific to MediaWiki in any way. The service would only know which domain names it serves and where it can fetch the sw.js script from.

Dependencies

Making Service Workers work for wiki page views is impossible without first resolving a substantial amount of technical debt.

I suggest the initial implementation be used for a less complicated use case. For example, the Wikipedia.org portal could justify a Page Composition Service to improve its localisation workflow, which currently performs poorly because it relies on client-side XHR and causes FOUCs.

Related work by @GWicke and @Krinkle:

  • node-serviceworker-server: Node library that runs an HTTP service. It can be configured to map a domain and request scope to a service worker URL. The service uses the node-serviceworker library to turn the service worker script into something that we can instantiate and make requests to on the server side.
  • node-serviceworker: Node library that provides a Node sandbox with several browser APIs available in its scope (such as Fetch, ServiceWorker Cache, and more).
  • sw-wikimedia-helpers: Collection of utilities we expect most of our ServiceWorker clients to need, such as request routing, view abstraction, and a command-line script to generate a compact sw.js file. This utility library will likely make use of:
    • browserify
    • mixmaster: Produce a readable stream from an array of string literals, functions, promises, and other streams. With the option to pass through one or more transforms. This allows progressively streaming to the client with the ability to dynamically substitute portions, and to precompile any templates.
    • elematch: Efficient matching of elements in a stream of HTML, to be used with Mixmaster for dynamically substituting portions of the stream.
    • musti: Streamable Mustache renderer. Uses Mixmaster.
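
To illustrate the kind of composition Mixmaster enables (a self-contained sketch, not the library's actual API): produce a single Node readable stream from a mixed array of string literals, promises, and other streams, so the response can start flushing before the slower parts have resolved.

```js
const { Readable } = require('stream');

// Concatenate strings, promises and async-iterable streams, in order.
async function* concatSources(sources) {
  for (const source of sources) {
    const part = await source;            // resolves promises transparently
    if (typeof part === 'string') {
      yield part;
    } else {
      for await (const chunk of part) {   // assume an (async-)iterable stream
        yield chunk;
      }
    }
  }
}

function compose(sources) {
  return Readable.from(concatSources(sources));
}

// Usage sketch (fetchContentStream is hypothetical):
// compose(['<html><body>', fetchContentStream(title), '</body></html>']).pipe(res);
```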


Event Timeline


There is now a basic node-serviceworker-proxy service running basic serviceworkers like this one via the node-serviceworker package. Registration is dynamic, and can be driven by an API response describing the ServiceWorker scope / URL mappings per domain. The ServiceWorker registrations are periodically refreshed with a background task.
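
The per-domain registration data driving this might look roughly like the following (the exact schema is an assumption based on the description above):

```js
// Example payload of the registration API that the proxy polls periodically:
// each domain maps to one or more { scope, scriptURL } registrations.
{
  "en.wikipedia.org":  [ { "scope": "/wiki/", "scriptURL": "https://en.wikipedia.org/sw.js" } ],
  "www.wikipedia.org": [ { "scope": "/",      "scriptURL": "https://www.wikipedia.org/portal-sw.js" } ]
}
```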

Initial throughput numbers are quite encouraging. On a laptop, I am getting 2.6k req/s for a small wiki page. Larger pages are still around 2k req/s. This is with the content page and template cached, to gauge the cost of the page composition itself.

The next step will be to hook up more advanced page composition structures @Krinkle is working on, and to expand API coverage as needed.

A prototype proxy running this serviceworker is now set up in labs. The proxy also automatically registers the same serviceworker code on clients retrieving server-composed HTML content.

Limitations:

  • All resources are proxied through this service, to avoid CSP issues. In production, Varnish would handle non-HTML resources, and only forward HTML requests from clients without a ServiceWorker to the ServiceWorker proxy.
  • In the demo, only enwiki is supported via a static host header override. In a real deploy, host headers would work as-is.
  • While the proxy fully supports response streaming, the demo ServiceWorker used here is not streaming-enabled yet. This means that performance on slow connections is not yet as good as it could be. We have created https://github.com/gwicke/mixmaster and https://github.com/wikimedia/elematch in preparation for streaming composition, and plan to integrate them in a ServiceWorker soon.

Updates:

A fully streaming demo ServiceWorker is now implemented at https://github.com/gwicke/streaming-serviceworker-playground/blob/master/lib/sw.js. This uses https://github.com/wikimedia/web-stream-util (formerly mixmaster) and https://github.com/wikimedia/web-html-stream (formerly elematch), which have both seen major refactors to support streaming. ResourceLoader requests are cached aggressively & can be refreshed in the background. On slow connections, this significantly improves first render time by unblocking the render as soon as the first chunk of body HTML comes in.
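
Condensed, the streaming composition inside the ServiceWorker looks roughly like this (cache keys and the content endpoint are illustrative, and the element-level rewriting provided by web-html-stream is omitted):

```js
self.addEventListener('fetch', event => {
  const url = new URL(event.request.url);
  if (event.request.method !== 'GET' || !url.pathname.startsWith('/wiki/')) {
    return; // fall through to the network
  }
  event.respondWith(streamPage(url.pathname.slice('/wiki/'.length)));
});

async function streamPage(title) {
  const cache = await caches.open('shell-v1');
  const shell = await cache.match('/page-shell-head.html');   // aggressively cached
  const contentPromise = fetch('/api/rest_v1/page/html/' + title);
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      // 1. Flush the cached head + chrome immediately, unblocking first render.
      controller.enqueue(encoder.encode(await shell.text()));
      // 2. Forward content chunks to the client as they arrive from the network.
      const reader = (await contentPromise).body.getReader();
      for (;;) {
        const { done, value } = await reader.read();
        if (done) break;
        controller.enqueue(value);
      }
      controller.enqueue(encoder.encode('</body></html>'));
      controller.close();
    }
  });

  return new Response(stream, { headers: { 'Content-Type': 'text/html' } });
}
```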

With a Chrome canary build (55+), you can try this at https://swproxy.wmflabs.org/wiki/Foobar. The code works on Chrome 53 (current stable) and 54 as well, but the header-based registration mechanism we currently use is only supported in 55. This version is scheduled to graduate to stable in December, so it might not be worth offering the (messier) JS registration.

This RFC has probably been made obsolete by https://www.mediawiki.org/wiki/Reading/Web/Projects/NewMobileWebsite. ServiceWorkers are an interesting technology, but there does not seem to be any concrete plan to pursue the approach proposed in this RFC.

@daniel since this was made obsolete, should it be declined as an RFC then?

@kchapman I think that may have been in response to an outdated note in the Google Doc. I revised the task significantly in January of this year and positioned it to be separate from T111588.

This RFC as it stands is, imho, now actionable and not obsolete. The MVP of this could be a lightweight service that runs in edge DCs and is capable of composing responses for any of our application domains, including, perhaps as a starting point, the www portals. That would allow us to keep this stack modular and free of any knowledge of MediaWiki. (Our www portals, such as www.wikipedia.org, do not actually run on MediaWiki; they are served as static files from Apache.)

The MediaWiki-specific overhaul related to this has been moved to T111588 and T140664. The outcome of that would be a more modular MediaWiki PHP codebase in which skins are capable of rendering pages quickly for logged-in users.

After that, it is up to the outcome of this task (T106099) to consider whether it is worth re-implementing that then-lightweight Skin system from PHP in something else (e.g. Node.js), and, as part of that, whether or not to use the ServiceWorker model server-side.


Tons of data here, but I don't see this being worked on in the near future. Moving to the icebox for CPT.

Closing old RFC that has not yet moved on to our 2020 process and does not appear to have an active owner. Feel free to re-open with our template, or file a new one, when that changes.