
RFC: WebAssembly and compiled JS code best practices
Open, Needs Triage, Public

Description

The latest versions of all major browsers support C/C++/etc. code compiled to WebAssembly ("wasm") format, in addition to the "asm.js" style of compilation, which targets a subset of JavaScript directly.

[Work in progress, please feel free to suggest updates]

The first library that MediaWiki uses to support WebAssembly is ogv.js, a codec & playback library which we use for playing Ogg and WebM files on Safari, IE, and Edge browsers. ogv.js is packaged with the TimedMediaHandler extension, and we've used its existing asm.js mode for about two years now.

I'd like to enable the WebAssembly mode in production for faster compilation/load times, but want to double-check that we've established some best practices for WebAssembly usage first.

Tech overview

  • asm.js is a subset of JavaScript which is used for outputting compiled code in a way that the browser can recompile efficiently.
  • WebAssembly is a compact binary format for a device-independent bytecode, roughly equivalent to asm.js in capabilities and usage patterns but with a smaller footprint and quicker parsing/compilation times.
  • emscripten is a popular compiler targeting both these platforms.
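For readers who haven't seen asm.js, the hand-written sketch below shows its general shape. It is a module function carrying the `"use asm"` directive, and its `|0` coercions tell the engine that a value is a 32-bit integer. Real emscripten output is machine-generated and far larger. The names `AsmAdder` and `add` are illustrative only. The example is written in TypeScript for consistency with the other sketches here, and the type annotations are stripped when it is compiled.

```typescript
// Illustrative asm.js-style module (hand-written, not emscripten output).
// The "use asm" directive asks the engine to validate and precompile it;
// if validation fails, it simply runs as ordinary JavaScript.
function AsmAdder(stdlib: object, foreign: object, heap: ArrayBuffer) {
  "use asm";

  // "x | 0" marks a value as a 32-bit integer, much like a C int.
  function add(x: number, y: number): number {
    x = x | 0;
    y = y | 0;
    return (x + y) | 0;
  }

  return { add: add };
}
```

For example, `AsmAdder(globalThis, {}, new ArrayBuffer(0x10000)).add(2, 3)` returns `5`. Because of the `|0` coercion, `add(0x7fffffff, 1)` wraps around to `-2147483648` (32-bit two's-complement arithmetic). This type information is what makes ahead-of-time compilation possible.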

A library using WebAssembly or asm.js compiled code will generally have three parts:

  • client JS code or wrapper library
  • compiler-generated JS "glue code"
  • asm.js or .wasm compiled code
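The three-part split can be seen in miniature below. This is a sketch under simplifying assumptions, and every name in it is hypothetical:

- The byte array stands in for the compiled `.wasm` file. It is a hand-assembled module exporting a single `add(i32, i32)` function.
- `instantiateGlue` plays the role of compiler-generated glue code.
- `makeClient` is the client-facing wrapper.

Real emscripten glue is generated and much larger.

```typescript
// Stand-in for the compiled code: a minimal hand-assembled wasm module
// exporting add(a: i32, b: i32) -> i32.
const ADD_WASM = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm", version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // body: local.get 0, local.get 1, i32.add
]);

// Stand-in for the glue code: compiles the module and supplies its imports
// (none for this module). Synchronous compilation keeps the sketch short;
// browsers restrict it on the main thread, so real loaders use the
// promise-based WebAssembly.instantiate() instead.
function instantiateGlue(bytes: BufferSource): WebAssembly.Exports {
  const compiled = new WebAssembly.Module(bytes);
  const instance = new WebAssembly.Instance(compiled, {});
  return instance.exports;
}

// Client-facing API: the only surface the rest of the application sees.
interface Adder {
  add(a: number, b: number): number;
}

function makeClient(exports: WebAssembly.Exports): Adder {
  const add = exports.add as (a: number, b: number) => number;
  return { add: (a, b) => add(a, b) };
}
```

Calling code only ever touches `makeClient(instantiateGlue(ADD_WASM)).add(2, 3)`. As a result, the compiled payload and its glue can be swapped (for example, for an asm.js build) without changing callers.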

The asm.js or .wasm compiled code is built from C/C++ or other source files with clang and the emscripten compiler; this process may be as complex as any other native code build or may be a few simple command invocations.

Security

The compiled code has access only to a chunk of linear memory and to whatever JavaScript functions the glue code passes into it. There may be very few or very many of these, so the security surface varies. Compiled code may be a simple "headless" library, or may manage a WebGL context for visualization, input events, etc.

Note that the Spectre speculative-execution vulnerability, which exposes data through timing side channels, could be exploited from both wasm and JS code given sufficiently precise timers. Browser makers are actively mitigating this with a combination of timing source fuzzing and reworking their JIT compilers to produce safer code.

Best practice: when checking a wasm/asm.js library for security, check the 'contract' in the glue code (the JavaScript functions and memory it hands to the compiled code) to determine what needs to be examined more closely.
Open questions: safety for library releases etc?
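For wasm builds, one way to start that review is to ask the browser what a compiled module expects from the outside. `WebAssembly.Module.imports()` lists every function, memory, table, and global the glue code must supply, and that list is the module's entire view of the outside world. The sketch below shows how a reviewer might use it. It is an assumption, not an established MediaWiki tool. The module bytes are a hand-assembled example importing a single function, `env.log`, and the helper name `describeContract` is hypothetical.

```typescript
// Hand-assembled module that imports one function, env.log : () -> ().
const LOG_IMPORT_WASM = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm", version 1
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00, // type section: () -> ()
  0x02, 0x0b, 0x01, // import section: 11 bytes, 1 import
  0x03, 0x65, 0x6e, 0x76, // module name "env"
  0x03, 0x6c, 0x6f, 0x67, // field name "log"
  0x00, 0x00, // kind: function, type index 0
]);

// Summarise what a compiled module needs from JS (its imports) and what it
// offers back (its exports), as human-readable strings for a reviewer.
function describeContract(bytes: BufferSource): { imports: string[]; exports: string[] } {
  const compiled = new WebAssembly.Module(bytes);
  return {
    imports: WebAssembly.Module.imports(compiled).map(
      (d) => `${d.module}.${d.name} (${d.kind})`,
    ),
    exports: WebAssembly.Module.exports(compiled).map((d) => `${d.name} (${d.kind})`),
  };
}
```

Here `describeContract(LOG_IMPORT_WASM).imports` is `["env.log (function)"]`. Each imported function is a path by which compiled code can reach page state, so this list points at the parts of the glue code that deserve a closer look.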

Compatibility

asm.js code output runs in all of our Grade A compatibility browsers, requiring only Typed Arrays; it even runs in IE 11! All major browser engines in their latest versions support WebAssembly as well, which can be detected at runtime. (Code built with recent emscripten versions requires the LEGACY_VM_SUPPORT option for IE 11 compatibility.)

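Modules that package WebAssembly code should generally include an asm.js build as well, and select which one to load at runtime. They should load the wasm build where it is supported, since asm.js takes longer to load and compile, and fall back to the asm.js build for older browsers such as IE 11.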
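The runtime choice can be made with a feature test before fetching either build. Below is a minimal sketch. It assumes the two builds sit side by side as `codec.wasm` and `codec.asm.js`, which are hypothetical file names, not ogv.js's actual ones. Validating the 8-byte wasm header, rather than only checking that a `WebAssembly` object exists, guards against partial or disabled implementations.

```typescript
// The smallest valid module: the "\0asm" magic number followed by version 1.
const EMPTY_WASM_MODULE = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

// True if this JS environment can compile WebAssembly. Any exception thrown
// while probing is treated as "not supported".
function supportsWasm(): boolean {
  try {
    return (
      typeof WebAssembly === "object" &&
      typeof WebAssembly.validate === "function" &&
      WebAssembly.validate(EMPTY_WASM_MODULE)
    );
  } catch {
    return false;
  }
}

// Pick which build to load; file names are hypothetical placeholders.
function selectBuild(wasmAvailable: boolean = supportsWasm()): string {
  return wasmAvailable ? "codec.wasm" : "codec.asm.js";
}
```

`selectBuild()` returns `"codec.wasm"` in current browsers and `"codec.asm.js"` in IE 11, where `WebAssembly` is undefined. Only the selected build is then downloaded.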

Best practice: include either asm.js only, or both wasm & asm.js
Open questions: none?

Source vs binary check-ins

Compiling code to WebAssembly or asm.js is like compiling native C/C++/Rust/etc. code, not like minifying JavaScript. This means it's not suitable for runtime transformations via ResourceLoader.

Best practice: C/C++ source lives in a separate library; the .wasm+.js "binaries" are checked in to MW core or the extension like any other library. Publishing via npm and having a local script pull updates from node_modules into the source tree is probably best for now.
Open questions: Is there a better way to handle JS+assets from package manager sources? (cf T107561)
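Such a sync script can be very small. The sketch below shows what it might look like as a Node.js script written in TypeScript. This is an assumption, not an existing MediaWiki tool. The script copies an explicit list of built files from a package directory under `node_modules` into the extension's resources, so the files checked in to git are exactly the ones listed. The function name, paths, and file names are hypothetical.

```typescript
import * as fs from "fs";
import * as path from "path";

// Copy the listed build artifacts from an npm package directory into a
// destination directory inside the extension, creating it if needed.
// Returns the destination paths so the caller can log or git-add them.
function syncVendoredFiles(packageDir: string, destDir: string, files: string[]): string[] {
  fs.mkdirSync(destDir, { recursive: true });
  return files.map((file) => {
    const src = path.join(packageDir, file);
    if (!fs.existsSync(src)) {
      // Fail loudly rather than silently checking in a partial update.
      throw new Error(`missing build artifact: ${src}`);
    }
    const dest = path.join(destDir, path.basename(file));
    fs.copyFileSync(src, dest);
    return dest;
  });
}
```

A typical invocation after `npm update` might be `syncVendoredFiles("node_modules/example-codec/dist", "resources/lib/example-codec", ["codec.js", "codec.wasm"])`, followed by a normal git commit. Keeping the file list explicit means an upstream package adding files does not silently grow the repository.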

Loading asm.js and wasm code

asm.js code may have trouble running through ResourceLoader's minification process, so TimedMediaHandler currently loads ogv.js's front-end JavaScript through ResourceLoader and loads the codec payloads directly as asset files.

.wasm code blobs are loaded at runtime as assets, which may require passing wgExtensionAssetPath etc. into the initializing JavaScript code.

Best practice: treat large asm.js blobs and .wasm blobs as raw assets outside RL; pass the proper URLs into initializer code
Open question: can/should we improve this?
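Here is a sketch of the initializer side. It assumes the extension passes its asset base URL into the loader, for example a value taken from `mw.config.get( 'wgExtensionAssetPath' )` on the client. The helper names and the `resources/codec.wasm` path are hypothetical. `WebAssembly.instantiateStreaming` compiles while downloading, but it requires the server to send `Content-Type: application/wasm`. The sketch therefore falls back to buffering the whole response when streaming is unavailable or fails.

```typescript
// Build the URL of a file inside an extension's directory from the asset
// base path passed to the client (e.g. "/w/extensions").
function extensionAssetUrl(assetPath: string, extension: string, file: string): string {
  return `${assetPath.replace(/\/+$/, "")}/${extension}/${file}`;
}

// Fetch and instantiate a .wasm blob served as a raw asset outside RL.
async function loadWasm(
  url: string,
  imports: WebAssembly.Imports = {},
): Promise<WebAssembly.Instance> {
  if (typeof WebAssembly.instantiateStreaming === "function") {
    try {
      const result = await WebAssembly.instantiateStreaming(fetch(url), imports);
      return result.instance;
    } catch {
      // Often a wrong MIME type; retry below with a buffered download.
    }
  }
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`failed to load ${url}: HTTP ${response.status}`);
  }
  const bytes = await response.arrayBuffer();
  const result = await WebAssembly.instantiate(bytes, imports);
  return result.instance;
}
```

For example, `loadWasm(extensionAssetUrl(assetPath, "TimedMediaHandler", "resources/codec.wasm"))` would resolve to the instantiated module. The fallback fetches the URL a second time, which the HTTP cache usually absorbs. Serving `.wasm` files with the correct MIME type avoids the retry entirely.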

Credit and licensing

Licenses of compiled code may require an offer of source (GPLv2) or copyright notices (BSD). What is the best practice for including these in Special:Version?

Best practice: treat like other JS libraries
Open questions: automation?
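On the automation question, npm's `package.json` format already defines `name`, `version`, `license`, and `homepage` fields. A build step could therefore generate a credit entry for each vendored package. The sketch below assumes that approach. The `CreditEntry` shape and `creditFromPackage` helper are hypothetical and would need to be mapped onto whatever Special:Version ends up consuming.

```typescript
// Subset of package.json fields relevant to credits (all optional in npm).
interface PackageMetadata {
  name?: string;
  version?: string;
  license?: string;
  homepage?: string;
}

// Hypothetical credit record; the real target format is still an open question.
interface CreditEntry {
  name: string;
  version: string;
  license: string;
  url?: string;
}

// Turn parsed package.json contents into a credit entry, failing loudly when
// license information is missing, since that is what the credits are for.
function creditFromPackage(pkg: PackageMetadata): CreditEntry {
  if (!pkg.name || !pkg.version) {
    throw new Error("package.json is missing name or version");
  }
  if (!pkg.license) {
    throw new Error(`${pkg.name}: no license field; add credits manually`);
  }
  const entry: CreditEntry = { name: pkg.name, version: pkg.version, license: pkg.license };
  if (pkg.homepage) {
    entry.url = pkg.homepage;
  }
  return entry;
}
```

One caveat applies to this design. The top-level `license` field describes the npm package, not necessarily every C library compiled into its `.wasm`/asm.js payload. Notices for those bundled libraries may still need to be listed separately.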

Action items

  • talk through any further issues
  • look at the npm integration issue?
  • find appropriate part of contributor guidelines to update
  • update it

Event Timeline

brion created this task. Nov 27 2017, 11:03 PM
Restricted Application added a subscriber: Aklapper. Nov 27 2017, 11:03 PM
brion updated the task description. Nov 27 2017, 11:19 PM
brion renamed this task from RFC: WebAssembly compiled JS code best practices to RFC: WebAssembly and compiled JS code best practices. Nov 27 2017, 11:26 PM

A few more thoughts I had:

  • Debugging - Do we need special tools to debug wasm code? I'm guessing ?debug=true doesn't work.
  • Binaries in Git - it really sucks. I think we should try to avoid this if possible. And it's also really problematic for re-distributing in Debian (TimedMediaHandler currently isn't, but I expect that eventually this will make its way into the MediaWiki tarball somewhere).
  • Credits - we've kept pushing the JS libraries on Special:Version down the road until we had a proper package manager but I think we should just bite the bullet and figure out something for that sooner.
  • CI - if we start using wasm stuff in extensions (not pulling in a library) we need CI support for this.

Debugging

  • Browser debugging tools do (or should soon!) work on wasm, they're just .... uglier :D

Binaries in git

  • agree it's painful; should we composerize things? Might be easier to work with. I have ogv.js in npm, for instance, and having a more regular 'import JS library from npm' mechanism would be nice.

Credits

  • Can we automate any kind of import from npm data? Or just have a separate place to stick it in the extension.json ..

CI

  • JavaScript testing frameworks can be used to black-box test the JS interface in front of wasm code. Consider exercising both current (wasm-capable) browsers and older but still-supported ones in the testing matrix, though we should already have that kind of thing for ESR releases of Firefox etc.
  • Linting tools may have to be told to ignore asm.js-style JS code that's checked in
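Here is a sketch of that kind of black-box test. The same assertions run against every backend that implements the public interface, so the wasm and asm.js builds are checked for identical behaviour. The two backends are toy stand-ins: a hand-assembled wasm `add` module and an asm.js-style function. `Backend`, `wasmBackend`, `asmBackend`, and `runConformance` are hypothetical names. In a real setup, the loaders would come from the library under test and the suite would run inside each browser in the matrix.

```typescript
interface Backend {
  name: string;
  add(a: number, b: number): number;
}

// Toy wasm backend: a hand-assembled module exporting add(i32, i32) -> i32.
function wasmBackend(): Backend {
  const bytes = new Uint8Array([
    0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,
    0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,
    0x03, 0x02, 0x01, 0x00,
    0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,
    0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,
  ]);
  const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes), {});
  const add = instance.exports.add as (a: number, b: number) => number;
  return { name: "wasm", add };
}

// Toy asm.js-style backend with the same 32-bit integer semantics.
function asmBackend(): Backend {
  return { name: "asm.js", add: (a, b) => ((a | 0) + (b | 0)) | 0 };
}

// Run identical black-box checks against every backend; returns failures.
function runConformance(backends: Backend[]): string[] {
  const cases: Array<[number, number, number]> = [
    [2, 3, 5],
    [-1, 1, 0],
    [0x7fffffff, 1, -0x80000000], // 32-bit wraparound must match across builds
  ];
  const failures: string[] = [];
  for (const backend of backends) {
    for (const [a, b, expected] of cases) {
      const actual = backend.add(a, b);
      if (actual !== expected) {
        failures.push(`${backend.name}: add(${a}, ${b}) = ${actual}, expected ${expected}`);
      }
    }
  }
  return failures;
}
```

`runConformance([wasmBackend(), asmBackend()])` returns an empty array when both builds agree. Edge cases such as integer overflow are the most likely places for two builds of the same source to diverge, which is why they belong in the shared case list.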
Anomie added a subscriber: Anomie. Nov 28 2017, 1:49 AM

Best practice: treat large asm.js blobs and .wasm blobs as raw assets outside RL; pass the proper URLs into initializer code

I note loading of raw assets has come up as a potential issue in T180394: MediaWiki entry points should not be in the base repo directory. Ideally extension directories wouldn't be included in the webroot.

If you try to import them via composer, there's already T180237: Have composer create a .htaccess file in vendor director that'll get in the way of that plan.

Binaries in git

  • agree it's painful;

Details? I don't recall running into any issues with the Lua binaries included in Scribunto.

Binaries in git

  • agree it's painful;

Details? I don't recall running into any issues with the Lua binaries included in Scribunto.

The Lua binaries are pretty small and rarely updated. https://opensource.com/life/16/8/how-manage-binary-blobs-git-part-7 describes some of the ways people have worked around this, and there's planned work to support git-lfs in Gerrit.

daniel moved this task from Inbox to Backlog on the TechCom-RFC board. Dec 6 2017, 9:23 PM
daniel moved this task from Backlog to TechCom-Approved on the TechCom-RFC board. Jan 5 2018, 2:05 PM
daniel edited projects, added TechCom-RFC (TechCom-Approved); removed TechCom-RFC.
daniel moved this task from Inbox to Backlog on the TechCom-RFC board.
daniel added a subscriber: daniel.

sorry, moved this to the wrong column accidentally.

brion added a comment. Jan 9 2018, 8:43 PM

Just pinging myself to write up more notes here soon to keep it active.

brion updated the task description. Jan 29 2018, 7:55 PM