ResourceLoader: Implement support for Source Maps
Open, MediumPublic

Description

Source maps are a technique for mapping combined and minified JavaScript back to the original files. This could be very useful for debugging ResourceLoader in production mode. This can be necessary since behavior is different from debug mode (in ways other than minification).

This is supported in the Chrome, Firefox 50 and IE 11 debuggers and Closure compiler, and some other stacks have code to generate the maps.

The spec is at https://docs.google.com/document/d/1U1RGAehQwRypUTovF1KRlpiOFze0b-_2gc6fAH0KY0k/edit?hl=en_US&pli=1# and a good overview is at http://www.html5rocks.com/en/tutorials/developertools/sourcemaps/ .

The minified file points to the source map with a line like:

//# sourceMappingURL=/path/to/file.js.map

or an HTTP response header like:

SourceMap: /path/to/file.js.map

(X-SourceMap is the older, now deprecated, name for the same header.)
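As a rough illustration of the discovery mechanism described above, here is a minimal Python sketch of how a consumer might locate the map. The function name `find_source_map_url` is hypothetical, and letting the header win over the comment is an assumption of this sketch, not something stated in this task.

```python
import re
from typing import Mapping, Optional

# Matches both the current "//#" form and the older "//@" form.
_COMMENT_RE = re.compile(r"^//[#@]\s*sourceMappingURL=(\S+)\s*$", re.MULTILINE)


def find_source_map_url(
    js_text: str, headers: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return the source map URL advertised by a response, or None."""
    lowered = {name.lower(): value for name, value in (headers or {}).items()}
    # Assumption: a header, when present, takes precedence over the comment.
    for name in ("sourcemap", "x-sourcemap"):
        if name in lowered:
            return lowered[name]
    matches = _COMMENT_RE.findall(js_text)
    # If several comments are present, use the last one.
    return matches[-1] if matches else None
```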

If we use a dynamic URL, that should allow doing it in production. It would build the source maps on demand (just like the minification) for people that have them enabled (and are debugging), without slowing the site for anyone else.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/SourceMap

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:32 AM
bzimport set Reference to bz45514.

To clarify, I mean something like:

//# sourceMappingURL=/sourcemap.php?modules=...

for the dynamic URL, with the response built and cached on demand.

Because of version numbers and caching, the link should probably include a hash instead of the module name (or both).

ResourceLoader already has this infrastructure in place. When building packages, we only rebuild if the hash of the module contents is different.

That hash is also output for debugging purposes with the cache key at the end of the package.

We can use that same hash to tie a packaged response to the sourcemap, which we'll need to generate while packaging, not on-demand from the sourcemap request.

Could we also use the hash for identifying versions in regular requests, rather than the version parameter? That could potentially solve the problem of out of date HTML getting new JS+CSS.

(In reply to Matthew Flaschen from comment #3)

Could we also use the hash for identifying versions in regular requests,
rather than the version parameter? That could potentially solve the problem
of out of date HTML getting new JS+CSS.

No.

Firstly, the version number we have now would be more than enough for that purpose. The reason cached HTML is getting newer resources is because we explicitly *don't* embed version numbers or hashes of any kind in the HTML.

That is not a bug but a deliberate design choice at the core of ResourceLoader: it fixes what the old $wgStyleVersion got wrong.

As for implementation, building on comment #2, the bottom of a load.php JavaScript response would contain something like:

/* cache key: enwiki:resourceloader:3957c1d7aa */
//# sourceMappingURL=load.php?action=sourcemap&id=3957c1d7aa
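A minimal Python sketch of how such a trailer could be derived from the packaged content. The helper name `make_trailer` is hypothetical, and a truncated SHA-1 digest is only a stand-in for whatever hash ResourceLoader actually uses for its cache key.

```python
import hashlib


def make_trailer(minified_js: str, wiki_id: str = "enwiki") -> str:
    """Build the debug comment and source map link for one packaged response."""
    # Assumption: a truncated SHA-1 hex digest stands in for the real cache key hash.
    digest = hashlib.sha1(minified_js.encode("utf-8")).hexdigest()[:10]
    return (
        f"/* cache key: {wiki_id}:resourceloader:{digest} */\n"
        f"//# sourceMappingURL=load.php?action=sourcemap&id={digest}\n"
    )
```

Because the identifier depends only on the content, the same package always links to the same map, and a changed package links to a different one.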

Aside from incorporating a source map generator in PHP in the first place, doing this would be a pretty easy first step for minified -> unminified mapping.

However, it's slightly more elaborate to also trace the modules back to their respective individual non-concatenated files, because the minifier operates on the entire request as a whole. We'll have to maintain that state somehow and accumulate that context as we go on.

Krinkle lowered the priority of this task from Medium to Low. Jan 5 2015, 3:00 PM
Krinkle updated the task description.
Krinkle set Security to None.

Some PHP libraries:

Since MediaWiki uses its own minifier, JavaScriptMinifier, tracking of the line/column positions in the source and generated files would need to be added there. This shouldn’t be too difficult given that the code is quite clear.

The next step is to compute the source map, and mainly the base64 VLQ "mappings" string. For this, either a library can be used or the encoder can be written by hand. Base64 VLQ is itself a bit complicated, and the positions in each segment are relative to the previous segment, so some care must be taken.
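To make the encoding concrete, here is a small Python sketch of base64 VLQ and of the relative bookkeeping between segments. The function names are illustrative, and it only handles four-field segments (no "names" index); it is not the code of any of the libraries mentioned.

```python
BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_vlq(value: int) -> str:
    """Encode one signed integer as base64 VLQ."""
    # The sign goes in the lowest bit; the magnitude is shifted left by one.
    vlq = (-value << 1) | 1 if value < 0 else value << 1
    out = []
    while True:
        digit = vlq & 0b11111      # five data bits per base64 character
        vlq >>= 5
        if vlq:
            digit |= 0b100000      # continuation bit: more characters follow
        out.append(BASE64[digit])
        if not vlq:
            return "".join(out)


def encode_mappings(lines) -> str:
    """Encode absolute positions as a "mappings" string.

    `lines` holds one list per generated line; each segment is a tuple
    (generated_column, source_index, source_line, source_column), zero-based.
    """
    prev_src = prev_line = prev_col = 0   # these carry over across lines
    encoded_lines = []
    for segments in lines:
        prev_gen_col = 0                  # generated column restarts on every line
        encoded = []
        for gen_col, src, line, col in segments:
            deltas = (gen_col - prev_gen_col, src - prev_src,
                      line - prev_line, col - prev_col)
            encoded.append("".join(encode_vlq(d) for d in deltas))
            prev_gen_col, prev_src, prev_line, prev_col = gen_col, src, line, col
        encoded_lines.append(",".join(encoded))
    return ";".join(encoded_lines)
```

Segments within a line are separated by commas and generated lines by semicolons. Only the generated column resets at each new line; the source fields stay relative to the previous segment wherever it was.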

The integration into the ResourceLoader shouldn’t be too complicated: each module is minified separately in ResourceLoader::makeModuleResponse(), and an additional flag could be added to compute the Source Map and to collect the origin file names.

A point to be verified: many modules (= files) are grouped together, each in its own mw.loader.implement section. I guess an "index map" should be used for this (see spec page 5). I hope this is supported by browsers, since it does not seem to be widely used.
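For reference, this is roughly what an index map looks like and how a consumer would pick the section for a generated position. This is a Python sketch only: the source file names and the "AAAA" mappings are placeholders, and `find_section` is a hypothetical helper.

```python
def find_section(index_map: dict, line: int, column: int) -> dict:
    """Return the section that covers a zero-based generated position."""
    chosen = None
    for section in index_map["sections"]:
        offset = section["offset"]
        if (offset["line"], offset["column"]) <= (line, column):
            chosen = section   # sections are ordered by offset; keep the last match
        else:
            break
    if chosen is None:
        raise LookupError("position precedes the first section")
    return chosen


# One section per module; each nested map is an ordinary version 3 source map
# whose positions are relative to the section's offset in the combined output.
index_map = {
    "version": 3,
    "file": "load.php",
    "sections": [
        {
            "offset": {"line": 0, "column": 0},
            "map": {"version": 3, "sources": ["ext.uls.common.js"],
                    "names": [], "mappings": "AAAA"},
        },
        {
            "offset": {"line": 1, "column": 0},
            "map": {"version": 3, "sources": ["ext.uls.init.js"],
                    "names": [], "mappings": "AAAA"},
        },
    ],
}
```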

Beyond source maps for JavaScript, the same could be considered for CSS. For this, it should be implemented in CSSMin in the same way as in JavaScriptMinifier. It could also be considered for LESS, in which case two maps should be computed one after the other (LESS transpiler, then CSSMin). For CSS there is also the difficulty that it can be embedded as a JavaScript string in mw.loader.implement (I have no idea whether source maps can or should be used in this case).

For the interface I thought about something like /load.php?debug=sourcemap&lang=fr&modules=ext.uls.common,init,interface&skin=vector&version=05ghpbi to keep the same stateless infrastructure. The downside is that the comment //# in the main file will be longer than in Krinkle’s proposition.

Well, the important thing is that the source map comment contain sufficient information to generate a source map that actually matches the code it's attached to :-).

I think this is already encompassed in this ticket but in case it's unclear, it would be wonderful to be able to actually serve source maps (.map extension) in production. Source maps must currently be renamed from *.map to *.json (see example in Popups and its referent task, T173491) which other tools may be unable to interpret.

File extensions are not supposed to be significant on the web, that's what Content-Type headers are for. While it's still very early days for Source Maps in RL, I would imagine that it will likely end up being served from load.php one way or another. This means it would use neither .map nor .json. The spec for Source Maps uses sourceMappingURL as the discovery mechanism.

Are there browsers or other developer tools that require the url to end in .map? That'd be good to know.

Sorry, I'm not sitting on a good example. However, .map seems a conventional filename and I think it'd be the least surprising and most likely to work URI too.

Actually, I think one fair example is that Webpack itself defaults to .map extensions for source maps. Presumably most tooling is compatible with Webpack defaults.

Would you want the sources to be bundled in the source map, or linked? Or some combination?

A few things have changed since the start of this task.

In T47514#517096, @Krinkle wrote on 8 Mar 2014:

[…] it's slightly more elaborate to also maintain where the modules came from, to their respective individual non-concatenated files because when we're in the minifier, it's for the entire request as a whole. We'll have to maintain that state somehow and accumulate that context as we go on.

We used to build all module bundles (concatenation etc.) and then feed the batch response to the minifier. At that point, all a source map can do is offer the unminified code in an otherwise still combined payload, without any module or file names for the virtual file tree that the browser devtools present (it would all still be a single script with a long URL).

Since 2015, we minify each package on its own and then concatenate the result. History at T107377. This improved cache reuse and also means we potentially would no longer need a non-standard mechanism for building the source map. Assuming each chunk would come with a source map, we would then perform a "remap" step when combining the source code so as to logically combine the source maps as well. There is precedent for this in other source map generators and tooling around this.

Would you want the sources to be bundled in the source map, or linked? Or some combination?

I think bundled would work best and result in a notably faster and more stable debugging experience. If I understand correctly, we would otherwise have the source map refer to an "original" URL for each individually sourced file, and then refer to offsets in that file instead. When bundled, the original URL is for display only. This saves potentially thousands of web requests, and avoids the complication of every static file having to be individually linkable from a public URL that serves the correct version through caching layers, which has been a source of bugs and race conditions in the past.

I believe there are two levels of potential bundling/linking. The contents of the map (I'm suggesting bundling), and the reference to the map at the bottom of the debug mode load.php response, which also supports an embedded mode through a data URI. From https://sourcemaps.info/spec.html:

Note: <url> maybe a data URI. Using a data URI along with “sourcesContent” allow for a completely self-contained source-map.
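As an illustration of that self-contained mode, here is a Python sketch that bundles the original source via "sourcesContent" and embeds the whole map as a data URI. The helper name is hypothetical and the mappings string is assumed to come from the minifier.

```python
import base64
import json


def inline_source_map_comment(source_name: str, source_content: str, mappings: str) -> str:
    """Return a sourceMappingURL comment that embeds the entire source map."""
    source_map = {
        "version": 3,
        "sources": [source_name],            # display name only; nothing is fetched
        "sourcesContent": [source_content],  # original source bundled in the map
        "names": [],
        "mappings": mappings,
    }
    payload = base64.b64encode(json.dumps(source_map).encode("utf-8")).decode("ascii")
    return "//# sourceMappingURL=data:application/json;charset=utf-8;base64," + payload
```

The cost is visible in the output: the comment grows with the size of the original source, and base64 adds roughly a third on top.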

The downside is that this would mean debug mode downloads large amounts of data by default even if you don't need it. This isn't a huge deal given debug mode is opt-in. But one of my aspirations with debug=2 (T85805) is that it would be production-like and fast enough that we developers could feel comfortable browsing the site with it on by default through some mechanism (we already support a cookie; an XWD-provided header could work as well). I think source maps would be several orders of magnitude less frequently used than debug mode itself. This is part of why this task has lingered. In practice, in debug mode you'll already have the source code and it's de-batched to one response per module. With unminified code and module name present, the value added by a virtual URL stating the exact file seems minimal. Getting it in one's editor would come down to whether it's quicker to select and copy a portion of the filename like FooBar.js in the devtools sidebar and paste it in an editor's Quick Open (cmd+P) prompt, or to select the class name in the code in front of you (e.g. FooBar from class FooBar) and paste that in the same Quick Open prompt, which generally has the correct file as the first result and is what I do today.

All that to say, I think a completely self-contained source map, while beneficial in that it makes it stateless, would need to be opt-in (debug=sourcemap?) if we want to go that route. To offer it to the browser by default in debug mode, which seems nice, we should link it I think. Something like:

//# sourceMappingURL=load.php?only=sourcemap&modules=…
or
//# sourceMappingURL=rest.php/resourceloader/internal/sourcemap?modules=…

This has the downside that the link would need to be able to reproduce the response exactly. That should be fine in all but race conditions during a deployment, and we can verify it either way given "version" and "ETag" to help us confirm that we see the same result as what the caller saw a minute earlier.

I promised to link the popular JS ecosystem projects I know of that parse/minify/sourcemap JS code and are written in Rust or Go, and thus may be feasible to compile and expose to a native PHP extension:

There is an FAQ entry about linking Go projects to C which is quite discouraging. Rust may be possible. But adding source maps to our existing minifier is up my alley and should be a short (1-2 day) project for me. Contrary to what I said in our meeting, I don't think porting to another language is necessary to add that feature.

The part I'm most confident about is generating the "mappings" string, which would be implemented as a patch to the wikimedia/minify library. The mappings string would assume a single source file with index zero. The JSON wrapper would be ResourceLoader's responsibility. RL would be responsible for choosing source names and delivering maps. The RL component of this project would ideally be done by someone who is more familiar with RL.

Change 759437 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/libs/Minify@master] [WIP] JavaScript source map support

https://gerrit.wikimedia.org/r/759437

Assuming each chunk would come with a source map, we would then perform a "remap" step when combining the source code so as to logically combine the source maps as well. There is precedent for this in other source map generators and tooling around this.

The minifier could provide an incremental mode, allowing you to add chunks to a state object, or it could provide a batch mode, with an array of sources, or RL could combine single-file maps using an index map. In any case the RAM usage would be low since (unlike the libraries Seb35 linked to) my source map generator builds maps by string concatenation; it doesn't have a temporary structured store.

To offer it to the browser by default in debug mode, which seems nice, we should link it I think. Something like:

//# sourceMappingURL=load.php?only=sourcemap&modules=…
or
//# sourceMappingURL=rest.php/resourceloader/internal/sourcemap?modules=…

We could link it by default in non-debug mode. Appending that comment would be pretty cheap.

This has the downside that the link would need to be able to reproduce the response exactly. That should be fine in all but race conditions during a deployment, and we can verify it either way given "version" and "ETag" to help us confirm that we see the same result as what the caller saw a minute earlier.

I think it would be fine to have a source map link which makes a best effort to find matching sources, and if the source it comes up with doesn't match the hash in the URL, it can just respond with 404.
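A minimal Python sketch of that best-effort behaviour, assuming the URL carries a content hash. `serve_source_map` and its `rebuild` callback are hypothetical names, and the truncated SHA-1 is a placeholder for the real version hash.

```python
import hashlib
from typing import Callable, Tuple


def serve_source_map(
    requested_hash: str, rebuild: Callable[[], Tuple[str, str]]
) -> Tuple[int, str]:
    """Return (status, body) for a source map request.

    rebuild() regenerates (minified_js, source_map_json) from the current sources.
    """
    minified_js, source_map_json = rebuild()
    # Assumption: the hash in the URL is a truncated SHA-1 of the minified output.
    current_hash = hashlib.sha1(minified_js.encode("utf-8")).hexdigest()[:10]
    if current_hash != requested_hash:
        # The sources changed since the caller's response was built.
        return 404, "source map no longer matches the requested version"
    return 200, source_map_json
```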

The wikimedia/minify patch is essentially complete and is now waiting for code review.

The next step here is to decide on how to integrate source maps into our response layer and module classes. Broadly speaking, Tim proposed these two approaches:

  1. A state object, to which chunks are added incrementally.
  2. A combinator or remapper, with which two or more (code, map) pairs from the minifier can be combined.

A simplified model of how the code works today:

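class ResourceLoader {
  public function respond( array $modules ): string {
    $output = '';
    foreach ( $modules as $module ) {
      try {
        $output .= $module->buildContent();
      } catch ( Throwable $e ) {
        // cancel this module's output, keep the rest
        $output .= '/* error */';
        continue;
      }
    }
    return $output;
  }
}

class Module {
  public function buildContent(): string {
    // getScript() is the widely implemented stable interface;
    // cached() stands for the per-module cache lookup
    $script = cached( $this->getScript() );
    // "blue" is a placeholder for any condition that changes the wrapper
    if ( blue ) {
      return 'blue(' . $script . ');';
    } else {
      return 'green(' . $script . ');';
    }
  }
}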

Option 1: State object

I believe this would mean we pass down the state object into the lower-level methods. These then essentially write directly to the output buffer, more or less equivalent to an unbuffered echo. This seems hard to pull off given the separation of responsibilities between producing the chunks of output and deciding how and where these go, including process-caching of some chunks as return values between that layer of separation.

We'll have to break some things, so that's fine. But the idea of separating these concerns seems a desirable quality to retain long-term. I'd also like to at least explore the possibility of rolling out source maps incrementally, and thus would like to have a way to deal with a module not yet implementing its mapping.

If we have to do it all-or-nothing that's fine too, but I'd prefer to avoid it.

Option 2: Combine and/or remap

I believe this would mean the lower-level gets to keep its autonomy but has to pass upward its mappings along with the minified source, e.g. through a different return value, by-ref param, or something else.

The intermediary layers then either keep a flat array of such chunks and/or incrementally remap these with their own, depending on whether we want/have cases where a method doesn't know whether it's the final output.

This is the option I think would require the least changes, and work best in terms of the kinds of separation and benefits I think would help us long-term, and I think this would also make it relatively easy to deal with a chunk that doesn't have mappings available.
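To illustrate option 2, here is a Python sketch that mirrors the simplified model above: each module returns its code together with an optional map, and the top layer keeps a flat list and combines it at the end. The `Chunk` type and `combine` function are hypothetical, and an index map is only one possible way to do the combining.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Chunk:
    code: str                    # minified output of one module
    source_map: Optional[dict]   # None if the module has no mappings yet


def combine(chunks: List[Chunk]) -> Tuple[str, dict]:
    """Join chunk code and gather the available maps into one index map."""
    output_parts = []
    sections = []
    line = 0
    for chunk in chunks:
        if chunk.source_map is not None:
            sections.append(
                {"offset": {"line": line, "column": 0}, "map": chunk.source_map}
            )
        output_parts.append(chunk.code)
        # Chunks are joined with "\n", so each one starts on a fresh line.
        line += chunk.code.count("\n") + 1
    return "\n".join(output_parts), {"version": 3, "sections": sections}
```

A chunk without a map simply contributes no section. Its lines stay unmapped while the chunks after it still get correct offsets, which is what would allow an incremental rollout.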

Change 759437 merged by jenkins-bot:

[mediawiki/libs/Minify@master] JavaScript source map support

https://gerrit.wikimedia.org/r/759437

Change 772878 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] [WIP] resourceloader: Implement experimental source map support

https://gerrit.wikimedia.org/r/772878

Change 762513 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/libs/Minify@master] tests: Add source map integration example for Node.js and browser

https://gerrit.wikimedia.org/r/762513

Change 762513 merged by jenkins-bot:

[mediawiki/libs/Minify@master] tests: Add source map integration example for Node.js and browser

https://gerrit.wikimedia.org/r/762513