
Consider a pipeline for enhanced minification (e.g. support UglifyJS)
Open, Stalled, Low, Public

Description

Right now we use a very basic but fast minifier. It has to perform very well because we generate packages on demand[1], even though we also have a very high cache hit ratio.

Though this is nice, it drastically limits our options and ability to implement additional features.

Three features in particular:

  • Implementing source maps[2] for easier debugging. With our current basic minification, enabling pretty-printing in Chrome Dev Tools makes the debugging experience tolerable, but everything is still squashed into one file (it doesn't map back to the original file names). Once we do even more advanced minification, this becomes even more important.
  • Conditional code / stripping blocks. One of the things more sophisticated minifiers can do is strip dead code. Aside from the obvious, rare case of consistently unreachable code (which should just be wiped from the code base), this is useful for debugging purposes. See also T39763. Right now we have very few mw.log calls. I believe we avoid them because they take up space. They are a no-op in production mode (the log method is an empty function by default; in debug mode we load the actual module that populates the method), so the problem isn't that they would pollute the console in production, but that they add to the JavaScript payload. By wrapping them in something like if (MW_DEBUG) { mw.log(...); } we can have UglifyJS strip them in production and preserve them in debug mode, by predefining a global constant MW_DEBUG as false or true respectively when minifying (see the sketch after this list).
  • Better minification: variable name shortening, optimising for gzip, rewriting statements into shorter notation, etc. [3]
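
To make the MW_DEBUG idea concrete, here is a minimal sketch. It assumes the uglify-js npm package; the compress options follow its documented global_defs and dead_code settings, but the exact option names and minify() signature differ between UglifyJS versions, and the module code in the example is invented.

    // Sketch only: the source string and its function names are made up.
    var UglifyJS = require('uglify-js');

    var source = [
        'if (MW_DEBUG) {',
        '    mw.log("example: entering setup");',
        '}',
        'mw.example.setup();'
    ].join('\n');

    // Predefining MW_DEBUG as false lets the compressor evaluate the condition
    // at build time and drop the whole block as dead code.
    var production = UglifyJS.minify(source, {
        compress: { global_defs: { MW_DEBUG: false }, dead_code: true }
    });
    console.log(production.code); // roughly: mw.example.setup();

    // For debug mode we would either predefine MW_DEBUG as true, or skip this
    // minifier entirely and serve the raw source, keeping the mw.log call.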

So that's all great, but the problem is that, though UglifyJS[4] (for example) is getting faster, it is still much too slow to run on many files at once on-demand from the web server.

Last February, when I was in San Francisco, Roan and I thought about this. I recall the following, though Roan might have a better version of it:

  • On a cache miss, we'd run the quick minifier to populate the cache quickly and respond to the request, then enqueue a job to run the advanced minifier asynchronously.
  • The job queue will then run the elaborate minification process and replace the cache item. We don't have to worry about overwriting a newer version with an older one, because the cache keys contain a hash of the raw contents; so in the worst case we're saving something that will never be used (see the sketch below).
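
A rough sketch of that flow in Node-style JavaScript. Everything here is a stand-in: the in-memory cache and setTimeout take the place of memcached and the MediaWiki job queue, and quickMinify/advancedMinify take the place of JavaScriptMinifier and UglifyJS; only the content-hash-in-the-key idea is from the description above.

    // Sketch only: in-memory stand-ins for the real cache, job queue and minifiers.
    var crypto = require('crypto');
    var cache = Object.create(null);

    function contentHash(source) {
        // Cache keys contain a hash of the raw contents, so a slow job that
        // finishes late can only overwrite the entry for that exact input.
        return crypto.createHash('sha1').update(source).digest('hex');
    }

    function quickMinify(source) {
        return source.replace(/\s+/g, ' '); // placeholder for the fast minifier
    }

    function advancedMinify(source) {
        return quickMinify(source).trim(); // placeholder for UglifyJS
    }

    function respond(source) {
        var key = 'minify:' + contentHash(source);
        if (cache[key]) {
            return cache[key];
        }
        // Cache miss: answer immediately with the fast minifier...
        cache[key] = quickMinify(source);
        // ...and enqueue an asynchronous job that upgrades the same key later.
        setTimeout(function () {
            cache[key] = advancedMinify(source);
        }, 0);
        return cache[key];
    }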

There are two details in particular I'm not sure about:

  • How do we deliver the upgraded responses to the client? We have unique URLs with version timestamps.
  • One way to trigger a purge is to keep track of all URLs in Varnish that contain the module name and order a purge in Varnish (after we update memcached, of course, so it would be a quick roundtrip to Apache to compose a response from cached components).
  • Alternatively, cause a version bump in the module (touch() the files).
  • As for the job queue: we could enqueue generic jobs that check everything, or enqueue a job per cache item. In either case we need to handle the possibility that an enqueued job is no longer needed by the time it runs: with generic jobs, once the first one runs it should cancel any others still in the queue; with module- or item-specific jobs, it should cancel any others for the same module or item.

And then there is the question of how to get the JavaScript code and Node.js deployed, and how to execute it from PHP. Installing Node.js on every Apache and shelling out is probably not a good idea. Alternatively, we could wrap it in a private service (like Parsoid): we would set up a few instances in the bits cluster, and PHP would open a socket or HTTP connection, POST or stream the input, and get the minified output back.
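
For the private-service option, a minimal sketch of what the Node side could look like. The port, route and response format are assumptions for illustration (not an existing service), it assumes the uglify-js package, and the result.error check follows the UglifyJS 3 API (older versions throw instead). PHP would be the client that POSTs the source and reads the minified output back.

    // Sketch of a tiny HTTP minification service; everything here is illustrative.
    var http = require('http');
    var UglifyJS = require('uglify-js');

    http.createServer(function (req, res) {
        if (req.method !== 'POST') {
            res.writeHead(405);
            res.end();
            return;
        }
        var chunks = [];
        req.on('data', function (chunk) { chunks.push(chunk); });
        req.on('end', function () {
            var source = Buffer.concat(chunks).toString('utf8');
            var result = UglifyJS.minify(source);
            if (result.error) {
                // Let the PHP side fall back to the basic minifier on failure.
                res.writeHead(500, { 'Content-Type': 'text/plain' });
                res.end(String(result.error));
                return;
            }
            res.writeHead(200, { 'Content-Type': 'text/javascript' });
            res.end(result.code);
        });
    }).listen(8042);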

Details

Reference: bz47437

Event Timeline

bzimport raised the priority of this task to Low. · Nov 22 2014, 1:21 AM
bzimport set Reference to bz47437.
bzimport added a subscriber: Unknown Object (MLST).
Krinkle created this task. · Apr 19 2013, 11:31 PM

For your first point, you can append debug=true to each URL you want to debug, or for your own wiki you can set $wgResourceLoaderDebug to make debug=true the default; then the minifier is not run and each JavaScript file is shipped in its own file.

For the other point, you wrote on other bugs

Marking bugs that suggest altering the token stream in JavaScriptMinifier as wontfix.

This enhancement also sounds like altering the token stream, with a different (hopefully optional) technique.

This change is in harmony with the design and requirements of ResourceLoader, where any compression and packaging is only an enhancement. Everything still needs to have a raw mode, and raw mode should not introduce problems.

The default minifier will still be used. And some environments (those that can't install things on the server) will likely stick to just the default minifier.

The extra minification is an optional enhancement. Actually, "optional" isn't quite the right description in my opinion, as it wouldn't be used instead of the default minifier but on top of it. It won't take the default minifier's output as input, though; it will run alongside it from an asynchronous job queue. So the first cache miss will still be responded to with output from the (current) basic minifier.

The extra feature regarding conditional code (if (MW_DEBUG) { mw.log(...); }) degrades gracefully, as it is still valid JavaScript: the MW_DEBUG variable will simply be set to false, and mw.log already has a no-op dummy in non-debug mode.

Krinkle renamed this task from ResourceLoader: Implement support for enhanced minification (e.g. support UglifyJS) to Implement pipeline for enhanced minification (e.g. support UglifyJS). · Nov 26 2014, 12:33 AM
Krinkle updated the task description. (Show Details)
Krinkle set Security to None.
Krinkle removed a subscriber: Unknown Object (MLST).
Restricted Application added a subscriber: Aklapper. · View Herald Transcript · Sep 4 2015, 12:04 AM
Krinkle added a subscriber: MaxSem. · Sep 8 2015, 5:07 AM
Krinkle changed the task status from Open to Stalled. · Nov 4 2015, 5:06 PM

Pending new ideas for implementation, marking as stalled due to infrastructure constraints that make this infeasible at the moment.

  • The average gain of UglifyJS2 over JavaScriptMinifier (after gzip) is about 10%.
  • The latency of a Node service behind LVS and Varnish, plus the UglifyJS processing time, is too high to be acceptable for load.php response times. Until those performance characteristics change, this can't be implemented in a straightforward way.
  • Trying to do this asynchronously and serving APC-cached JavaScriptMinifier content in the meantime seems attractive, but so far we haven't figured out a good way to do this because we'd need to purge the Varnish hits. Otherwise the UglifyJS response would never be used: after the first backend hit, that URL is unlikely to reach a backend again (until the code changes, or until it drops out of cache after 30 days). Purging Varnish URLs is non-trivial because they come in unpredictable batches and carry version parameters. And besides, routing them to the backend more often would sacrifice latency, which may be more valuable than bandwidth in some cases.
MaxSem added a comment. · Nov 4 2015, 6:50 PM

As a matter of fact, as explained in the RFC, you can improve the cache hit rate significantly, which would help both PHP and external minification:

  • Make cache keys for minified content global
  • Minify and cache every module separately, as opposed to lumping everything requested together
  • Minify only the actual JS, not JS with messages/config/whatever data

This way, actual minification will happen very rarely, as opposed to the current situation where logged-in users who have changed their preferences have a good chance that their load.php requests will result in new minification.

As a matter of fact, as explained in the RFC, you can improve the cache hit rate significantly, which would help both PHP and external minification:

Cache hit rate and response latency are orthogonal requirements.

  • Make cache keys for minified content global
  • Minify and cache every module separately, as opposed to lumping everything requested together
  • Minify only the actual JS, not JS with messages/config/whatever data

The current infrastructure already does all the above.

This way, actual minification will happen very rarely, as opposed to the current situation where logged-in users who have changed their preferences have a good chance that their load.php requests will result in new minification.

This is not true. I also don't see how user preferences relate to whether load.php requests hit the cache or not.

  • load.php has always been cookieless. Anons and logged-in users operate without sessions here and enjoy Varnish caching.
  • Aside from gadgets and one or two rare preferences (like double-click to edit), none of our preferences change which modules are loaded on a page. A much more significant factor in module-batch variation is the wiki page, not the user. E.g. pages with syntaxhighlight have that extra module in their batch, and the login page and edit page have different modules, etc. It is not uncommon when browsing from one page to another that the module queue is different.
  • On the first visit with a cold browser cache (logged-in or not), the browser populates local caches using the top and bottom queue load.php requests. Different wiki pages may have different modules in these requests, but they always contain the base modules. All modules are expanded and cached client-side on a per-module basis in localStorage (a rough sketch of this follows after the list).
  • Any secondary page view only uses the network for modules not yet loaded in a previous page view, or if the individual module changed.
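
A simplified sketch of that per-module client-side caching, as described in the list above. The storage key 'sketch-module-store' and the stored format are hypothetical; ResourceLoader's actual module store differs in detail.

    // Simplified illustration only; key name and value layout are made up.
    function getCachedModule(name, version) {
        var store = JSON.parse(localStorage.getItem('sketch-module-store') || '{}');
        var entry = store[name];
        // Reuse the cached implementation only if it matches the current version,
        // so changed modules are re-fetched while unchanged ones skip the network.
        return entry && entry.version === version ? entry.script : null;
    }

    function cacheModule(name, version, script) {
        var store = JSON.parse(localStorage.getItem('sketch-module-store') || '{}');
        store[name] = { version: version, script: script };
        try {
            localStorage.setItem('sketch-module-store', JSON.stringify(store));
        } catch (e) {
            // Quota exceeded or storage disabled: silently fall back to the network.
        }
    }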
He7d3r updated the task description. (Show Details) · Oct 23 2016, 6:22 PM
He7d3r added a subscriber: He7d3r.

Did someone run a similar check for webpack, like the one for UglifyJS? MobileFrontend checks in the bundled and minified code coming from webpack (which cannot be done for Common.js or gadget code), but it's worth checking.

Restricted Application added a project: Performance-Team. · View Herald Transcript · Mon, Jul 29, 1:23 PM
Krinkle renamed this task from Implement pipeline for enhanced minification (e.g. support UglifyJS) to Consider a pipeline for enhanced minification (e.g. support UglifyJS). · Mon, Jul 29, 8:07 PM