Page MenuHomePhabricator

JS minification still outputs incorrect javascript for some input
Closed, ResolvedPublic

Description

Example:
Input: http://de.wikipedia.org/wiki/Benutzer:Tom_md/monobook.js
Output: http://bits.wikimedia.org/de.wikipedia.org/load.php?debug=false&lang=de&modules=user&only=scripts&skin=monobook&user=Tom_md

Right at the beginning there are some comments that are not deleted (this doesn't hurt, of course, but obviously something goes wrong).

Around line 435 (var Quickbar) everything seems fine again, but on line 473 (in function qbWPIntern()) some comments are preserved and then in line 609
w (1,'http://commons.wikimedia.org/wiki/Special:Upload','C-Up',qbtarget,'Commons-Upload');
(note that the quotes change from double to single here)
is destroyed to
w (1,'http:
which causes a syntax error.
Please note that this javascript is the most used javascript in the German Wikipedia.


Version: 1.17.x
Severity: normal

Details

Reference
bz27528

Event Timeline

bzimport raised the priority of this task from to High.Nov 21 2014, 11:21 PM
bzimport set Reference to bz27528.

salvatore.ingala wrote:

In the previous report, please refer to http://it.wikipedia.org/w/index.php?title=MediaWiki:Vector.js&oldid=38576973 since code has been modified.

When JavaScriptDistiller will be rewritten, please note that it's not trivial to safely minify js when allowing arbitrary user input. That's because of the division/regex conflict in javascript's grammar, which can not be resolved only by looking at the previous token:

(a+b)/(c+d)/g division operator
if(1)/a /g.exec('Pa ss');
regex

Althought this might seem to be edge cases, considering the amount of javascript being used on the different projects, it is likely that they will be hit eventually.

So to do minifying without breaking correct javascript, you'll need to do basic stack parsing on the javascript. I've written sth. similar in C a while ago, so if someone is willing to review or use it, I can try to port it to php and post it here.

This should be fixed now. Closing as FIXED, please REOPEN if new problems arise.

Reopening per my comment above. As pointed out, it's not possible to minify js with simple regexes without breaking user code in some cases.

FTR here are two sample test cases that currently break:

echo JavascriptDistiller::stripWhiteSpace("alert( (10+10) / '/'.charCodeAt( 0 ) ) + '//' );\n");

alert((10+10)/ '/'.charCodeAt( 0 ) ) + '

echo JavascriptDistiller::stripWhiteSpace("if(1)/a /g.exec('Pa ss');");

if(1)/a/g.exec('Pa ss');

(In reply to comment #5)

Reopening per my comment above. As pointed out, it's not possible to minify js
with simple regexes without breaking user code in some cases.

FTR here are two sample test cases that currently break:

echo JavascriptDistiller::stripWhiteSpace("alert( (10+10) / '/'.charCodeAt( 0 ) ) + '//' );\n");

alert((10+10)/ '/'.charCodeAt( 0 ) ) + '

This has unbalanced parentheses, but the issue you point our (division interpreted as start of regex, string vs. not string swapped for the rest of the input) is valid.

echo JavascriptDistiller::stripWhiteSpace("if(1)/a /g.exec('Pa ss');");

if(1)/a/g.exec('Pa ss');

This one is so devious that even Douglas Crockford's JSMin falls for it :O

(In reply to comment #3)

So to do minifying without breaking correct javascript, you'll need to do basic
stack parsing on the javascript. I've written sth. similar in C a while ago, so
if someone is willing to review or use it, I can try to port it to php and post
it here.

I'm interested. I'm also willing to port it to PHP myself if you give me the C code and release it under a free license.

New minifier which takes the division/regexp conflict into account

(In reply to comment #6)

This has unbalanced parentheses, but the issue you point our (division

Copy'n'Paste fail. Sorry for that :)

This one is so devious that even Douglas Crockford's JSMin falls for it :O

Right, JSMin makes a fair amount of assumptions about the code that it will minify.

I'm interested. I'm also willing to port it to PHP myself if you give me the C
code and release it under a free license.

Well, usually it's easier to port code that you wrote yourself, so I went ahead and made the attached patch. It seems, that even in php, its performance is quite good, in most tested cases it was about the same speed or slightly faster than JavascriptDistiller.

For now I removed $wgResourceLoaderMinifyJSVerticalSpace. As the new minifier knows when semicolon insertion will be triggered, it can safely put everything on a single line. For readability it may be considered to add a few newlines, though, maybe before each "function"?

I left JavaScriptDistiller.php in place to ease comparing and testing. I also added a few test cases in tests/phpunit/includes/libs/JavaScriptMinifierTest.php, probably need a few more, though.

attachment bug27528.patch ignored as obsolete

Modified patch: Take html comments into account, cleaned up a bit

So, poking a little bit more at this I fired up a test script that pulls user scripts from enwiki and checks them against JSLint http://www.javascriptlint.com/ for syntax errors, both before and after minifying.
After I had discovered and fixed an issue with html comments (see modified patch), I ran the script again and got the results below:

| Total | Distiller |    Distiller    |  Minifier
|       |           | (stripVertical) |

--------------------+-------+-----------+-----------------+------------
Checked pages | 39313 | | |
--------------------+-------+-----------+-----------------+------------
Error free pages | 34623 | | |
--------------------+-------+-----------+-----------------+------------
Pages with syntax | | | |
errors after minify | | 61 | 79 | 0
--------------------+-------+-----------+-----------------+------------
Avg. length (bytes) | 2875 | 2124 | 2085 | 2065
--------------------+-------+-----------+-----------------+------------
Avg. time (ms) | | 7.0 | 8.9 | 5.0

Attached:

List of enwiki user scripts which are currently broken

Attached:

I had a quick look at some of the broken scripts and all had a /\/foo\// in them. Especially in [[User:MBisanz/monobookrfa.js]] there isn't much other code that could break. I didn't check if that's the reason why they break, but if so that must be fixed of course.

(In reply to comment #7)

For now I removed $wgResourceLoaderMinifyJSVerticalSpace. As the new minifier
knows when semicolon insertion will be triggered, it can safely put everything
on a single line. For readability it may be considered to add a few newlines,
though, maybe before each "function"?

I'd rather keep the existing behavior of putting each statement on its own line.

Thanks for running these tests and for contributing your minifier. I'm reviewing it now.

Patch committed in r83885, some tweaks made in r83891

(In reply to comment #13)

Where are the tests?

/trunk/phase3/tests/phpunit/includes/libs/JavaScriptMinifierTest.php (added)

in r83885

Follow-up fix: don't insert newlines before increment operators

(In reply to comment #12)

Patch committed in r83885, some tweaks made in r83891

Thanks a lot. I really appreciate this being included in MediaWiki.

Just a small fix concerning the addition of the $maxLineLength parameter: There is one case in the js grammar, where a newline is forbidden, although semicolon insertion is not triggered:

if(x
++);

This raises a syntax error, because a newline may not occur before a postfix increment operator. So we have to check for that too, when inserting a newline.

Fix with regression tests attached.

Attached:

(In reply to comment #15)

Thanks a lot. I really appreciate this being included in MediaWiki.

You should apply for commit access :)

Just a small fix concerning the addition of the $maxLineLength parameter: There
is one case in the js grammar, where a newline is forbidden, although semicolon
insertion is not triggered:

if(x
++);

This raises a syntax error, because a newline may not occur before a postfix
increment operator. So we have to check for that too, when inserting a newline.

Fix with regression tests attached.

Thanks! Applied in r83934 with a small comment tweak.

neilk wrote:

FYI, \f and \v aren't valid escapes in PHP < 5.2.5.

We run PHP 5.2.4 in production. So, it's fixed in our code to \xb and \xc.

http://www.mediawiki.org/w/index.php?title=Special:Code/MediaWiki/83998
http://www.mediawiki.org/w/index.php?title=Special:Code/MediaWiki/83997

P.Copp -- you may want to backport that, or make note of a minimum PHP version to run.

(In reply to comment #17)

FYI, \f and \v aren't valid escapes in PHP < 5.2.5.

We run PHP 5.2.4 in production. So, it's fixed in our code to \xb and \xc.

http://www.mediawiki.org/w/index.php?title=Special:Code/MediaWiki/83998
http://www.mediawiki.org/w/index.php?title=Special:Code/MediaWiki/83997

P.Copp -- you may want to backport that, or make note of a minimum PHP version
to run.

Nice catch! That's what makes PHP so much fun to code in :)
Anyway, thanks for the heads up.