Page MenuHomePhabricator

Update JavaScript syntax checker for gadgets and user-scripts for ES6 and later
Open, LowPublic

Assigned To
Authored By
Schnark
Nov 24 2014, 9:01 AM
Referenced Files
None
Tokens
"Like" token, awarded by Inductiveload."100" token, awarded by Avindra."Like" token, awarded by SlickKilmister."Like" token, awarded by AS."Like" token, awarded by DannyS712."Like" token, awarded by Silent."Like" token, awarded by Masumrezarock100."Mountain of Wealth" token, awarded by Daimona."Like" token, awarded by TerraCodes."Like" token, awarded by Shreyasminocha."Like" token, awarded by Vedmaka."Doubloon" token, awarded by Yair_rand."Like" token, awarded by MichaelSchoenitzer."Yellow Medal" token, awarded by Ricordisamoa.

Description

Steps to reproduce:

  1. Open your Special:MyPage/common.js for editing, add the code
var val, a = ['a', 'b'];
for (val of a) {
	console.log(val);
}
  1. Preview your change. The console should show "a" and "b" (given your browser support that much of ES6, but most browsers should, by now).
  2. Save the page.

Expected result: Same as step 2.
Actual result: The code from common.js is replaced with code that just throws an error:
Error: JavaScript parse error: Parse error: Unexpected token; token ; expected in file 'Benutzer:Schnark/common.js' on line 5

While for the site wide MediaWiki:Common.js etc. this is a good feature (to make sure nobody accidentally puts in some ES6 features only available in their browser but not in other users' browser), for personal JS a user should not be prevented from using features supported by their browser.

Status

"[As of April 2021] the JS syntax checker blocks use of ES6 in (i) gadgets, (ii) sitewide JS pages and (iii) user common.js page. "

Workaround

See T75714#4674421. Gist: deactivating the js validators is possible but has risks.

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

This task appears to be incorrectly named. At the moment the JS syntax checker blocks use of ES6 in (i) gadgets, (ii) sitewide JS pages and (iii) user common.js page. It is very much possible for user scripts to use ES6 and import them using user-level common.js page (or from gadgets or sitewide JS pages, for that matter). It's just that ES6 shouldn't directly appear in them. https://en.wikipedia.org/wiki/MediaWiki:Gadget-CommentsInLocalTime.js is an example of a gadget that uses ES6 by loading it from userspace.

In T75714#5752127, @Tgr wrote:

Maybe we could allow modern syntax and transpile it to a legacy version for old browsers (using Babel or something like that)?

(The other option would be transpiling statically, which is I think what the WMF Web team has adopted. For gadgets, that's probably not workable though.)

I think Esbuild could do a fine job here. It's easier to deploy than Babel (one flat binary).

FWIW, it's possible to load modern JS scripts that use ES modules with document.createElement. I've done this for my Commons categories pagination userscript. Of course this subverts the syntax checker, but it works™.

I think people also like to underestimate how much browser support there actually is for ES6. ALL major browsers (this includes mobile) have been shipping with full ESM support for over 2 years now

More productive use of our time, I think, however would be spent on the future of the web and not its past, such as using ES6 natively and raising our browser support accordingly.

This is not only something that would benefit the project itself, but It would open up the doors for hords of enthusiastic and skilled JS/ES devs who might be interested in contributing to this cause, but are effectively unable to do so because of the archaic conditions.

^ Strongly support these sentiments.

FWIW, it's possible to load modern JS scripts that use ES modules with document.createElement. I've done this for my Commons categories pagination userscript. Of course this subverts the syntax checker, but it works™.

You can use mw.loader.load to load non-gadget non-es5 user scripts (from a URL). You just can't use >es5 in ResourceLoader JS (including user and MediaWiki common.js and skin.js files and gadgets).

https://github.com/mck89/peast looks like the only php library available that can validate JS syntax with modes for different ES versions. However, the benchmarks are:

        JSParser::parse (jquery-3.2.1.js.gz)    Peast\Peast::ES6::parse (jquery-3.2.1.js.gz)
 count:          4                                           4
  rate:      0.2/s                                       0.1/s
 total: 19341.17ms                                  40252.85ms
  mean:  4835.29ms                                  10063.21ms
   max:  5367.42ms                                  10615.39ms
stddev:   355.03ms                                    471.19ms
Current memory usage: 34.00 MiB                      34.00 MiB
 Peak memory usage:   34.00 MiB                      58.01 MiB

which is disappointing – 2x slower than JSParser (JSMinPlus, the current validator).

There's also https://github.com/szaboferee/esprimaPHP though I couldn't figure out how to run it, but is anyway not very useful as it doesn't offer a way validate without considering ES7+ syntax.

@Catrope Would it be feasible to enhance the JavaScriptMinifier to always throw on invalid inputs, making it possible to use that as a validator? I see a number of // TODO: This is invalid and should throw in the code.

I also took a look at some node.js tools. There's es-check – a validator based on the widely used acorn parser. Unsystematic metrics (not generated using the Benchmarker class) indicate it takes ~0.5 seconds to check for jquery's ES5 and ES6 compatibility (10x faster!). We could have a feature flag that enables shelling out to es-check, while retaining JSMinPlus as a fallback for the sake of restricted environments.

@Catrope Would it be feasible to enhance the JavaScriptMinifier to always throw on invalid inputs, making it possible to use that as a validator? I see a number of // TODO: This is invalid and should throw in the code.

It would be a lot more work than it seems. Those "this is invalid" comments are about unclosed quoted strings, but that's just the tip of the iceberg. Everything in the minifier aggressively assumes that the input is valid. You could catch some invalid code by throwing when a state machine transition is missing, but I think you'd also have to dramatically expand the size of the state machine, because right now it merges together multiple tokens into the same state that kind of behave the same syntactically but aren't actually interchangeable. For example, it considers break, continue, return and throw to all be basically the same, and they do behave very similarly syntactically, but return is valid outside of a loop whereas break isn't. Worse, it groups else, do, case, try and finally together, but the contexts in which it's legal to use those keywords are very different, and what's allowed to come after them is different too.

A hacky alternative to provide ES6 support in gadgets would be:

  • Introduce a new reserved group which disables validation.
  • Gadgets which mark themselves as es6-only are put in that group - so they are retrieved in a separate request than other gadgets and core/extension-loaded JS.

This means that if any one of these ES6-only gadgets contains a syntax error, all other enabled ES6-only gadgets would also fail, but core/extension-loaded modules and non-ES6 gadgets would work as usual. That seems like a reasonable compromise. Thoughts?

Since MediaWiki 1.36 (or in the case of Wikimedia wikis, rMWb267f7aa9075: resourceloader: Allow modules to mark themselves as ES6-only which was merged last March) ResourceLoader does support ES6. Is anything needed here beyond exposing the es6 ResourceLoader flag to gadgets?

In T75714#7647489, @Tgr wrote:

Since MediaWiki 1.36 (or in the case of Wikimedia wikis, rMWb267f7aa9075: resourceloader: Allow modules to mark themselves as ES6-only which was merged last March) ResourceLoader does support ES6. Is anything needed here beyond exposing the es6 ResourceLoader flag to gadgets?

ResourceLoaderModule->validateScriptFile uses the JSMin+ JSParser to validate modules. Only ResourceLoaderWikiModule uses this validator to validate JS before serving it, ResourceLoaderFileModule does not. JSMin+ has not been updated since 2011 and does not support ES6, so the validator will have to be replaced before ResourceLoaderWikiModule can support ES6.

So can we just disable that validation for ES6 gadgets? I assume it's there to protect other modules when they are concatenated in a single bundle; can ES6 gadgets be a separate resource group? An extra request seems like an acceptable price to pay for allowing the extra functionality. Syntax errors in an ES6 gadget would still break the other ES6 gadgets; that also seems like an acceptable trade-off.

In T75714#7647547, @Tgr wrote:

So can we just disable that validation for ES6 gadgets? I assume it's there to protect other modules when they are concatenated in a single bundle; can ES6 gadgets be a separate resource group? An extra request seems like an acceptable price to pay for allowing the extra functionality. Syntax errors in an ES6 gadget would still break the other ES6 gadgets; that also seems like an acceptable trade-off.

This is what SD0001 suggested just above in T75714#7637508.

Ah, sorry, I missed that comment somehow.

Change 758085 had a related patch set uploaded (by SD0001; author: SD0001):

[mediawiki/core@master] resourceloader: add resource group for ES6 on-wiki scripts

https://gerrit.wikimedia.org/r/758085

Change 758086 had a related patch set uploaded (by SD0001; author: SD0001):

[mediawiki/extensions/Gadgets@master] Add support for ES6 gadgets, but with validation disabled

https://gerrit.wikimedia.org/r/758086

I think Esbuild could do a fine job here. It's easier to deploy than Babel (one flat binary).

I came here to post this thought and saw that @Avindra had already done so.

I think that we should look seriously at using ESBuild (written in Go, very fast, can be distributed as a flat binary, actively developed and understands ES6, etc) for all of our JS parsing / minifying / validating (?) needs in MediaWiki. It seems like the right tool for the current era of web development.

Furthermore, because it's fast and doesn't require adding a Node.js runtime to MW, ESBuild could also pave the way for some future enhancements we have been thinking about, like supporting some kind of build-step for front end code (so that developers can use Typescript, or do tree-shaking), or server-rendering Vue components (we might have to write some plugins to do this).

The only thing ESBuild won't get us is HMR during development. Vite, which is built on top of ESBuild, does provide this capability (and also has first-class Vue support) but then we're re-introducing a Node runtime dependency, so it's a bit of a trade-off.

Anyway, I think coming up with an adequate solution to this problem (support gadget authors who want to use ES6) could also tie into some other related problems in terms of supporting modern JS across our infrastructure.

Change 758085 abandoned by SD0001:

[mediawiki/core@master] resourceloader: add resource group for ES6 on-wiki scripts

Reason:

Core change not needed per above

https://gerrit.wikimedia.org/r/758085

ResourceLoaderModule->validateScriptFile uses the JSMin+ JSParser to validate modules. Only ResourceLoaderWikiModule uses this validator to validate JS before serving it, ResourceLoaderFileModule does not.

Can we remove that and validate on save instead? Validating on asset serving is performance-sensitive, validating on saving is not (well, not so much) so it shouldn't be a problem for MediaWiki to shell out. The validation result could e.g. be written to a derived MCR slot (that way the validation doesn't even have to block the save, although maybe for usability purposes it should), and ResourceLoaderWikiModule would check that slot instead of its normal validation process when it sees the es6 flag.

For surfacing validation information on save, see the parent task, e.g. T76204#3895772.

I agree validating on save would make sense in the long term. Though I'd lean toward treating it as derived data, e.g. saved to a page property reflecting the page's current state, rather than as indefinitely kept content slot programmatically generated and inserted into each revision, with then needing to fix or prevent diff, edit, source, etc. If the parent task is solved, we'd need neither, or perhaps just a constant 1 saved as page prop to catch any cases of the content not having been saved with validation (page moves, content model changes, rollbacks, imports, etc) with runtime fallback to lazy-validation and queuing a refresh job.

Shelling out could work if we e.g. offer some kind of hook to override it in a WMF-specific way, unless we can ship the binary with core such that it is usable across platforms. I doubt we need to shell out, though. I haven't looked into Peast much (mentioend by @SD0001) but that looks to be in quite good shape as a default for use on-save. Given how widely used PHP is, and how widely used JavaScript is, there are bound to be more. If we really can't find a decent parser, we ought to hire someone to connect the available dots and open-source it. It'd be a worthwhile contribution I think. There are plenty of mature and well-maintained language-neutral or near-neutral implementations to tokenize and parse/validate modern JavaScript that could be ported, or generated through something like PEG.

For production, we might also be able to bind a native implementations to PHP, e.g. swc_ecmascript (Rust library), which is used by Deno.

In T75714#7699864, @Tgr wrote:

Can we remove that and validate on save instead?

Something to take into account: If the validator/parser changes in the future, validating when serving (not saving) the contents ensures it will validate everything already written against the new version. This may be a feature in itself: Something valid today may become invalid tomorrow and silently break (a new validator may be more strict). On the other hand, if a security vulnerability is discovered and patched in the validator, it can sanitize already existing code.

This feature can be mimicked by creating a background job that is triggered when running update.php to validate existing pages.

I think validating on save and saving the result as a page property is a pretty good approach.

This may be a feature in itself: Something valid today may become invalid tomorrow and silently break

That sounds highly unlikely as new ES versions are always back-compatible with older ones.

if a security vulnerability is discovered and patched in the validator, it can sanitize already existing code.

The validator does not produce any code. It merely gives a true/false result indicating whether the given code is valid or not.

A hacky alternative to provide ES6 support in gadgets would be:

  • Introduce a new reserved group which disables validation.
  • Gadgets which mark themselves as es6-only are put in that group - so they are retrieved in a separate request than other gadgets and core/extension-loaded JS.

This means that if any one of these ES6-only gadgets contains a syntax error, all other enabled ES6-only gadgets would also fail, but core/extension-loaded modules and non-ES6 gadgets would work as usual. That seems like a reasonable compromise. Thoughts?

This sounds like a good compromise. Considering the # of subscribers and # of tokens, seems like there is a desire for something. You wrote a patch for this in January 2022 right?

https://gerrit.wikimedia.org/r/758086

I've done some work on gadgets, and it would be nice to have an option besides ES5 and transpile. Both of those have some drawbacks.

Change 758086 merged by jenkins-bot:

[mediawiki/extensions/Gadgets@master] Add support for ES6 gadgets, but with validation disabled

https://gerrit.wikimedia.org/r/758086

I started updating the documentation at https://www.mediawiki.org/w/index.php?title=Extension:Gadgets&diff=5557485&oldid=5495165. I left a couple table cells blank. Feel free to fill them in.

I was looking into this task as part of T334438: Create a basic "Hello world" example of how to use Codex in a user script, and I think that we may need some solution that can work for userscripts as well – unlike Gadgets, it seems that there is no way to disable validation for userscripts.

What is more, some ES6 syntax (like const) seems to be allowed, while other things like arrow functions will still cause validation to fail.

For example, the following code will work on a user-level common.js page:

mw.loader.using( [ 'vue', '@wikimedia/codex' ], function ( require ) {
	const Vue = require( 'vue' );
	const codex = require( '@wikimedia/codex' );
	
	console.log( Vue );
	console.log( codex );
} );

This code will not work:

mw.loader.using( [ 'vue', '@wikimedia/codex' ], ( require ) => {
	const Vue = require( 'vue' );
	const codex = require( '@wikimedia/codex' );
	
	console.log( Vue );
	console.log( codex );
} );

I wonder if we should seriously consider providing a way for MW to shell out to a more modern Node.js-based validator that supports current language features. This is another area where doing the work outlined in T328699: Consider including a JS runtime as part of MediaWiki could be beneficial.

Both should work fine in User:Example/hello-vue.js which you can import from common.js with e.g. importScript("User:Example/hello-vue.js").

Most of the time you should move scripts to separate files anyway to make them more shareable.

As noted at T178356#8782595, all mainstream browsers passing current check should support ES8 (and most of ES9 except some regex-related features).

Bugreporter renamed this task from Update JavaScript syntax checker for gadgets and user-scripts for ES6 to Update JavaScript syntax checker for gadgets and user-scripts for ES6 and later.Apr 14 2023, 7:03 PM

What is more, some ES6 syntax (like const) seems to be allowed, while other things like arrow functions will still cause validation to fail.

const was a reserved keyword back when so that's why it works, however, let (and arrow functions as you mentioned) won't work.

@Krinkle recently told me that he found https://github.com/mck89/peast/ , an ES2023 parser for PHP that can be configured to parse any ES2015 (=ES6) or any later version. This seems like a promising candidate to replace the old ES5 validator with.

Change 927797 had a related patch set uploaded (by SD0001; author: SD0001):

[mediawiki/core@master] resourceloader: Post-save validation for onwiki JS

https://gerrit.wikimedia.org/r/927797

Have uploaded a patch for moving the validation to a post-save update (was meant to be a POC patch, but ended up being pretty functional). This should enable us to use slower validators like Peast (T75714#7607131). It also introduces a JavaScriptValidator interface so that the implementation can be swapped for another one (such as one based on Rust or JS for WMF production).

@Legoktm pointed out that page properties are an artifact of parsing and should only be modified within the parse, so instead of validating post-save we'd have to validate synchronously during the save – which I suppose is just about fine. The other option would be to do it post-save but store the result in some cache or a new db table.

@Legoktm pointed out that page properties are an artifact of parsing and should only be modified within the parse, so instead of validating post-save we'd have to validate synchronously during the save – which I suppose is just about fine. The other option would be to do it post-save but store the result in some cache or a new db table.

I left some code review comments at https://gerrit.wikimedia.org/r/c/mediawiki/core/+/927797/ in which I arrived at the same conclusion. Thanks for working on this! Let me know if you have time in the next few weeks to iterate on this with us, or if you'd like someone to help finishing the patch on your behalf.

Change 959373 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/vendor@master] Initial audit of mck89/peast package

https://gerrit.wikimedia.org/r/959373