Page MenuHomePhabricator

Update JavaScript syntax checker for gadgets and user-scripts for ES6
Open, LowPublic

Assigned To
None
Authored By
Schnark
Nov 24 2014, 9:01 AM
Tokens
"Like" token, awarded by Inductiveload."100" token, awarded by Avindra."Like" token, awarded by SlickKilmister."Like" token, awarded by AS."Like" token, awarded by DannyS712."Like" token, awarded by Silent."Like" token, awarded by Masumrezarock100."Mountain of Wealth" token, awarded by Daimona."Like" token, awarded by TerraCodes."Like" token, awarded by Shreyasminocha."Like" token, awarded by Vedmaka."Doubloon" token, awarded by Yair_rand."Like" token, awarded by MichaelSchoenitzer."Yellow Medal" token, awarded by Ricordisamoa.

Description

Steps to reproduce:

  1. Open your Special:MyPage/common.js for editing, add the code
var val, a = ['a', 'b'];
for (val of a) {
	console.log(val);
}
  1. Preview your change. The console should show "a" and "b" (given your browser support that much of ES6, but most browsers should, by now).
  2. Save the page.

Expected result: Same as step 2.
Actual result: The code from common.js is replaced with code that just throws an error:
Error: JavaScript parse error: Parse error: Unexpected token; token ; expected in file 'Benutzer:Schnark/common.js' on line 5

While for the site wide MediaWiki:Common.js etc. this is a good feature (to make sure nobody accidentally puts in some ES6 features only available in his browser but not in other users' browser), for personal JS a user should not be prevented to use whatever features his browser is able to provide.

Status

"[As of April 2021] the JS syntax checker blocks use of ES6 in (i) gadgets, (ii) sitewide JS pages and (iii) user common.js page. "

Workaround

See T75714#4674421. Gist: deactivate the js validators is possible but has risks.

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

(The other option would be transpiling statically, which is I think what the WMF Web team has adopted. For gadgets, that's probably not workable though.)

Well, that’s what I currently do for the AC/DC gadget. I might be the only one, though :)

Although setting the $wgResourceLoaderValidateJS to false is too dangerous, why not add a feature to support validating es6 or higher version JS for the site which dont care about IE visitor?

@AnnAngela This task is not blocked on deciding how to implement it. It is blocked on the cost of implementing the feature in question. That will require an engineer to spent significant amounts of time to work on that. That engineer is likely to be me at some point. I have other priorities at the moment. This will not become my priority until after it becomes useful for Wikipedia to enable such feature.

Krinkle renamed this task from ResourceLoader JavaScript parser should allow ES6 syntax features to Allow ES6 syntax features in gadgets and other site-wide scripts.Jun 16 2020, 3:39 PM
Jdlrobson added a subscriber: Jdlrobson.

FYI gadget developers are working round this by loading the JS raw [1] and there seem to be a large amount of errors coming from gadgets where users are trying to load gadgets that are written in ES6 and their browser is incapable of viewing so this is a concern from the production error logging we are rolling out.

Two particular examples:
https://zh.wikipedia.org/wiki/MediaWiki:Gadget-confirm-logout.js
and
https://ru.wikipedia.org/w/index.php?title=%D0%A3%D1%87%D0%B0%D1%81%D1%82%D0%BD%D0%B8%D0%BA:%D0%94%D0%B8%D0%BC%D0%B074/yoficator.js

In the case of the former, I had to patch it with feature detection as it was one of our top errors as it was being loaded as a site wide gadget.

Allowing gadget developers to run ES6 is helpful, but we should have better protection for users who opt into them without knowing their browser doesn't support them.

Possibly adding support inside mw.loader.load for checking ES6 support might help here. Could mw.loader.load calls be restricted to ES6 only browsers or catch and report such errors to end users via mw.notify?

[1] e.g. https://global-search.toolforge.org/?q=yoficator.js&regex=1&namespaces=&title=

AS added a subscriber: AS.

This task is not a production error. More generally, code from gadgets are outside the scope of Wikimedia-production-error.

If RL could minify ES6, it would still crash if served to ES5 browsers. This task is about supporting minification for user scripts that want to use ES6.

I refer to T262493 for the general issue of not being distracted in Logstash by community choices in user scripts.

Allowing gadget developers to run ES6 is helpful, but we should have better protection for users who opt into them without knowing their browser doesn't support them.

We don't right now. The code in question ceased to be a gadget to moment it became a raw-loaded plain text file loaded over HTTP outside MW or RL. When enabling a gadget, it is expected to work if the option is presented. We have skin filters for gadgets that want to limit their support. One could file a feature request to the Gadgets extension if you think we shoul also fragment that by browser support.

More productive use of our time, I think, however would be spent on the future of the web and not its past, such as using ES6 natively and raising our browser support accordingly. The bifurcation we see today is short-lived and unlikely to happen again any time soon. It's recognised as a mistake the way ES6 jumped ahead of reality and turned developers against their own rational principals. With more iterative and polyfillable changes going forward, this won't be neccecary. (One could also argue, that any useful features in ES6 were already polyfillable and that the syntax is insignificant.)

See also the direction of global gadgets more generally, which if placed in Git, would allow for more normal development tooling, including for example to transpile from other syntaxes. As I understand it, most well-maintained gadgets naturally are already maintained outside MW as developing a code base of any meaningful complexity is not feasible in the form of wiki pages and no tooling, CI, or staging ability. Hence why this and related tasks have been fairly low priority.

I think people also like to underestimate how much browser support there actually is for ES6. ALL major browsers (this includes mobile) have been shipping with full ESM support for over 2 years now, and old browsers aren't common and are becoming rarer by the day. As long as old browsers are considered so that the webpage doesn't break and has basic functionality, is it really necessary to spend so much effort in trying to give them ALL the same features? At this point, is it really any different from someone who disabled JavaScript in their browser? Everyone has the ability to get an up-to-date browser, they are really making their own bed and should carry the consequences themselves.

More productive use of our time, I think, however would be spent on the future of the web and not its past, such as using ES6 natively and raising our browser support accordingly.

This is not only something that would benefit the project itself, but It would open up the doors for hords of enthusiastic and skilled JS/ES devs who might be interested in contributing to this cause, but are effectively unable to do so because of the archaic conditions. I would love to do work related to scripts and gadgets, but the effort required is insane for me. I recently made a simple script to hide a few boxes that were bothering me; what would usually take me around 5 minutes to do ended up taking over 1 hour....

I started working with JS/ES a bit over 2 years ago, and have been using TypeScript exclusively for about 1 year. I have zero experience with working in a world without Intellisense and easy to use automation. And looking at the popularity of JS/ES and TypeScript in OSS, I can't be the only one who was discouraged by this to contribute to the project.

It should also be noted how ES6+ syntax tends to have slightly better performance (most notably in the V8 engine), and how it can massively reduce effort needed for code maintenance and debugging with the TypeScript Language Service and all the tools that rely on it.

Krinkle renamed this task from Allow ES6 syntax features in gadgets and other site-wide scripts to Update JavaScript syntax checker for gadgets and user-scripts for ES6.Dec 13 2020, 8:34 PM

This task is about the development of the JS syntax checker used by user-supplied gadgets so that use of ES6 syntax is possible there.

For MediaWiki software development in general, there is task T178356, where it will be decided whether to actively require ES6 (discard ES5) browsers.

Note that use of modern web platform features, including ES6 features like WeakMap and Promise, and other APIs like Navigation Timing and Web Audio, are already possible in gadgets today and has no relation to the syntax checker.

This task appears to be incorrectly named. At the moment the JS syntax checker blocks use of ES6 in (i) gadgets, (ii) sitewide JS pages and (iii) user common.js page. It is very much possible for user scripts to use ES6 and import them using user-level common.js page (or from gadgets or sitewide JS pages, for that matter). It's just that ES6 shouldn't directly appear in them. https://en.wikipedia.org/wiki/MediaWiki:Gadget-CommentsInLocalTime.js is an example of a gadget that uses ES6 by loading it from userspace.

In T75714#5752127, @Tgr wrote:

Maybe we could allow modern syntax and transpile it to a legacy version for old browsers (using Babel or something like that)?

(The other option would be transpiling statically, which is I think what the WMF Web team has adopted. For gadgets, that's probably not workable though.)

I think Esbuild could do a fine job here. It's easier to deploy than Babel (one flat binary).

FWIW, it's possible to load modern JS scripts that use ES modules with document.createElement. I've done this for my Commons categories pagination userscript. Of course this subverts the syntax checker, but it works™.

I think people also like to underestimate how much browser support there actually is for ES6. ALL major browsers (this includes mobile) have been shipping with full ESM support for over 2 years now

More productive use of our time, I think, however would be spent on the future of the web and not its past, such as using ES6 natively and raising our browser support accordingly.

This is not only something that would benefit the project itself, but It would open up the doors for hords of enthusiastic and skilled JS/ES devs who might be interested in contributing to this cause, but are effectively unable to do so because of the archaic conditions.

^ Strongly support these sentiments.

FWIW, it's possible to load modern JS scripts that use ES modules with document.createElement. I've done this for my Commons categories pagination userscript. Of course this subverts the syntax checker, but it works™.

You can use mw.loader.load to load non-gadget non-es5 user scripts (from a URL). You just can't use >es5 in ResourceLoader JS (including user and MediaWiki common.js and skin.js files and gadgets).

https://github.com/mck89/peast looks like the only php library available that can validate JS syntax with modes for different ES versions. However, the benchmarks are:

        JSParser::parse (jquery-3.2.1.js.gz)    Peast\Peast::ES6::parse (jquery-3.2.1.js.gz)
 count:          4                                           4
  rate:      0.2/s                                       0.1/s
 total: 19341.17ms                                  40252.85ms
  mean:  4835.29ms                                  10063.21ms
   max:  5367.42ms                                  10615.39ms
stddev:   355.03ms                                    471.19ms
Current memory usage: 34.00 MiB                      34.00 MiB
 Peak memory usage:   34.00 MiB                      58.01 MiB

which is disappointing – 2x slower than JSParser (JSMinPlus, the current validator).

There's also https://github.com/szaboferee/esprimaPHP though I couldn't figure out how to run it, but is anyway not very useful as it doesn't offer a way validate without considering ES7+ syntax.

@Catrope Would it be feasible to enhance the JavaScriptMinifier to always throw on invalid inputs, making it possible to use that as a validator? I see a number of // TODO: This is invalid and should throw in the code.

I also took a look at some node.js tools. There's es-check – a validator based on the widely used acorn parser. Unsystematic metrics (not generated using the Benchmarker class) indicate it takes ~0.5 seconds to check for jquery's ES5 and ES6 compatibility (10x faster!). We could have a feature flag that enables shelling out to es-check, while retaining JSMinPlus as a fallback for the sake of restricted environments.

@Catrope Would it be feasible to enhance the JavaScriptMinifier to always throw on invalid inputs, making it possible to use that as a validator? I see a number of // TODO: This is invalid and should throw in the code.

It would be a lot more work than it seems. Those "this is invalid" comments are about unclosed quoted strings, but that's just the tip of the iceberg. Everything in the minifier aggressively assumes that the input is valid. You could catch some invalid code by throwing when a state machine transition is missing, but I think you'd also have to dramatically expand the size of the state machine, because right now it merges together multiple tokens into the same state that kind of behave the same syntactically but aren't actually interchangeable. For example, it considers break, continue, return and throw to all be basically the same, and they do behave very similarly syntactically, but return is valid outside of a loop whereas break isn't. Worse, it groups else, do, case, try and finally together, but the contexts in which it's legal to use those keywords are very different, and what's allowed to come after them is different too.

A hacky alternative to provide ES6 support in gadgets would be:

  • Introduce a new reserved group which disables validation.
  • Gadgets which mark themselves as es6-only are put in that group - so they are retrieved in a separate request than other gadgets and core/extension-loaded JS.

This means that if any one of these ES6-only gadgets contains a syntax error, all other enabled ES6-only gadgets would also fail, but core/extension-loaded modules and non-ES6 gadgets would work as usual. That seems like a reasonable compromise. Thoughts?

Since MediaWiki 1.36 (or in the case of Wikimedia wikis, rMWb267f7aa9075: resourceloader: Allow modules to mark themselves as ES6-only which was merged last March) ResourceLoader does support ES6. Is anything needed here beyond exposing the es6 ResourceLoader flag to gadgets?

In T75714#7647489, @Tgr wrote:

Since MediaWiki 1.36 (or in the case of Wikimedia wikis, rMWb267f7aa9075: resourceloader: Allow modules to mark themselves as ES6-only which was merged last March) ResourceLoader does support ES6. Is anything needed here beyond exposing the es6 ResourceLoader flag to gadgets?

ResourceLoaderModule->validateScriptFile uses the JSMin+ JSParser to validate modules. Only ResourceLoaderWikiModule uses this validator to validate JS before serving it, ResourceLoaderFileModule does not. JSMin+ has not been updated since 2011 and does not support ES6, so the validator will have to be replaced before ResourceLoaderWikiModule can support ES6.

So can we just disable that validation for ES6 gadgets? I assume it's there to protect other modules when they are concatenated in a single bundle; can ES6 gadgets be a separate resource group? An extra request seems like an acceptable price to pay for allowing the extra functionality. Syntax errors in an ES6 gadget would still break the other ES6 gadgets; that also seems like an acceptable trade-off.

In T75714#7647547, @Tgr wrote:

So can we just disable that validation for ES6 gadgets? I assume it's there to protect other modules when they are concatenated in a single bundle; can ES6 gadgets be a separate resource group? An extra request seems like an acceptable price to pay for allowing the extra functionality. Syntax errors in an ES6 gadget would still break the other ES6 gadgets; that also seems like an acceptable trade-off.

This is what SD0001 suggested just above in T75714#7637508.

Ah, sorry, I missed that comment somehow.

Change 758085 had a related patch set uploaded (by SD0001; author: SD0001):

[mediawiki/core@master] resourceloader: add resource group for ES6 on-wiki scripts

https://gerrit.wikimedia.org/r/758085

Change 758086 had a related patch set uploaded (by SD0001; author: SD0001):

[mediawiki/extensions/Gadgets@master] Add support for ES6 gadgets, but with validation disabled

https://gerrit.wikimedia.org/r/758086

I think Esbuild could do a fine job here. It's easier to deploy than Babel (one flat binary).

I came here to post this thought and saw that @Avindra had already done so.

I think that we should look seriously at using ESBuild (written in Go, very fast, can be distributed as a flat binary, actively developed and understands ES6, etc) for all of our JS parsing / minifying / validating (?) needs in MediaWiki. It seems like the right tool for the current era of web development.

Furthermore, because it's fast and doesn't require adding a Node.js runtime to MW, ESBuild could also pave the way for some future enhancements we have been thinking about, like supporting some kind of build-step for front end code (so that developers can use Typescript, or do tree-shaking), or server-rendering Vue components (we might have to write some plugins to do this).

The only thing ESBuild won't get us is HMR during development. Vite, which is built on top of ESBuild, does provide this capability (and also has first-class Vue support) but then we're re-introducing a Node runtime dependency, so it's a bit of a trade-off.

Anyway, I think coming up with an adequate solution to this problem (support gadget authors who want to use ES6) could also tie into some other related problems in terms of supporting modern JS across our infrastructure.

Change 758085 abandoned by SD0001:

[mediawiki/core@master] resourceloader: add resource group for ES6 on-wiki scripts

Reason:

Core change not needed per above

https://gerrit.wikimedia.org/r/758085

ResourceLoaderModule->validateScriptFile uses the JSMin+ JSParser to validate modules. Only ResourceLoaderWikiModule uses this validator to validate JS before serving it, ResourceLoaderFileModule does not.

Can we remove that and validate on save instead? Validating on asset serving is performance-sensitive, validating on saving is not (well, not so much) so it shouldn't be a problem for MediaWiki to shell out. The validation result could e.g. be written to a derived MCR slot (that way the validation doesn't even have to block the save, although maybe for usability purposes it should), and ResourceLoaderWikiModule would check that slot instead of its normal validation process when it sees the es6 flag.

For surfacing validation information on save, see the parent task, e.g. T76204#3895772.

I agree validating on save would make sense in the long term. Though I'd lean toward treating it as derived data, e.g. saved to a page property reflecting the page's current state, rather than as indefinitely kept content slot programmatically generated and inserted into each revision, with then needing to fix or prevent diff, edit, source, etc. If the parent task is solved, we'd need neither, or perhaps just a constant 1 saved as page prop to catch any cases of the content not having been saved with validation (page moves, content model changes, rollbacks, imports, etc) with runtime fallback to lazy-validation and queuing a refresh job.

Shelling out could work if we e.g. offer some kind of hook to override it in a WMF-specific way, unless we can ship the binary with core such that it is usable across platforms. I doubt we need to shell out, though. I haven't looked into Peast much (mentioend by @SD0001) but that looks to be in quite good shape as a default for use on-save. Given how widely used PHP is, and how widely used JavaScript is, there are bound to be more. If we really can't find a decent parser, we ought to hire someone to connect the available dots and open-source it. It'd be a worthwhile contribution I think. There are plenty of mature and well-maintained language-neutral or near-neutral implementations to tokenize and parse/validate modern JavaScript that could be ported, or generated through something like PEG.

For production, we might also be able to bind a native implementations to PHP, e.g. swc_ecmascript (Rust library), which is used by Deno.

In T75714#7699864, @Tgr wrote:

Can we remove that and validate on save instead?

Something to take into account: If the validator/parser changes in the future, validating when serving (not saving) the contents ensures it will validate everything already written against the new version. This may be a feature in itself: Something valid today may become invalid tomorrow and silently break (a new validator may be more strict). On the other hand, if a security vulnerability is discovered and patched in the validator, it can sanitize already existing code.

This feature can be mimicked by creating a background job that is triggered when running update.php to validate existing pages.

I think validating on save and saving the result as a page property is a pretty good approach.

This may be a feature in itself: Something valid today may become invalid tomorrow and silently break

That sounds highly unlikely as new ES versions are always back-compatible with older ones.

if a security vulnerability is discovered and patched in the validator, it can sanitize already existing code.

The validator does not produce any code. It merely gives a true/false result indicating whether the given code is valid or not.