Page MenuHomePhabricator

Enforce JavaScript syntax check when editing user/site script pages
Open, LowPublic

Description

Over at enwiki WP:VPT, we get an awful lot of people complaining that one gadget or other has stopped working. Invariably it is because one of their personal JavaScript subpages contains invalid JavaScript (e.g. they mistakenly put CSS in their JS subpage, or tried to comment things out with <!-- -->, or other such nonsense). This means that anything appended after the invalid JS by ResourceLoader, including some site gadgets, will not be executed by the browser.

Similar to what is done with Lua, a simple syntax check/lint should be executed when a personal JavaScript page is saved. If there are syntax errors, the user gets a warning, letting them know that if they proceed, some parts of the site may cease to work.

There are other possible resolutions to the underlying problem: for example, if ResourceLoader detects invalid JavaScript in a file it is about to process, it could reject that file. But I think informing and educating the user is a better idea here.

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

In general this is good idea, but a word of warning: sometime css/js pages are "abused" for other purposes: For example there are huggle.css pages for saving user configuration of huggle tool, and this is not a valid CSS (this is done to prevent other users to change configuration of the user, as only the user and sysops are allowed to edit personal css).
I'm not aware to such abuse in JS pages, but it may be possible.

I'm not aware of JS being abused for this. I always use CSS for this type of personal data storage, exactly because malformed CSS usually has fewer ill-effects than malformed JS.

I would suggest that syntax errors only be displayed when editing via index.php, not via the API, and that the presence of an error wouldn't prevent the user from saving the page, to allow for the possibility of pages is being "abused" with non-JS content.

@Rillke runs a bot on commons that warns people when they have saved incorrect JS. We could consider adapting it a bit and running it on a wider scale for everyone once in a while and send them messages if they have syntax errors.

Note that we already have a built-in capability in MediaWiki PHP to validate javascript code (wgResourceLoaderValidateJS). Currently this is only used by ResourceLoader when loading a wiki-page module (replacing the script with a log write reporting the relevant syntax error).

I'd recommend expanding its scope to be an on-save handler for pages handled by javascript content type and reject them accordingly. It seems ContentHandler already has a validation infrastructure in place (Content::isValid). E.g. Editing Schema pages on meta.wikipedia.org with invalid syntax results in the save being prevented with an "Invalid JSON" error.

In general this is good idea, but a word of warning: sometime css/js pages are "abused" for other purposes: For example there are huggle.css pages for saving user configuration of huggle tool, and this is not a valid CSS (this is done to prevent other users to change configuration of the user, as only the user and sysops are allowed to edit personal css).
I'm not aware to such abuse in JS pages, but it may be possible.

I'm worried about this too.

In general this is good idea, but a word of warning: sometime css/js pages are "abused" for other purposes: For example there are huggle.css pages for saving user configuration of huggle tool, and this is not a valid CSS (this is done to prevent other users to change configuration of the user, as only the user and sysops are allowed to edit personal css).
I'm not aware to such abuse in JS pages, but it may be possible.

I'm worried about this too.

That's why I suggest implementing it as a warning to begin with (similar to what Scribunto used to do, and similar to the AbuseFilter warnings, which don't prevent the user actually saving the content). And to avoid breaking any automated tools that may be saving their data in .js pages, it might be good to only trigger the warning when saving via index.php?action=edit and not when using the API.

In general this is good idea, but a word of warning: sometime css/js pages are "abused" for other purposes: For example there are huggle.css pages for saving user configuration of huggle tool, and this is not a valid CSS (this is done to prevent other users to change configuration of the user, as only the user and sysops are allowed to edit personal css).
I'm not aware to such abuse in JS pages, but it may be possible.

I don't think those are valid use cases, even for .css pages. Huggle should use the userjs- preference keys instead of a wiki page.

In T76204#799901, @TTO wrote:

That's why I suggest implementing it as a warning to begin with (similar to what Scribunto used to do, and similar to the AbuseFilter warnings, which don't prevent the user actually saving the content). And to avoid breaking any automated tools that may be saving their data in .js pages, it might be good to only trigger the warning when saving via index.php?action=edit and not when using the API.

That sounds like a good compromise.

I don't think those are valid use cases, even for .css pages. Huggle should use the userjs- preference keys instead of a wiki page.

I'm pretty sure Huggle's use of user .css pages significantly pre-dates the availability of the API preference functionality. Same with Twinkle's twinkleoptions.js (although that is valid JavaScript).

This would be useful, but must be a warning and not enforced, for the reasons already mentioned.

Change 176853 had a related patch set uploaded (by Krinkle):
content: Add validation for JavaScriptContent

https://gerrit.wikimedia.org/r/176853

Patch-For-Review

@TTO: Per @Legoktm, please provide valid uses for allowing invalid javascript to be saved.

If a tool or bot is using .js pages to store non-javascript content, please give examples of such tools. We can coordinate with them to use something more appropriate as that is obviously inappropriate. We can simply wait with this feature until those are migrated. No need to implement a warning mechanism first.

Having it on a .js subpage doesn't provide any benefit from what I can see. The only use case I can imagine is edit protection (users can only edit their own css/js subpages), which can be achieved by other means. Or, as easy stop-gap migration, simply wrap it in a comment. Or use JSON and add "var json =" on top.

Or better yet, as mentioned, use the preferences API.

It would be interesting to see statistics on how many .js pages are being "abused" with invalid content.

Invalid user JS subpages would fall into two categories: unintentionally invalid content, e.g. JS subpages using <!-- --> comments or #REDIRECT statements; and intentionally invalid content, like preference settings stored in INI or XML formats. It would probably require manual discrimination to sort these two categories.

However, on enwiki alone, there are about 57500 user JS subpages. So any statistics-gathering would be difficult.

The only use case I can imagine is edit protection (users can only edit their own css/js subpages), which can be achieved by other means.

Which other means? I use [[User:This, that and the other/FtCG current version.css]] and [[User:This, that and the other/FtCG local wiki data files.css]] for exactly this purpose, and don't see another way around it. (I admit I could have used valid CSS by encasing the whole content of those pages in /* */, but that's not the point - I need those pages to be editable by no-one else besides myself and any admins answering editprotected requests should I ever leave the project.)

@Jackmcbarn, @TTO: In case of software like gadgets or tools, it should probably be stored in the relevant git repository instead of on-wiki (which, if you want the Tool labs tool to be kept alive after you leave, you should have regardless.). Or, if you don't want it centrally maintained; a MediaWiki-namespace page would work as well. Or a page in your user namespace that is protected by sysops, or an AbuseFilter rule whitelisting you as editor.

There are many ways. Some elegant, some hacky. And there's always the option to keep it on .js, but it'll have to be valid JS (or, to ease parsing, wrap it in something that only involves the first and last line).

There doesn't need to be a way to save invalid JavaScript. We just need to come up with a viable migration strategy. I assume the warning-machanism you proposed would be temporary.

Krinkle triaged this task as Medium priority.Dec 2 2014, 4:29 AM

Doesn't the built-in JavaScript code editor already warn about invalid syntax?

Doesn't the built-in JavaScript code editor already warn about invalid syntax?

Yes. Still, some people manage to save invalid scripts into their <skin>.js or common.js and get warned by CommonsMaintenanceBot (CMB). Have a look at its report: https://commons.wikimedia.org/wiki/Commons:User_scripts/users_namespace/reports
CMB was launched simultaneously with the editor-warning-feature. Especially the Esprima-Errors are of interest.

Or use JSON and add "var json =" on top.

Could we have .json pages in user namespace for preferences that do not have to be served with each page load and that does not require the scary warning about possibly evil scripts that you get when editing JS pages?
I believe that would be a cleaner option.

I generated a list of all user JS subpages on the entire WMF cluster (all 222592 of them) and checked the first 20000 (from aawiki down to commonswiki) to see if they contain valid JavaScript.

Valid: 18300
Invalid: 1189
Failed: 511 [my script choked on be-x-old, cbk-zam, etc]
Total: 20000

See P117 for full output.

That's at least 6% containing invalid content, which is a significant proportion. Many of these are #REDIRECT pages, caused by T36930 and its see-alsos.

I came across a significant "abuse" of user JS pages: opt-in to the xtools edit counter e.g. [[ast:User:Savh/EditCounterOptIn.js]], [[az:User:Araz Yaquboglu/EditCounterOptIn.js]]. The relevant tool on Tool Labs is currently down, but IIRC, the tool says you can create this page with "any content" if you want to opt in. I'm adding the tool's maintainer to this task so he can weigh in.

I generated a list of all user JS subpages on the entire WMF cluster (all 222592 of them) and checked the first 20000

Great! You checked for ECMA script 3 validity, I suppose? Interestingly, when pressing "edit" on :c:User:Rillke/SettingsSwitch-EditDropdown.js Code-Editor does not give any warning. We should agree on a common standard before running into action ;-)

I came across a significant "abuse" of user JS pages: opt-in to the xtools edit counter

Indeed. I already edited pages that I could edit so they ask to put something valid in and submitted a pull request (where I received a boilerplate response by its maintainer only).

I generated a list of all user JS subpages on the entire WMF cluster (all 222592 of them) and checked the first 20000

Great! You checked for ECMA script 3 validity, I suppose? Interestingly, when pressing "edit" on :c:User:Rillke/SettingsSwitch-EditDropdown.js Code-Editor does not give any warning. We should agree on a common standard before running into action ;-)

I used the jsminplus that comes with MediaWiki, which is also what Krinkle's patch uses. You shouldn't be using default as an unquoted property name in your JSON; that is what is causing that script to show up as invalid. Perhaps that is overzealous, but your script is technically invalid (although most browsers will overlook this invalidity).

I came across a significant "abuse" of user JS pages: opt-in to the xtools edit counter

Indeed. I already edited pages that I could edit so they ask to put something valid in and submitted a pull request (where I received a boilerplate response by its maintainer only).

To the edit counter's defense, this opt-in method has been in use since 2010. Also the repo was out of date, it isn't anymore. But more to the point, since the intro to OAuth, I wanted to move to a new opt-in system that removes the use of wiki pages. Unfortunately a lack of time has prevented me from doing so. :-( I may be recruiting a new maintainer soon though to help speed things along.

In T76204#801112, @TTO wrote:

I used the jsminplus that comes with MediaWiki, which is also what Krinkle's patch uses.
You shouldn't be using default as an unquoted property name in your JSON; that is what is causing that script to show up as invalid.

Yes, that's the reason, exactly. It demonstrates that your installation of jsminplus is checking for ECMA script 3 validity.

Perhaps that is overzealous, but your script is technically invalid (although most browsers will overlook this invalidity).

Nope. It is perfectly valid ECMA script 5, which is implemented by all modern browsers. C.f. ECMA Script 5 section 11.1.5 vs. ECMA Script 3 pp.41f., section 11.1.5 and note that the third version requires an Identifier as PropertyName (among StringLiterals and NumericLiterals) while the fifth version only requires an IdentifierName (among StringLiterals and NumericLiterals). An Identifier excludes reserved words, an IdentifierName doesn't.

The point here is not to prove validity of my script but to have consistency between the errors shown by the client side CodeEditor and those that the server side parser will throw. Unless this is proven, I will -1 the patch.

In T76204#801112, @TTO wrote:

I used the jsminplus that comes with MediaWiki, which is also what Krinkle's patch uses. You shouldn't be using default as an unquoted property name in your JSON; that is what is causing that script to show up as invalid. Perhaps that is overzealous, but your script is technically invalid (although most browsers will overlook this invalidity).

Using an unquoted property by any name in JSON is invalid. Property names in JSON must always be quoted. To my knowledge, no JSON parser in any programming language or browser tolerates that.

I assume you're referring to an unquoted property name in an object literal in javascript. In which case the standard for ECMAScript 3 (which MediaWiki supports in the form of IE8; and previously IE6, IE7, Safari 5 and Firefox 3.6) says that so-called Reserved Words (e.g. language syntax keywords) are not allowed in as unquoted key in object literal { foo: 1 } or in dot notation (obj.foo). This mistake was corrected in ECMAScript 5. And modern web browsers now ship with an engine that was made ECMAScript 5 compliant.

jsminplus was, and still is, an ECMAScript 3 parser. And as long as MediaWiki unconditionally executes scripts (e.g. gadgets and Common.js) in any browser with an ES3 javascript engine (e.g. IE8) scripts should be written to not fatally crash in those browsers, i.e. the script should be parseable.

I'm not sure whether you're referring to the parser or jshint module in CodeEditor. In case of JSHint, JSHint has dropped support for ES3 browsers. It still has an option to emit warnings for ES3 incompatibilities, however, by using /*jshint es3:true */ (which is what MediaWiki core does).

In the case of user scripts, I can appreciate you not wanting to be forced to have scripts compatible with browsers you don't personally use. Though note that many scripts in the User space aren't very "personal" at all (e.g. it's unfortunately very common practice for wikis to load user scripts from the MediaWiki namespace; or loaded directly by other users). And note that this is merely about parser compatibility, you're still free to unconditionally use methods only available in your browser.

Could we have .json pages in user namespace for preferences that do not have to be served with each page load and that does not require the scary warning about possibly evil scripts that you get when editing JS pages?
I believe that would be a cleaner option.

I like this. We can give .json pages in the User namespace the same protection as .js and .css. While I'd prefer to avoid adding more user rights, we'd probably want to introduce edituserjson (to complement edituserjs and editusercss).

Could we have .json pages in user namespace for preferences that do not have to be served with each page load and that does not require the scary warning about possibly evil scripts that you get when editing JS pages?
I believe that would be a cleaner option.

I like this. We can give .json pages in the User namespace the same protection as .js and .css. While I'd prefer to avoid adding more user rights, we'd probably want to introduce edituserjson (to complement edituserjs and editusercss).

Seems like a good idea to me. Want to code it up?

By "JSON" I clearly meant "object literal". I think my lack of knowledge about the finer points of JavaScript has been clearly demonstrated :)

Someone said somewhere that we should get a global bot to go through everyone's EditCounterOptin.js pages and comment them out (wrap them in /* */). I'm not sure if this is worth doing - after all, these pages are never actually loaded as JavaScript content - but I think I should mention it here for completeness.

Seems like a good idea to me. Want to code it up?

Let's move the .json stuff to a different task. This one is about warning users of invalid syntax in their .js subpages.

Could we have .json pages in user namespace for preferences that do not have to be served with each page load and that does not require the scary warning about possibly evil scripts that you get when editing JS pages?
I believe that would be a cleaner option.

I like this. We can give .json pages in the User namespace the same protection as .js and .css. While I'd prefer to avoid adding more user rights, we'd probably want to introduce edituserjson (to complement edituserjs and editusercss).

Seems like a good idea to me. Want to code it up?

Filed as T76554. Free for anyone to take. I might do it as some point.

See also T75714 for the question which version of Ecmascript should be used.

There is another use case for invalid syntax in JS pages not mentioned yet: If you want an administrator to delete your script you usually just add a deletion template, which is invalid JS syntax, but a valid use case.

There is another use case for invalid syntax in JS pages not mentioned yet: If you want an administrator to delete your script you usually just add a deletion template, which is invalid JS syntax, but a valid use case.

Users can place these deletion templates in comments

// {{deletion template}}

just works fine (template links, categories).

If you want an administrator to delete your script you usually just add a deletion template

Users can place these deletion templates in comments

// {{deletion template}}

Yup. I believe this is already a convention more-or-less, as people also use wiki syntax on pages that are not requested for deletion. E.g. Commons and en.wikipedia use a fair number of // [[Category: ...]] on bottom of user-namespace script pages.

I don't think that most users care about valid JS syntax in scripts they just want to have deleted anyway. Can somebody who has sysop rights on a big wiki look through deleted user scripts to see how many deletion requests were correctly commented out?

I don't think that most users care about valid JS syntax in scripts they just want to have deleted anyway. Can somebody who has sysop rights on a big wiki look through deleted user scripts to see how many deletion requests were correctly commented out?

I estimate nearly zero because no one cared before. The more interesting question is how many users will get it right and how many wikis will add instructions to MediaWiki:Javascript-error-syntax about how to deal with most common errors.

NB: The nominate-for-deletion gadget at Commons automatically wraps the deletion templates in comments of the language the nominated page is in.

Ping. What is there left to do here? Is a warning in Tech/News sufficient?

This takes care of invalid scripts when saving, but... what about existing errors that block JS? Is it possible to wrap all user JS in a try..except and throw a visible exception when an error in user JS occurs? Should be trivial to do in ResourceLoader.

Wrapping user scripts in try {...} catch (e) {} will cause the same bug as T71924.

Wrapping a user module in a try..catch wouldn't solve much. It is already executed in its own script tag. Its exceptions don't affect other code.

The only external code it affects is the module registry status entry and developer console, which should know about the error. If modules explicitly depend on user (server-side) they shouldn't do that if they want to tolerate failure. If modules wait for user to finish loading and want to tolerate failure (client-side) they should handle the fail() or always() methods of the loader promise.

Either way, the error handling is on the side of the code that uses the user module, it shouldn't be handled inside the user module itself.

The bug that exceptions in the user module don't cause mw.loader to know about the error is solved as of T107399 (and T106736).

Krinkle lowered the priority of this task from Medium to Low.Feb 9 2016, 7:17 PM

Change 176853 abandoned by Krinkle:
content: Add validation for JavaScriptContent

Reason:
Can be re-opened after the task is unblocked. Currently blocked on migrating invalid use of .js pages to other user sub pages, which in turn is blocked on making .json sub pages protected like .js subpages are.

This migration in turn is partly blocked on other software programs being updated, because often these subpage are not arbitrary pages made by individual reasons, but for use by another program, such as opt-in edit counters or Huggle preferences. Which would either need the other software to change its syntax, or change its page name.

https://gerrit.wikimedia.org/r/176853

Krinkle renamed this task from Run a basic JavaScript syntax check when user/site JavaScript pages are saved to Enforce JavaScript syntax check when editing user/site script pages.Dec 13 2020, 8:37 PM

The JSMinplus validator does not even support ES6 ( T75714). Even when it does, it's unlikely it would be constantly updated to support the latest ES variant of the day.

I think it would be better to perform the validation client-side, either as part of CodeEditor/Ace or by using acorn, and have invalid code only trigger a warning rather than block the save entirely. This eliminates the blocker noted above, allows use of latest ES versions for personal scripts without creating a burden on MW developers to maintain validators.