
Unexpected PST effect when saving edits to .js pages that were formerly <nowiki>
Closed, ResolvedPublic


For various reasons, when saving edits to .js and .css content-model pages, the wikitext parser is also invoked. Its HTML rendering is discarded, but there are three notable features that do work:

  1. Things that look like outgoing links are recorded in the link tables. E.g. // [[Sandbox]] on MediaWiki:Foo.js will result in the page being listed at "WhatLinksHere" for Sandbox. This feature is used for categorising script pages and for cross-wiki usage tracking of javascript (using Commons and GlobalUsage for image links) .
  2. Use of {{subst: something}} or ~~~~ will result in it substituting code in the middle of the JavaScript whilst saving the actual revision. This is basically never intended but is an inevitable side-effect of feature 1 right now, which leads to feature 3:
  3. Tags like // <nowiki> or // <source> can be used to disable these features.

This has known limitations and issues, but those are not the subject of this task (T10761, T61616, T18683).

Subject of this task is that the following used to be unaffected by PST (post-save transformations):

// <nowiki>
var x = "~~~~";
var y = "{{subst:Main Page}}";
mw.whatever(x, y);

However, I've noticed as of about a year ago that when I make semi-automated bot edits to .js pages across wikis, sometimes replacing mw.whatever with mw.somethingElse results in huge side effects due to <nowiki> no longer working.

In both of these cases, code like the following existed for 8 years on the wiki without issue:


				$('#wpTextbox1').val('{{subst:الگوی خالی|<strong class="error">از دستور جا: استفاده کنید</strong>}}{{حق تکثیر تصویر نامعلوم۲|19 January 2019|{{جلالی|19|January|2019|پیوند=نه}}}}

But then a minor change elsewhere in the script (by yours truly) made it expand, because I made the edit after a change in the software that made <nowiki> ineffective if not closed. As such, it has become unsafe for me to run this script even though I very carefully review the diff locally (pre-save), because the change happens post-save.

Event Timeline

I'm currently stuck with several choices:

  1. Run a find/replace script without limitations, which breaks hundreds of scripts (been there, done that, took two days to revert; breakage was due to things like ~~~ or {{subst: expanding in the middle of a .js page after my edit, because it had previously been protected by <nowiki>, but no more, per the task description).
  2. Write a basic check to see if the page contains the same number of open tags as close tags for nowiki, source, pre and syntaxhighlight. Then skip these during the automatic find-replacements. And instead fix the skipped ones by hand.
  3. Like 2, but instead of fixing the skipped pages by hand, review them quickly to manually add </nowiki>, </pre>, </source> or </syntaxhighlight> to the end of the page (or none of them, if it turns out to be a false positive in my check; I have already found several, because it is valid to have a page wrapped in one of these tags and then mention another one's open tag in a JS comment). And then re-run the script over the now-valid skipped pages.
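The balanced-tag check described in option 2 could be sketched roughly like this (a simplified illustration in Python; the regexes and function name are assumptions, not the actual bot code):

```python
import re

# Tags whose open/close counts we compare (per option 2 above).
TAGS = ("nowiki", "pre", "source", "syntaxhighlight")

def has_balanced_tags(text):
    """Return True if every tag has as many open tags as close tags.

    Note: this deliberately ignores self-closing forms like <nowiki/>,
    so it can flag pages that are actually fine (false positives).
    """
    for tag in TAGS:
        opens = len(re.findall(r"<%s[\s>]" % tag, text))
        closes = len(re.findall(r"</%s\s*>" % tag, text))
        if opens != closes:
            return False
    return True
```

Pages for which this returns False would be skipped by the automatic find-and-replace and handled manually.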

I've written the basic check for option 2, but several core deprecations remain blocked from being finalized, because fixing the skipped pages by hand is a lot of work for features that are popular but would otherwise be easy to find-and-replace.

A handful of migrations I've completed manually, because the number of uses was under 100. But there are now several building up that I'd rather not do the manual work for, especially if there is any hope of fixing this regression.

These hundreds of user pages have become landmines, both for me and for their original owners, because saving any kind of edit to them breaks them completely.


  1. T235457: Combine the user.tokens and user.options modules

Why aren't we disabling PST for non-wikitext contentmodels? That strikes me as the *correct* answer here.

In Preprocessor_Hash.php look for $xmlishAllowMissingEndTag -- nowiki is not included in that group, which means that a missing end tag causes the open tag to be treated as literal text.
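To illustrate what that flag controls, here is a toy model of the two behaviours (this is not MediaWiki's actual preprocessor code, just a sketch; the function name is hypothetical):

```python
import re

def capture_tag_body(text, tag, allow_missing_end_tag):
    """Toy model of the preprocessor's handling of xmlish tags.

    Returns the captured tag body, or None when the open tag is
    instead treated as literal text.
    """
    open_m = re.search(r"<%s>" % tag, text)
    if not open_m:
        return None
    rest = text[open_m.end():]
    close_m = re.search(r"</%s>" % tag, rest)
    if close_m:
        return rest[:close_m.start()]
    if allow_missing_end_tag:
        # Tags in $xmlishAllowMissingEndTag: swallow to end of input.
        return rest
    # Otherwise the unclosed open tag is literal text; nothing is captured,
    # so the "protection" the tag would have provided is lost.
    return None
```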

That behavior was changed by @matmarex in 674e8388cba9305223ef52f733e726c71c755778 (Feb 2016), which caused a regression (T125754) the first time it was merged.

I don't have a strong opinion here, although T240280: Parsoid corruption (loss of content, removed newlines) when editing pages with fostered content (often caused by unclosed table inside a list item `:{|`) is a related issue which inclines me to think that "ignoring start syntax if the end syntax is not found" is probably the right behavior here (i.e., we should do that with tables too and make it a consistent wikitext design if other exceptions are found).

The right solution to this particular problem seems to be to disable PST on non-wikitext content models, rather than to fiddle further with wikitext extension tag semantics.

Alternatively, you might add a "PST safe mode" which still runs the PST to extract categories/files/what-links-here info, but forbids (or reverts) any actual change to the saved text. That is, the "transform" would always be the identity on content models that are not wikitext.

Proposal: since we are trying to be conservative about deploys & changing behavior...

  • Add some code to detect the case where contentmodel != wikitext and $postPST != $prePST.
    • Add a tracking category in this case.
    • Also, if safepst=true in the save-page API, reset $postPST to $prePST (i.e., undo the changes made by the PST, while preserving the link table / category / etc. updates).
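In pseudocode, the proposed save path might look something like this (a sketch in Python; the function and parameter names are hypothetical, not actual MediaWiki APIs):

```python
def save_with_safe_pst(content_model, pre_pst, pst_transform, safepst=False):
    """Sketch of the proposal: run the PST for its side effects
    (link tables, categories), but optionally undo its text changes
    on non-wikitext pages."""
    post_pst = pst_transform(pre_pst)
    in_tracking_category = False
    if content_model != "wikitext" and post_pst != pre_pst:
        in_tracking_category = True  # flag the page for review
        if safepst:
            # Keep the link-table/category updates, but save the
            # original text unchanged.
            post_pst = pre_pst
    return post_pst, in_tracking_category
```

With safepst=True a bot edit to a JS page can never trigger unintended substitution; with safepst=False the current behaviour is preserved, but the tracking category still records affected pages.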

This would unblock @Krinkle in his work *safely* updating old gadget JS (without inadvertently triggering the PST) because his tools would pass safepst=true.

And then after some suitable interval of either the tracking category staying empty or editors not complaining that their pages are being inappropriately added to the tracking category, safepst=true would become the default.

After some investigation (those who know this code better please correct me) it seems like the "scan for links" is done by DerivedDataUpdater::getCanonicalParserOutput() and is completely unrelated to whether or not PST transformations are done on the content (which is done by CssContent::preSaveTransform() and JsContent::preSaveTransform() etc). And in fact there's already a preSaveTransform option in ParserOptions which disables "most" of the change-the-page transformations -- it still strips nulls and normalizes line endings, but I suppose that's acceptable.

So at this point I think what would help @Krinkle is if there were a way to set the preSaveTransform flag to false in ParserOptions when using the edit API to save a page. (This code is such a maze, I wouldn't be totally surprised to find that the possibility exists already.)

Looks like the easiest way to hack things is to add a User preference to disable PST on JS/CSS.

Whenever we do the PST we create a ParserOptions with ParserOptions::newFromUserAndLang(), so it would be straightforward to set the preSaveTransform flag in the ParserOptions to false based on the user (or the language, I guess).

@Krinkle could have a bot account where he'd set a mostly-hidden "no PST on CSS/JS" flag.

The way EditPage is set up, it seems harder to smuggle a request flag into DerivedDataUpdater otherwise, although it's possible you could abuse the stashedit functionality; that is, as long as the stashed edit has PST disabled, the various EditPage/PageUpdater/DerivedDataUpdater twisty-little passages won't try to re-PST it, so you'd only need to hack the edit-stashing path, not *all* the possible paths through EditPage you could take (eg, previews, diffs, etc).

I was wondering to what degree most platform-type folks might find "obscure per-User flags" distasteful; @Krinkle replied, "well, it'd be an internal param not in a stable release I guess. Just for a couple weeks."

It's already a ParserOptions option -- the issue is that there is *no* call to ParserOptions::setPreSaveTransform anywhere in the core codebase. (It seems that the Content-subclass-specific preSaveTransform() hooks are pretty new to core, so it's possible the ParserOptions flag was in use before we came up w/ better ways to disable PST on specific content types.)

UPDATE: yeah, the Translate extension calls ParserOptions::setPreSaveTransform(). The ParserOptions flag was added in 2011, in 2a30aa1d256d4cc72637f94e733a9031f47f1be4. Apparently the idea was that you could then disable PST in an extension by hooking ArticlePrepareTextForEdit, which the Translate extension appears to still do:

And in fact that's the *only* user of the ArticlePrepareTextForEdit hook, and the *only* user of ParserOptions::setPreSaveTransform.

Change 585811 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/core@master] WIP: Hidden user preference for bots to disable PST

Making it a user preference seems dangerous… if you manage to somehow accidentally set it, then e.g. ~~~~ will no longer expand to your signature? I don't ever want to debug that.

In a perfect world we would disable PST for non-wikitext pages, as you originally suggested (this doesn't prevent link extraction etc.) – however, that would break use-cases like

Adding an API parameter for it seems to be the least harmful, but that would make it easier for malicious users to set up pages that change when you null-edit them.

Maybe we should just change the behavior of <nowiki> (and <pre>) tags to be like <includeonly>/<noinclude>/<onlyinclude>, and not require a closing tag, basically restoring the old behavior?

I think changing existing wikitext behavior is even more dangerous than the above options! Especially extension tag behavior (which is how <pre> is implemented).

As I understand @Krinkle's proposal, the idea is to add this as a user preference with no UI to turn it on. Krinkle can use super-admin powers to give himself (or his bot) the preference just for a limited time to run his JS migration tools, then we can remove the preference from the codebase. Ideally all done before LTS, so this never makes it into a "release".

Orthogonally, I'd like to add a tracking category to any JS/CSS pages which "change" when the PST is run. In theory even if we don't implement the super-bot-power this would help @Krinkle find pages which his bot had broken. But in practice this sets the stage for eventual cleanup of JS/CSS pages which use the PST transform and (in say a year) even possibly turning it off.

It occurs to me that an API option to *enable* PST for JS/CSS, with PST off by default, would be 'safer' and would probably enable the sorts of tricks being played by Template:Js. But really that particular function could be done by a toolserver script or javascript snippet or editor plugin just as well.

Making it a user preference seems dangerous… if you manage to somehow accidentally set it, then e.g. ~~~~ will no longer expand to your signature? I don't ever want to debug that.

I think that as long as we limit it to CSS/JS content models, this is acceptable. While PST is sometimes intentionally used there, those who turn this pref on via the API would have PST disabled and would therefore be able to safely edit any page without sleeping PST syntax waking up.

Restoring <nowiki> to work the way it used to would obviously have my support as well; I don't know how trivial that is. Given it has been a few years, I can imagine it may have genuinely started to be unintentionally relied upon in the other direction (example).

As I understand @Krinkle's proposal, the idea is to add this as a user preference with no UI to turn it on. Krinkle can use super-admin powers to give himself (or his bot) preference just for a limited time to run his JS migration tools, then we can remove the preference from the codebase. Ideally all done before LTS and this never makes it into a "release".

I don't think this is a one-time thing. AFAIK, @Krinkle has been doing JS edits regularly and will need to do this in the future as well and I imagine the same PST considerations will apply then. But, I think user pref without UI support is still the way to go about it.

We should nevertheless see what is involved in getting rid of PST for non-wikitext content pages. If linter or tracking categories can help nudge this along, we should use those tools. But this part is not very high priority, so I suspect we will be living with this for a while.

Here is a rare example of PST being intentionally abused on JS pages (I edited this manually rather than with my tool, and reviewed the PST effect via "Show changes" before publishing).

ptwiki diff 58144397

- 	this.version = /*{{subst:Autossubstituição/Estampa com data e hora|js|.*/ '2019-11-28 01:18:04 (UTC)' /*}}.*/;
+ 	this.version = /*{{subst:Autossubstituição/Estampa com data e hora|js|.*/ '2020-04-28 22:00:34 (UTC)' /*}}.*/;

Doesn't matter for the task here, as my bot edits don't need to bump the version. Just thought I'd share :)

Because many of these pages have been unmodified for years, for a number of them my edit has been the first re-parse since the breaking change to <nowiki> in 2016-2017.

As such, they have the unfortunate side effect of not only risking unintended substitution of templates or signatures (I'm not worried about those for mismatched-nowiki pages; I edit them manually and review the diff before saving), but also risking categorising the page in ways it was not categorised before.

For example, here a user has contacted me because my edit "caused" their user script to be categorised for speedy deletion, since it mentions {{Deleteme}}. The script has <nowiki> at the top to opt out of the linking "features", but this stopped working in 2016, and my edit was the first re-parse since then.

This is just one example, but I've been receiving talk page messages like this on random wikis throughout the past six months. I'm able to explain to them what actually happens, and how to resolve it, but I can only imagine the frustration other users must be facing who are running into this in isolation. Especially when they find out that reverting their edit does not help. It's not unlikely that the script in question is a mutation of a mutation of something they copy-pasted, so string-escaping the curly braces or balancing <nowiki> is not per se something they could come up with.

For my own purposes, the no-cssjs-pst preference discussed above would suffice in unblocking my fixes for javascript deprecation and in turn the removal of those deprecated features from core.

However, for our end-users' sake, I am once again thinking we should consider restoring this behaviour. How likely and how common is it that users have started to rely on the opposite behaviour: typing <nowiki> raw in the wikitext with the intent of it appearing as literal text (as it does today, if unclosed)? And could we lint and phase that out instead? Possibly by jump-starting it for JS content models without linting/deprecation, since relying on the current behaviour there seems near impossible to be desirable, as those pages don't render visually as wikitext.

Change 585811 merged by jenkins-bot:
[mediawiki/core@master] content: Add hidden preference to disable PST on CSS/JS content

matmarex assigned this task to cscott.