Page MenuHomePhabricator

PageContentSave does not allow modification of parameters
Closed, InvalidPublic

Description

The PageContentSave hook takes a number of parameters which are references (the content to save, the author, the flags etc), so there is a natural expectation that these can be changed by the hook handler. In reality though, summary, flags and status can be replaced, but new values assigned to the user or the content are discarded. (Replacing the WikiPage, as far as I can tell, never did anything, and I can't imagine a sane use case for doing so.)

This was broken in MediaWiki 1.32 during the Multi-Content-Revisions rewrite by e8632ab0f6264851d2115a2e6338c2074b9a9b8c. I can't remember if this was intentional or not, and even if it was accidental we might want to leave it that way (seems like it could violate a lot of assumptions, e.g. caching based on the user), but at a minimum it should be documented and a replacement suggested.

Event Timeline

Tgr created this task.May 3 2019, 2:55 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMay 3 2019, 2:55 AM
Tgr updated the task description. (Show Details)May 3 2019, 3:01 AM
Paladox added a subscriber: Paladox.May 3 2019, 3:04 AM
daniel added a comment.EditedMay 3 2019, 8:13 AM

All the old hooks have all objects passed by reference, simply because before PHP 5, the objects would be copied if not passed by reference. It would be great if we could just change the hooks signatures, but that would break handler functions that declare their parameters to be by-reference, see T193951: Hook handler functions should not require pass-by-reference unless documented in the hook signature. In general, I think hook parameters should be considered read-only, unless explicitly documented otherwise. Being declared as a reference should not be taken to mean that the parameter can be modified.

Replacing the content in this hook would always have caused confusion, because some other hook may have already triggered generation of the canonical parser output. I think AbuseFilter would do that. Note that we currently even specifically force creation of the canonical parser output before this hook is called, due to some obscure interaction with ApiFlowEditHeaderTest; see the comment by Anomie in PageUpdater::saveRevision.

Note that PageContentSave will have to be replaced by an MCR-aware hook in the near future anyway.

Doing some archaeology,

Tgr closed this task as Invalid.May 3 2019, 8:08 PM

It would be great if we could just change the hooks signatures, but that would break handler functions that declare their parameters to be by-reference, see T193951: Hook handler functions should not require pass-by-reference unless documented in the hook signature.

See comment there.

Only the $summary parameter in PageContentSave is documented as being passed by reference.

Right, so this is just discrepancy betweek hooks.txt and the wiki page. I didn't think to check there. hooks.txt is more authoritative so all that needs to be done is to fix the wiki page, then.

Tgr added a comment.May 3 2019, 8:09 PM

...or the infobox part of the wiki page, anyway. Thanks for digging through the history.

@Anomie I don't remember what my intention was last week, much less 15 years ago!

It looks like the change was made to preserve pass-by-reference, which had changed between PHP 4 and PHP 5.

I think for at least some hooks, pass-by-reference was needed to override default behaviour.

daniel added a comment.May 7 2019, 5:02 PM

Right, so this is just discrepancy betweek hooks.txt and the wiki page. I didn't think to check there. hooks.txt is more authoritative so all that needs to be done is to fix the wiki page, then.

Right, because it's ok for the handler function to declare by-value, but the run() call still getting a reference. We need to keep that until all handlers have been converted to not requiring a reference to be passed (or implement the hack described in T193951).

Tgr added a comment.May 8 2019, 12:23 AM

Still would be nice to document what the replacement should be (if any), though. I guess intentionally misidentifying the user is not a plausible use case, but if someone wants to modify the content before it gets saved, what's their least bad option post the MCR rewrite? Should there be a new hook that runs pre-prepareContent? (That's pre-PST, but probably most users of the hook wouldn't care much.) Or one that runs afterwards and can reset the DerivedPageDataUpdater if the content changes?

if someone wants to modify the content before it gets saved, what's their least bad option post the MCR rewrite?

I'm dubious about the use case, to be honest. But in any case, I'd want to wait for T212482 before we introduce new hooks (unless there is an urgent need).

Hi folks,

if I may chime in here, it was my message to the wikitech-l mailing list that originally prompted Tgr to open this issue. I am the author of the LinkTitles extension that is not compatible with MediaWiki 1.32 because it "exploits" the fact that the Content object that was passed by the PageContentSave hook could be replaced by a new Content object in versions prior to 1.32. The extension looks for titles of existing pages that occur in the content of the page being saved, and adds internal links ([[...]]) to them.

There are only a dozen extensions that make use of this hook, and it seems that LinkTitles is the only one that modifies the page content. So, with regard to the doubtful use cases for modifying the content of a page being saved, there may be only one?

In any case I would appreciate very much if someone with deep knowledge of MediaWiki's internal workings could point me to a hook that would allow for page content modification. I don't know how many Wikis have the LinkTitles extension installed (and of course it's not Wikipedia), but I do get feedback from administrators every now and then (cf. also the discussion page, plus e-mails).

If page content modification is not possible when a page is being saved, the LinkTitles extension is also able to add links to the rendered page, but that may delay page rendering quite a bit (and thus impact user experience). Another alternative is a cron job, but this may not be an option for all administrators of MediaWiki instances (i.e. those in hosted environments).

Thanks very much!

daniel added a comment.May 9 2019, 9:16 AM

Hi @bovender! Thank you for providing some context for this request.

What you are trying to do sounds like a custom pre-safe transform (PST) for wikitext. This transformation is done in Parser::preSaveTransform(). There is currently no hook point there, but it would be simple enough to add one.

Another option i see is using the EditPage::attemptSave hook, but I'm not sure it has a good/reliable way to modify the content either.

Tgr added a comment.May 10 2019, 2:29 AM

There's Content::preSaveTransform() but you'd have to pull some nasty hacks to use that as a hook.
EditPage is not invoked on automated edits (say the NewUserMessage extension posting to a user talk page), which might or might not be preferable.

Another option is to do the transformation during parsing. That's not ideal in terms of performance but well-supported; you could use ParserBeforeStrip for example. The extra links would not show up in source edit view, which again might or might not be preferable.