Page MenuHomePhabricator

Add a hook, called before a page edit is saved, offering the PSTed content of the edit
Open, Needs TriagePublic

Description

Currently, EditPage has the 'EditFilterMergedContent' hook, which provides the new *text* of the edit, along with other data such as the summary. While that's OK for some consumers, others may need the *PSTed* text of the page. Right now I'm thinking of AbuseFilter (some variables will return the links contained in the new text), and especially SpamBlacklist. SB will effectively PST all edits, because it always needs the added links to filter them.

Fortunately, we have a Parser cache and the new text of the edit will only be parsed once. However, I don't think that we should rely on this in the long term. It's also a little confusing, if you look e.g. at the performance flame graphs (it could seem that SpamBlacklist is very time-consuming).

Hence, I propose that EditPage[1] should run a hook after having PSTed the text, passing the result in. This would deny the need to run PST in extensions when needed. Also, such a hook should be executed only if the page is going to be saved (i.e., not if the PSTed content is the same as before), so that extensions don't have to check for that.

The EditFilterMergedContent hook should remain in place, for consumers that do not need the PSTed text.

[1] - Or a related class, as long as it's guaranteed that the hook will only ever be called on page edits.

Event Timeline

Anomie subscribed.

EditPage doesn't PST as part of the save. The PST is currently done via EditPage::internalAttemptSave()WikiPage::doEditContent()PageUpdater::saveRevision()DerivedPageDataUpdater::prepareContent(). I don't know whether that code path is only called for page edits.

On the other hand, I've long advocated that the architecture of PageUpdater/DerivedPageDataUpdater is a mess that needs to be better refactored into components as diagrammed in File:Edit and revision rendering data flow diagram.svg (where PageUpdater/DerivedPageDataUpdater currently implements only the red arrow path through that diagram).

and especially SpamBlacklist. SB will effectively PST all edits, because it always needs the added links to filter them.

SpamBlacklist always parses the content too, it doesn't only PST it. I don't know what AbuseFilter does. The saving of the edit doesn't need to actually parse at all, although it currently might. The actual "derived page data" part of DerivedPageDataUpdater does need to, but that happens (or can happen) post-save, in a DeferredUpdate and/or a job like LinksUpdate.

Terminology:

  • PST ("pre-save transform") converts the Content submitted by the user into another Content, with template subst: applied and ~~~~ converted to the user's signature, and a few other things.
  • A parse converts a Content object into a ParserOutput. Which is where SpamBlacklist gets the links, etc.

EditPage doesn't PST as part of the save. The PST is currently done via EditPage::internalAttemptSave()WikiPage::doEditContent()PageUpdater::saveRevision()DerivedPageDataUpdater::prepareContent(). I don't know whether that code path is only called for page edits.

Yeah, that's the reason for my foot note. I skimmed through the EditPage logic, and it seems like EditPage is not seeing the parsed content.

SpamBlacklist always parses the content too, it doesn't only PST it.

Ah, I'm sorry, I've used both therms interchangeably.

I don't know what AbuseFilter does.

*_links variables will need a full parse, but *_pst will only require a PST, so it depends. I'd also like to make AF PST the new text before even trying to filter it, so to avoid filtering if the new content is identical to the old one, and the edit wouldn't be saved anyway (patch).

The actual "derived page data" part of DerivedPageDataUpdater does need to, but that happens (or can happen) post-save, in a DeferredUpdate and/or a job like LinksUpdate.

Good to know, thanks. So, SB does have a perceivable effect for the user.