
Add hooks to the PPFrame::expand() method
Open, Needs Triage, Public

Description

I propose adding two hooks into MediaWiki's parser: PPFrameBeforeExpansion and PPFrameAfterExpansion, both in the PPFrame::expand() method. These hooks would let extension developers deeply hook into the parser during DOM tree expansion.

Hooks description
PPFrameBeforeExpansion is called after expand() checks for expansion limits and increases the expansion depth by 1, but before any actual expansion. This hook passes three parameters:

  • &$frame – the frame currently being expanded. This is needed for extensions to get a reference to the Parser and the Title object this frame is related to.
  • &$root – the root node of this expand() call. One PPFrame can have its expand() method called many times on different PPNodes (or arrays or strings), so extensions must have a way to distinguish between these calls.
  • $expansionDepth – current expansion depth.

PPFrameAfterExpansion is called after all expansion has been done, just before reducing the expansion depth by 1 and returning the expanded text. This hook passes four parameters:

  • &$frame – the frame currently being expanded. This is needed for extensions to get a reference to the Parser and the Title object this frame is related to.
  • &$root – the root node of this expand() call. One PPFrame can have its expand() method called many times on different PPNodes (or arrays or strings), so extensions must have a way to distinguish between these calls.
  • $expansionDepth – current expansion depth.
  • &$text – the text returned after expanding this node. This gives extensions a chance to modify the output text.
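
To make the placement of the two hooks concrete, here is a minimal toy model of the flow described above. It is written in Python rather than MediaWiki's PHP, and everything in it is hypothetical: the `HOOKS` registry, the `Frame` class, `MAX_DEPTH`, and the handler names are stand-ins for illustration, not MediaWiki APIs. The recursion is also a simplification chosen for this sketch. Because Python has no by-reference arguments, the after-expansion handler returns the new text instead of modifying `&$text` in place.

```python
from typing import Callable, Dict, List

# Hypothetical hook registry for this sketch; not MediaWiki's hook system.
HOOKS: Dict[str, List[Callable]] = {
    "PPFrameBeforeExpansion": [],
    "PPFrameAfterExpansion": [],
}


class Frame:
    """Toy frame. A node is either a string or a list of child nodes."""

    MAX_DEPTH = 40  # arbitrary limit, assumed for this sketch

    def __init__(self, title: str) -> None:
        self.title = title
        self.depth = 0

    def expand(self, root) -> str:
        # The limit check and the depth increment come first ...
        if self.depth >= self.MAX_DEPTH:
            return "<expansion depth limit exceeded>"
        self.depth += 1

        # ... then the "before" hook runs, prior to any actual expansion.
        for handler in HOOKS["PPFrameBeforeExpansion"]:
            handler(self, root, self.depth)

        # Actual expansion: strings are kept, lists are expanded child by child.
        if isinstance(root, str):
            text = root
        else:
            text = "".join(self.expand(child) for child in root)

        # The "after" hook runs once expansion is done and may replace the text.
        for handler in HOOKS["PPFrameAfterExpansion"]:
            text = handler(self, root, self.depth, text)

        # Only then is the depth reduced and the text returned.
        self.depth -= 1
        return text


depths_seen: List[int] = []


def record_depth(frame: Frame, root, depth: int) -> None:
    # Hypothetical "before" handler: remember the depth of every expand() call.
    depths_seen.append(depth)


def bracket_lists(frame: Frame, root, depth: int, text: str) -> str:
    # Hypothetical "after" handler: mark the output of every list node.
    return f"[{text}]" if isinstance(root, list) else text


HOOKS["PPFrameBeforeExpansion"].append(record_depth)
HOOKS["PPFrameAfterExpansion"].append(bracket_lists)

frame = Frame("Main Page")
result = frame.expand(["a", ["b", "c"], "d"])
print(result)
```

The handlers receive the root node on every call, which is how they tell apart the many expand() calls made on the same frame.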

Motivation
I am developing the AdvancedBacklinks extension, which attempts to add more backlink tracking functionality to MediaWiki. Its most important feature is preserving information about which template added which links to a page. Given how MediaWiki's parser works, though, that is not trivial: it requires hooking deeply into the parser in order to know at which stage of expansion a certain link was added.

I am developing this with Nonsensopedia's community, but this has been a long-requested feature on Wikimedia since 2005 (see tasks T3392, T5241, T14396; it was also #41 in the Community Wishlist Survey 2016). I am not really developing this for Wikimedia, but the code could be used for that later. Maybe.

You can see an experimental branch of the AdvancedBacklinks extension using these hooks here.

The current solution avoids the parser entirely by using Parsoid instead. This… has many limitations. Parsoid does mark which code was added by which template, but only for the first level of expansion. That is OK for most applications, but it fails horribly on pages that use templates to structure all of their content. Parsoid is also full of bugs and differs a lot in behaviour from the original parser, which makes it a tough sell for long-established communities that don't want their backlinks to just disappear into thin air. Also: it's an external dependency that consumes lots of resources on small servers and slows down the LinksUpdate process significantly, as each page has to be parsed twice by two different parsers.

Can this be done with other hooks?
Not really. The parser first expands everything and only then does the actual link parsing, after all transclusion information has been lost. The only solutions are to either add hooks deep in the expansion code that would let the extension modify the text during expansion, or to make a copy of the parser's code with slight modifications and parse the text twice. The second option is clearly insane and not future-proof.
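
The ordering problem can be shown with a deliberately tiny model. This is a Python sketch, not MediaWiki code: the `TEMPLATES` store, both functions, and the regular expressions are assumptions made for illustration, and they handle only the simplest `{{Name}}` and `[[Target]]` forms.

```python
import re

# Hypothetical template store for this sketch.
TEMPLATES = {"Nav": "[[Home]] [[About]]"}


def expand_templates(wikitext: str) -> str:
    # Phase 1: replace every {{Name}} with the template's text.
    return re.sub(r"\{\{(\w+)\}\}", lambda m: TEMPLATES[m.group(1)], wikitext)


def find_links(text: str) -> list:
    # Phase 2: collect link targets from the already flattened text.
    return re.findall(r"\[\[([^\]|]+)\]\]", text)


page = "See [[Help]] and {{Nav}}"
flat = expand_templates(page)
links = find_links(flat)

print(flat)
print(links)
```

By the time `find_links` runs, the flattened text no longer mentions `Nav`, so nothing distinguishes the link written on the page from the two links the template contributed. A hook that fires during expansion is the point at which that origin is still known.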

Event Timeline

Restricted Application added a subscriber: Aklapper. · Sep 9 2019, 8:40 AM

Change 535123 had a related patch set uploaded (by Ostrzyciel; owner: Ostrzyciel):
[mediawiki/core@master] Add two hooks to PPFrame::expand() See the Phabricator task for more details.

https://gerrit.wikimedia.org/r/535123

Ostrzyciel updated the task description. · Sep 13 2019, 9:05 AM

I removed the $expansionDepth parameter entirely from both hooks. During testing it turned out to return mostly random numbers that depended on a lot of factors, which made it unusable in production.
The extension does not rely on this parameter anymore.

cscott added a subscriber: cscott. · Dec 18 2019, 5:18 PM

I suspect a more forward-compatible hook would be to hook link resolution, and pass in a reference to the frame there. The legacy parser is being replaced by Parsoid, so hooks based on the legacy parser aren't going to be maintainable long-term. However, Semantic MediaWiki wants to customize how [[....]] syntax is resolved, so we are likely to maintain some sort of hook based around that syntax.

> I suspect a more forward-compatible hook would be to hook link resolution, and pass in a reference to the frame there. The legacy parser is being replaced by Parsoid, so hooks based on the legacy parser aren't going to be maintainable long-term. However, Semantic MediaWiki wants to customize how [[....]] syntax is resolved, so we are likely to maintain some sort of hook based around that syntax.

Well, yes, that does sound reasonable, but I would certainly need to experiment with that hook to determine what kind of arguments it needs. Is there any SMW-related task for this? Or could you point me to people interested in this? I would love to hear their ideas.