I propose adding two hooks into MediaWiki's parser: PPFrameBeforeExpansion and PPFrameAfterExpansion, both in the PPFrame::expand() method. These hooks would let extension developers deeply hook into the parser during DOM tree expansion.
Hooks description
PPFrameBeforeExpansion is called after expand() checks for expansion limits and increases the expansion depth by 1, but before any actual expansion. This hook passes three parameters:
- &$frame – the frame currently being expanded. This is needed for extensions to get a reference to the Parser and the Title object this frame is related to.
- &$root – the root node of this expand() call. One PPFrame can have its expand() method called many times on different PPNodes (or arrays or strings), so extensions must have a way to distinguish between these calls.
- $expansionDepth – the current expansion depth, already increased by 1 for this call.
PPFrameAfterExpansion is called after all expansion has been done, just before reducing the expansion depth by 1 and returning the expanded text. This hook passes four parameters:
- &$frame – the frame currently being expanded. This is needed for extensions to get a reference to the Parser and the Title object this frame is related to.
- &$root – the root node of this expand() call. One PPFrame can have its expand() method called many times on different PPNodes (or arrays or strings), so extensions must have a way to distinguish between these calls.
- $expansionDepth – the current expansion depth, not yet reduced by 1.
- &$text – the text returned after expanding this node. This gives extensions a chance to modify the output text.
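The call order described above can be sketched with a small self-contained model. This is only an illustration and not MediaWiki's actual PPFrame code: the ToyFrame class, its public hook list, the depth limit of 40 and the bracket-wrapping handler are all assumptions made up for this example. Only the two hook names and their parameter lists come from the proposal.

```php
<?php
// Toy model of the proposed call order. This is NOT MediaWiki's PPFrame code:
// ToyFrame, its hook list, and the depth limit are hypothetical stand-ins.

class ToyFrame {
    public int $depth = 0;
    public int $maxDepth = 40; // arbitrary limit chosen for this sketch

    /** @var array<string, callable[]> hook name => list of handlers */
    public array $hooks = [
        'PPFrameBeforeExpansion' => [],
        'PPFrameAfterExpansion' => [],
    ];

    /** @param string|array $root a string, or an array of child nodes */
    public function expand( $root ): string {
        // Check the limit, then increase the expansion depth by 1.
        if ( $this->depth >= $this->maxDepth ) {
            return '[expansion depth limit exceeded]';
        }
        $this->depth++;
        $frame = $this;
        $depth = $this->depth;

        // Proposed PPFrameBeforeExpansion: runs before any actual expansion.
        foreach ( $this->hooks['PPFrameBeforeExpansion'] as $handler ) {
            $handler( $frame, $root, $depth );
        }

        // Stand-in for the real expansion: concatenate the expanded children.
        if ( is_array( $root ) ) {
            $text = '';
            foreach ( $root as $child ) {
                $text .= $this->expand( $child );
            }
        } else {
            $text = (string)$root;
        }

        // Proposed PPFrameAfterExpansion: may modify $text before it is returned.
        foreach ( $this->hooks['PPFrameAfterExpansion'] as $handler ) {
            $handler( $frame, $root, $depth, $text );
        }

        // Reduce the expansion depth by 1 and return the (possibly modified) text.
        $this->depth--;
        return $text;
    }
}

// Example handlers: record the call order and wrap every result in brackets.
$log = [];
$toy = new ToyFrame();
$toy->hooks['PPFrameBeforeExpansion'][] = function ( &$frame, &$root, $depth ) use ( &$log ) {
    $log[] = "before depth=$depth";
};
$toy->hooks['PPFrameAfterExpansion'][] = function ( &$frame, &$root, $depth, &$text ) use ( &$log ) {
    $log[] = "after depth=$depth";
    $text = '[' . $text . ']';
};

$result = $toy->expand( [ 'a', [ 'b' ] ] );
```

Because expand() is re-entered for every child, the handlers fire once per call on the same frame. That is why the proposal passes $root: it is the only thing that tells these calls apart.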
Motivation
I am developing the AdvancedBacklinks extension, which aims to add more backlink tracking functionality to MediaWiki. Its most important feature is preserving information about which template added which links to a page. Because of how MediaWiki's parser is designed, this is not trivial: the extension has to hook deep into the parser to know at which stage of expansion a given link was added.
I am developing this with Nonsensopedia's community, but it has been a long-requested feature on Wikimedia since 2005 (see tasks T3392, T5241 and T14396; it was also #41 on the Community Wishlist Survey 2016). I am not developing this for Wikimedia specifically, but the code could perhaps be used there later.
You can see an experimental branch of the AdvancedBacklinks extension using these hooks here.
The current solution avoids the parser entirely by using Parsoid instead, which has many limitations. Parsoid does mark which code was added by which template, but only for the first level of expansion. That is fine for most applications, but it fails badly on pages that use templates to structure all of their content. Parsoid is also full of bugs and differs a lot in behaviour from the original parser, which makes it a tough sell for long-established communities that don't want their backlinks to disappear into thin air. It is also an external dependency that consumes a lot of resources on small servers and slows down the LinksUpdate process significantly, as each page has to be parsed twice by two different parsers.
Can this be done with other hooks?
Not really. The parser first expands everything and only then does the actual link parsing, by which point all transclusion information has been lost. The only solutions are either to add hooks deep in the expansion code that let an extension modify the text during expansion, or to make a copy of the parser's code with slight modifications and parse the text twice. The second option is clearly insane and not future-proof.
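To illustrate why modifying the text during expansion is enough, here is a toy two-phase pipeline. Everything in it is hypothetical: the template store, the control-character marker syntax, and both functions are made up for this sketch. They mirror neither MediaWiki's parser nor the way AdvancedBacklinks actually works. The callback plays the role of a PPFrameAfterExpansion handler that wraps each template's output in markers, so that the later link pass can still tell where a link came from.

```php
<?php
// Toy illustration only: the template store, the marker syntax, and both
// functions are hypothetical and are not taken from MediaWiki or AdvancedBacklinks.

$templates = [
    'Navbox' => 'See [[Paris]] and {{Footer}}',
    'Footer' => '[[Help]]',
];

/**
 * Phase 1: expand {{Name}} calls recursively. $afterExpansion stands in for a
 * PPFrameAfterExpansion handler and may rewrite each template's expanded text.
 */
function expandTemplates( string $text, array $templates, ?callable $afterExpansion = null ): string {
    return preg_replace_callback(
        '/\{\{([^{}]+)\}\}/',
        function ( array $m ) use ( $templates, $afterExpansion ) {
            $name = $m[1];
            $out = expandTemplates( $templates[$name] ?? '', $templates, $afterExpansion );
            return $afterExpansion ? $afterExpansion( $name, $out ) : $out;
        },
        $text
    );
}

/**
 * Phase 2: collect [[links]] from fully expanded text. Returns
 * link target => name of the innermost template that added it (null if none).
 */
function collectLinks( string $text ): array {
    $links = [];
    $stack = [];
    preg_match_all( '/\x01([^\x02]*)\x02|\x03|\[\[([^\]\[|]+)\]\]/', $text, $matches, PREG_SET_ORDER );
    foreach ( $matches as $m ) {
        if ( $m[0] === "\x03" ) {
            array_pop( $stack );            // closing marker: leave the template
        } elseif ( $m[0][0] === "\x01" ) {
            $stack[] = $m[1];               // opening marker: enter the template
        } else {
            $links[$m[2]] = end( $stack ) ?: null;
        }
    }
    return $links;
}

$page = 'Intro [[Main]] {{Navbox}}';

// Without a hook, the expanded text carries no trace of where links came from.
$plain = collectLinks( expandTemplates( $page, $templates ) );

// With a hook-like callback, markers survive until the link pass.
$wrap = function ( string $name, string $text ): string {
    return "\x01" . $name . "\x02" . $text . "\x03";
};
$tracked = collectLinks( expandTemplates( $page, $templates, $wrap ) );
```

In the unhooked run every link maps to null. In the hooked run the links from Navbox and from the nested Footer are attributed to their own templates, which is the kind of per-level information the Parsoid approach loses beyond the first level of expansion.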