Page MenuHomePhabricator

Parsoid runs ParserAfterTidy & ParserAfterParse hooks multiple times, causing problems for DiscussionTools
Open, Needs TriagePublic

Description

Parsoid doesn't "really" support the ParserAfterTidy and ParserAfterParse hooks, because they would mutate the "edit mode" HTML provided to VisualEditor in unpredictable ways.

However, they are often required by extensions -- for example, the Math extension basically does no work in its extension tag handler, instead just queuing up a batch which it then processes all at once during the ParserAfterTidy hook.

In order to provide compatibility, Parsoid has a special entry point in`Parser::parseExtensionTagAsTopLevelDoc` which "pretends" that each extension tag is its own self-contained document and calls ::recursiveTagParse() (invoking hooks ParserBeforeInternalParse, InternalParseBeforeLinks T352448) then explicitly calling the ParserAfterParse hook (as if the entire page was done) and finally calling ::internalParseHalfParsed()with $isMain==true so that the ParserAfterTidy hook is invoked.

But this causes issues for DiscussionTools, which contains a ParserAfterTidy hook which it expects will *not* be called by Parsoid -- but if there are extensions on the page the ParserAfterTidy hook will be invoked once per extension, and the resulting ParserOutputs will be merged into the main ParserOutput.

This model provides good compatibility for many extensions, but causes issues for others, especially if there are multiple extension tags on the same page. Recommended practice is to use mergable ParserOutputs for each extension, and then do any final cleanup in a post-processing hook like OutputPageParserOutput or ParserOutputPostCacheTransform.

This task is for the long-term plan to fix this more properly:

  • First, don't use getText() in DataAccess::parseWikitext so we don't have a similar issue with ParserOutputPostCacheTransform being called multiple times on subdocuments. (https://gerrit.wikimedia.org/r/c/mediawiki/core/+/977313)
  • We should probably invoke the ParserBeforeInternalParse hook from Parsoid, for compatibility.
  • Deprecate and remove InternalParseBeforeLinks (T352448)
  • For Parsoid use, invoke the ParserAfter* hooks in ParsoidParser at the end of parsing, but in a way such that their effect is contained to extension HTML. This might involve extracting the extension HTML from the full document, calling ParserAfter* with only that extension HTML, and then reinserting the extension HTML into the full document.