Currently Parsoid allows outputHasCoreMwDomSpecMarkup to be either true (the contents should be treated as Parsoid markup, i.e. normalized as wikitext, fed to selser, etc.) or false (the contents should be treated as opaque extension markup and not normalized).
We should also allow a third, mixed value for cases like <gallery>, where the surrounding frame is extension markup but it contains embedded DOM Spec markup for the image captions.
The outputHasCoreMwDomSpecMarkup feature was added in 900fe1c0c33fc74ac9e8c24975b9fb0463b8dc77, and that patch also illustrates where the new support should be added to DOMNormalizer etc. Instead of just "skip" or "recurse into" when handling a node, the new mode would look for marked subtrees and recurse into those subtrees while skipping the rest of the extension content.
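To make the three modes concrete, here is a minimal Python sketch (Parsoid itself is PHP) of how a DOM pass might dispatch on a tri-state value. Everything in it is hypothetical: `ContentMode`, the toy `Node` class, its `marked` flag (a stand-in for whatever marker is eventually chosen) and the `normalize` callback are illustration only, not Parsoid's API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class ContentMode(Enum):
    """Hypothetical tri-state replacement for the boolean flag."""
    CORE = "core"      # today's `true`: normalize the whole output
    OPAQUE = "opaque"  # today's `false`: leave the whole output alone
    MIXED = "mixed"    # proposed: normalize only marked subtrees


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    marked: bool = False  # stand-in for "this subtree holds user content"


def visit(node: Node, mode: ContentMode,
          normalize: Callable[[Node], None]) -> None:
    """Walk an extension's output according to its content mode.

    `normalize` is handed the root of a subtree and is assumed to recurse
    into it itself, as a real normalizer pass would.
    """
    if mode is ContentMode.CORE:
        normalize(node)  # hand over the entire output
    elif mode is ContentMode.MIXED:
        if node.marked:
            normalize(node)  # user content: treat as core markup
        else:
            for child in node.children:
                visit(child, mode, normalize)  # keep looking for marked subtrees
    # ContentMode.OPAQUE: skip the node and everything below it


# A gallery-like tree: only the caption <div> is marked as user content.
gallery = Node("ul", [Node("li", [Node("img"), Node("div", marked=True)])])
seen: List[str] = []
visit(gallery, ContentMode.MIXED, lambda n: seen.append(n.name))
print(seen)  # ['div']
```

In MIXED mode the unmarked frame (`ul`, `li`, `img`) is skipped, and only the marked caption subtree is passed to the normalizer.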
WLOG, I'll propose that the subtrees are marked with typeof="mw:UserContent". T318433#8680937 shows an example DOM tree for a notional {{#figure}} parser function, which allows user content both for the figure contents and the figure caption:
  <figure typeof="mw:Transclusion mw:File" ...>
    <a href="..."><span typeof="mw:UserContent">...parsed user wikitext...</span></a>
    <figcaption typeof="mw:UserContent">...parsed user wikitext...</figcaption>
  </figure>
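A rough sketch of how a normalizer might locate the marked subtrees in the example above, using Python's xml.etree.ElementTree on a well-formed simplification of it (real Parsoid output is HTML5, not XML, so this parsing step is an assumption of the sketch). Because typeof is a space-separated token list, the check matches whole tokens. The search stops at the outermost marked element, on the assumption that the normalizer takes over from there. The helper names `has_typeof` and `user_content_roots` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

USER_CONTENT = "mw:UserContent"


def has_typeof(el: ET.Element, token: str) -> bool:
    # typeof is a space-separated list, so compare whole tokens, not substrings
    return token in el.get("typeof", "").split()


def user_content_roots(el: ET.Element) -> List[ET.Element]:
    """Return the outermost descendants of `el` marked as user content."""
    found: List[ET.Element] = []
    for child in el:
        if has_typeof(child, USER_CONTENT):
            found.append(child)  # normalize this whole subtree
        else:
            found.extend(user_content_roots(child))  # extension markup: keep searching
    return found


figure = ET.fromstring(
    '<figure typeof="mw:Transclusion mw:File">'
    '<a href="..."><span typeof="mw:UserContent">figure contents</span></a>'
    '<figcaption typeof="mw:UserContent">caption</figcaption>'
    '</figure>'
)
print([e.tag for e in user_content_roots(figure)])  # ['span', 'figcaption']
```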
An alternative to marking the content with a typeof would be delegating to a handler method of the extension, which would return a list of DOM fragments that normalization etc. should recurse into. This is more consistent with the way outputHasCoreMwDomSpecMarkup is handled (i.e., the value is looked up based on the transclusion name), but it makes it harder to process a Document without full information about the extensions registered in the MediaWiki instance that generated it. (One may argue that there are plenty of dependencies on SiteConfig anyway, and this is just one more.)
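For comparison, a minimal sketch of the handler-based alternative, assuming a registry keyed by extension name. `REGISTRY`, `fragments_for` and the gallery handler are hypothetical, and the "gallerytext" class is assumed only for illustration. The `None` return makes the trade-off explicit: a tool without the generating wiki's registry cannot tell which fragments are user content.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional

# Hypothetical: each extension registers a function that, given the root of
# its output, returns the fragments that should be treated as core markup.
FragmentFinder = Callable[[ET.Element], List[ET.Element]]

REGISTRY: Dict[str, FragmentFinder] = {
    # Notional <gallery> handler: assumes captions carry class="gallerytext"
    "gallery": lambda root: [
        el for el in root.iter()
        if "gallerytext" in (el.get("class") or "").split()
    ],
}


def fragments_for(ext_name: str, root: ET.Element) -> Optional[List[ET.Element]]:
    """Return the fragments to recurse into, or None if the extension is unknown.

    None means the document cannot be processed correctly without the
    registry (SiteConfig) of the wiki that generated it.
    """
    finder = REGISTRY.get(ext_name)
    return None if finder is None else finder(root)


gallery = ET.fromstring('<ul><li><div class="gallerytext">caption</div></li></ul>')
print(len(fragments_for("gallery", gallery)))  # 1
print(fragments_for("poem", gallery))          # None
```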
EDIT: copying from T214994#9323382:
My initial thought was that we'd label these inside the HTML with a <div> or <span> wrapper, as floated in comments on https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/831077/ (Bikeshed alert):
typeof="mw:Raw" means Parsoid shouldn't touch it, except for subtrees marked mw:DOM; typeof="mw:DOM" is applied to the root element of Parsoid's output and means Parsoid /can/ touch it, apart from subtrees marked mw:Raw. That collapses the entire tunneling issue to "your tunneled HTML can't contain typeof="mw:DOM", but anything else is protected from Parsoid".
Then extensions don't have to register an option; they just need to ensure that their top-level element has an appropriate typeof. If they are just passing through Parsoid output, the top-level element will already carry typeof="mw:DOM" by default, so they don't need to do anything special.
Then, if an MCR pass wants to protect its squashed ParserOutput content from Parsoid interference, all it has to do is wrap it in a <div typeof="mw:Raw">.
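A minimal sketch of the mw:Raw / mw:DOM alternation described above, again using xml.etree.ElementTree on a well-formed stand-in document. It assumes, for illustration, that processing starts in the untouchable state until an mw:DOM marker appears, and that the marked element itself follows its own marker. `walk` and `tokens` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List


def tokens(el: ET.Element) -> List[str]:
    # typeof holds a space-separated list of tokens
    return el.get("typeof", "").split()


def walk(el: ET.Element, touchable: bool,
         visit: Callable[[ET.Element], None]) -> None:
    """Call `visit` on every element Parsoid may touch.

    mw:Raw switches processing off for a subtree and mw:DOM switches it back
    on; the element carrying a marker is itself governed by that marker.
    """
    if "mw:Raw" in tokens(el):
        touchable = False
    elif "mw:DOM" in tokens(el):
        touchable = True
    if touchable:
        visit(el)
    for child in el:
        walk(child, touchable, visit)


# Parsoid output with a squashed MCR slot wrapped in <div typeof="mw:Raw">.
doc = ET.fromstring(
    '<body typeof="mw:DOM">'
    '<p>parsed</p>'
    '<div typeof="mw:Raw"><p>squashed slot</p>'
    '<section typeof="mw:DOM"><p>nested Parsoid output</p></section></div>'
    '</body>'
)
touched: List[str] = []
walk(doc, False, lambda e: touched.append(e.tag))
print(touched)  # ['body', 'p', 'section', 'p']
```

The nested `<section>` shows why tunneled HTML must not contain typeof="mw:DOM": any such marker inside raw content switches Parsoid processing back on for that subtree.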
That doesn't work well with the regexp-based postprocessors we have inherited from our legacy codebase: regexps can't reliably pair up a nested <div ...> with its matching </div>. Perhaps markup inspired by @ihurbain's annotation markup would be better, something like:
<meta typeof="start" data-id="random uuid"> .... <meta typeof="end" data-id="random uuid">
As long as each start/end pair shares a UUID that is distinct from every other pair's, a regexp can always find the matching end marker and skip the appropriate section.
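A short sketch of how a regexp-based postprocessor could honour such markers, assuming they are serialized exactly as in the example (same attribute order and quoting). The backreference `\1` requires the end marker to carry the start marker's id, so an inner pair with a different id cannot end the outer region early. `PROTECTED` and `transform_unprotected` are hypothetical names, and the ids `u1`/`u2` stand in for real UUIDs.

```python
import re
from typing import Callable

# Start marker, lazily everything up to the end marker with the SAME data-id.
PROTECTED = re.compile(
    r'<meta typeof="start" data-id="([^"]+)">.*?<meta typeof="end" data-id="\1">',
    re.DOTALL,
)


def transform_unprotected(html: str, transform: Callable[[str], str]) -> str:
    """Apply `transform` only to the text outside protected regions."""
    out = []
    pos = 0
    for m in PROTECTED.finditer(html):
        out.append(transform(html[pos:m.start()]))
        out.append(m.group(0))  # protected region copied verbatim
        pos = m.end()
    out.append(transform(html[pos:]))
    return "".join(out)


html = ('<p>a</p><meta typeof="start" data-id="u1"><p>a</p>'
        '<meta typeof="start" data-id="u2"><p>a</p><meta typeof="end" data-id="u2">'
        '<p>a</p><meta typeof="end" data-id="u1"><p>a</p>')
result = transform_unprotected(html, lambda s: s.replace("<p>a</p>", "<p>b</p>"))
print(result.count("<p>b</p>"), result.count("<p>a</p>"))  # 2 3
```

Only the paragraphs before and after the outer u1 region are rewritten; everything between its markers, including the nested u2 pair, is left untouched.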