Currently, the way to parse an HTML fragment with Remex is along the lines of:
  $domBuilder = new DOMBuilder();
  $treeBuilder = new TreeBuilder( $domBuilder );
  $dispatcher = new Dispatcher( $treeBuilder );
  $tokenizer = new Tokenizer( $dispatcher, $html, [] );
  $tokenizer->execute( [
      'fragmentNamespace' => HTMLData::NS_HTML,
      'fragmentName' => 'div',
  ] );
  $wrapper = $domBuilder->getFragment();
  foreach ( $wrapper->childNodes as $node ) {
      // do something with the resulting DOM forest
  }
When used for innerHTML-style functionality, this means Remex creates its own document and builds the DOM tree within it, after which we have to import the nodes into the document where the inner HTML replacement is being done. ID indexes get lost during importing (although right now Remex doesn't support them anyway; that's T217696). It would be simpler and less error-prone if Remex could work within a given document, either with a detached fragment wrapper node or with a specified node in that document serving as the wrapper.
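
For illustration, the import step that callers currently have to perform could look roughly like the sketch below. It uses only PHP's built-in DOM extension. The helper name `replaceChildrenWithImported` is hypothetical, and the hand-built `$parsed` document is an assumed stand-in for the document Remex creates internally; it is not Remex output.

```php
<?php
// Hypothetical helper (not part of Remex): replace the children of $target
// with copies of the children of $wrapper, which lives in another document.
function replaceChildrenWithImported( DOMNode $target, DOMNode $wrapper ): void {
    $doc = $target->ownerDocument;

    // Drop the existing content of the target element.
    while ( $target->firstChild ) {
        $target->removeChild( $target->firstChild );
    }

    // importNode() creates a deep copy owned by the target document;
    // the nodes under $wrapper are left untouched.
    foreach ( $wrapper->childNodes as $node ) {
        $target->appendChild( $doc->importNode( $node, true ) );
    }
}

// Assumed stand-in for the separate document the parser builds.
$parsed = new DOMDocument();
$wrapper = $parsed->createElement( 'div' );
$wrapper->appendChild( $parsed->createElement( 'b', 'bold' ) );
$wrapper->appendChild( $parsed->createTextNode( ' text' ) );

// The document in which the inner HTML replacement takes place.
$page = new DOMDocument();
$container = $page->createElement( 'div' );
$page->appendChild( $container );
$container->appendChild( $page->createElement( 'p', 'old content' ) );

replaceChildrenWithImported( $container, $wrapper );
echo $page->saveXML( $container ), "\n";
```

Every parsed node is copied once here, which is the extra work (and the point where per-document state such as ID indexes can be dropped) that building directly into the target document would avoid.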