DU.(get|set)NodeData touch node object internals, which won't be available after a php port.
Maybe combine this with the tagId set during treebuilding (https://github.com/wikimedia/parsoid/blob/master/lib/wt2html/HTML5TreeBuilder.js#L61)
DU.(get|set)NodeData touch node object internals, which won't be available after a php port.
Maybe combine this with the tagId set during treebuilding (https://github.com/wikimedia/parsoid/blob/master/lib/wt2html/HTML5TreeBuilder.js#L61)
Project | Branch | Lines +/- | Subject | |
---|---|---|---|---|
mediawiki/services/parsoid | master | +225 -144 | Use a bag-on-the-side implementation for node data | |
mediawiki/services/parsoid | master | +736 -712 | [WIP] Move .dataobject off nodes |
17:22 <+subbu> arlolra, "DU.(get|set)NodeData touch node object internals, which won't be available after a php port." ... I annotated the notes later indicating that I may have been mistaken because Tim seems have to written code assuming it works (in 2014).
17:22 <+subbu> so, either the dom impl has changed, he didn't test it, or i didn't use the right api method.
17:22 <+subbu> that apart, it is still useful to do the bag on the side impl.
17:24 <+cscott> i think PHP supports "property added after the fact", sort of like JS does
17:24 <+cscott> but (a) it's known to be quite slow in PHP, probably not appropriate for something used as often as data-parsoid/etc annotations
17:24 <+cscott> and (b) it *may* not be supported on native objects, which the DOM node classes might be? worth a test, maybe...
17:25 <+cscott> ...but better yet if we don't need to
17:25 <+cscott> http://php.net/manual/en/book.dom.php
17:25 <+cscott> i don't immediately see anything specific to mapping DOM nodes to native PHP objects
17:30 <+cscott> https://stackoverflow.com/questions/8707235/how-to-create-new-property-dynamically has some benchmarks
https://stackoverflow.com/questions/8707235/how-to-create-new-property-dynamically has some benchmarks showing how slow "expando" properties are.
We were mistaken about php and dynamic properties (.dataobject) on instances of classes ... so, some of the rationale for this is not valid anymore. But, from a code cleanliness POV (and performance as @cscott refers to in his comment), this may still be a good thing to do.
So it turns out that PHP's DOM classes are a bit weird. As long as a reference to the DOMNode exists in PHP, things work fine. But it seems the DOMNode object gets destroyed when the last PHP reference goes away, and a new one is then created when the same node is later accessed.
$doc = new DOMDocument; $node = $doc->createElement( 'b' ); $node->appendChild( $doc->createElement( 'a' ) ); $a = $node->firstChild; $a->dataobject = []; var_dump( isset( $a->dataobject ) ); // => true var_dump( isset( $node->firstChild->dataobject ) ); // => true $a = null; var_dump( isset( $node->firstChild->dataobject ) ); // => false !! $node->firstChild->dataobject = []; var_dump( isset( $node->firstChild->dataobject ) ); // => false !! $a = $node->firstChild; var_dump( isset( $node->firstChild->dataobject ) ); // => false $node->firstChild->dataobject = []; var_dump( isset( $node->firstChild->dataobject ) ); // => true var_dump( isset( $a->dataobject ) ); // => true
So, it looks like we need to implement this on the JS side first and not rely on persistence of the PHP wrapper objects around libxml2.
If you're referring to https://stackoverflow.com/a/8707348/634419, that's something rather different. We're not looking at implementing __get() or __set() here. Try this:
<?php define( 'REPS', 100000000 ); class A { } $t0 = microtime( true ); $a = new A(); for ( $i = 0; $i < REPS; $i++ ) { $a->data = 'test' . $i; } $t1 = microtime( true ); echo "Added property: " . ( $t1 - $t0 ) . "\n"; class B { public $data; } $t0 = microtime( true ); $b = new B(); for ( $i = 0; $i < REPS; $i++ ) { $b->data = 'test' . $i; } $t1 = microtime( true ); echo "Declared property: " . ( $t1 - $t0 ) . "\n";
On my laptop, it looks like the added property is a whole 3 or 4 nanoseconds slower.
Change 489332 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] [WIP] Move .dataobject off nodes
[15] boris> $doc = DOMDocument::loadHTML('<b>hi'); → object(DOMDocument)( ) [16] boris> $doc->env = [ 'doc' => $doc ]; → array( 'doc' => object(DOMDocument)( 'env' => *** RECURSION *** ) ) [17] boris> $doc->lastChild->ownerDocument->env; → array( 'doc' => object(DOMDocument)( 'env' => *** RECURSION *** ) ) [18] boris> $n = $doc->lastChild; → object(DOMElement)( ) [19] boris> $doc = null; → NULL [20] boris> $n->ownerDocument->env; → array( 'doc' => object(DOMDocument)( 'env' => *** RECURSION *** )
Huh, the circular reference really does keep the PHP DOMDocument alive as long as a node is alive?
Change 490654 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] [WIP] Use a bag-on-the-side implementation for node data
Change 490654 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Use a bag-on-the-side implementation for node data
Mentioned in SAL (#wikimedia-operations) [2019-02-26T18:24:33Z] <arlolra> Updated Parsoid to e82347d (T204608, T214099, T217093)