
Run performance tests of the new DOM library
Open, Medium, Public

Description

@jlinehan's early tests indicate up to a 2x performance boost on tree-building tests, which is a good early result.

However, DOM libraries are used in a wide variety of ways. Parsoid itself has the following usage profiles:

  • Tree building (one-time per pipeline)
  • Tree walking (very common)
  • Tree mutation (common)

Besides this, Parsoid has, over the years, run into pathological scenarios that led to O(N^2) performance degradation on trees with

  • large tables
  • large lists
  • mutation of DOM nodes with a large number of children

So, to get a reliable sense of this library's performance, it is important to run a suite of performance tests that exercises each of these modes.
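One way such a suite could flag the O(N^2) cases is to compare the work done at N and at 2N children: a ratio near 2 suggests linear scaling, a ratio near 4 suggests quadratic. The sketch below is a toy illustration in Python, not Dodo or Parsoid code. `ToyNode`, `index_of_scan`, and the other names are hypothetical, and it counts comparisons rather than wall-clock time so the result is deterministic.

```python
# Toy illustration only: none of these names come from Dodo or Parsoid.
class ToyNode:
    def __init__(self):
        self.children = []


def build(n):
    """Create a parent with n children (a stand-in for a large list or table)."""
    parent = ToyNode()
    parent.children = [ToyNode() for _ in range(n)]
    return parent


def index_of_scan(parent, child):
    """Find a child's index by linear scan; return (index, comparisons made)."""
    for steps, candidate in enumerate(parent.children, start=1):
        if candidate is child:
            return steps - 1, steps
    raise ValueError("not a child of this parent")


def walk_with_index_lookup(parent):
    """Visit every child and look up its index each time.

    Returns the total number of comparisons, which grows as n*(n+1)/2.
    """
    total = 0
    for child in parent.children:
        _, steps = index_of_scan(parent, child)
        total += steps
    return total


def growth_ratio(n):
    """Work at 2n divided by work at n: about 2 is linear, about 4 is quadratic."""
    return walk_with_index_lookup(build(2 * n)) / walk_with_index_lookup(build(n))


if __name__ == "__main__":
    print(f"growth ratio at n=200: {growth_ratio(200):.2f}")
```

A real test would replace the toy walk with the DOM operation under test (tree walking or mutation on a node with many children) and measure time or allocations instead of comparisons.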

Event Timeline

ssastry triaged this task as Medium priority. Dec 7 2020, 1:13 AM

I benchmarked Dodo integrated with Parsoid while parsing a realistic test case ([[Australia]] with local templates and no images), PHP 8.0.12 stock (no DOM patch).

                 Baseline  Dodo  % difference
Time (s)         5.2       8.7   +69%
PHP memory (MB)  124       202   +63%
Peak RSS (MB)    182       247   +35%
GC time (s)      0.2       2.3   +940%

GC time is the fraction of time spent in zend_gc_collect_cycles according to perf record, multiplied by the total time as measured without perf record.

The increase in time spent in the garbage collector accounts for about 66% of the time overhead of switching to Dodo. Measures like inlining method calls, removing assertions and optimising case folding would go some way to addressing the remaining 34%.
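As a rough check of this attribution, the arithmetic can be redone from the rounded figures in the table above. This is a minimal sketch in Python; the function names are made up for illustration. With the rounded values the GC share of the overhead comes out at about 60% rather than 66%, presumably because the figure quoted above was computed from unrounded measurements.

```python
# Rounded values copied from the benchmark table (seconds).
BASELINE_TIME, DODO_TIME = 5.2, 8.7
BASELINE_GC, DODO_GC = 0.2, 2.3


def gc_time_from_perf(gc_fraction, total_seconds):
    """GC time as defined above: perf's GC fraction times the un-profiled run time."""
    return gc_fraction * total_seconds


def gc_share_of_overhead(base_time, new_time, base_gc, new_gc):
    """Fraction of the added run time that is explained by added GC time."""
    return (new_gc - base_gc) / (new_time - base_time)


if __name__ == "__main__":
    share = gc_share_of_overhead(BASELINE_TIME, DODO_TIME, BASELINE_GC, DODO_GC)
    print(f"GC share of the time overhead: {share:.0%}")
    print(f"GC fraction of the Dodo run:   {DODO_GC / DODO_TIME:.0%}")
```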

The measurements for Dodo included a couple of minor patches addressing low-hanging fruit: in Element::getNodeName() I inlined getPrefix() and getLocalName() and commented out the case folding.

The increase in memory usage can be explained by looking at the data structures. A Dodo Element has 13 properties, for a minimum memory usage of 13*sizeof(zval)+sizeof(zend_object) = 264 bytes, not including strings. Dodo Attribute nodes also have 13 properties and so are about the same size. Compare that to libxml2: sizeof(xmlNode) = 120, sizeof(xmlAttr) = 96. So we can expect a Dodo DOM to use approximately twice as much memory as a libxml2 DOM.
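The size estimate can be reproduced with a few lines of arithmetic. The sketch below assumes sizeof(zval) = 16 and sizeof(zend_object) = 56, the values assumed elsewhere in this thread (they are consistent with the 264-byte total), and uses the libxml2 sizes quoted above. It ignores strings and allocator overhead, so it is only a lower bound.

```python
# Assumed sizes in bytes; the zval and zend_object values are assumptions
# taken from the discussion, not measured here.
SIZEOF_ZVAL = 16
SIZEOF_ZEND_OBJECT = 56
SIZEOF_XML_NODE = 120   # libxml2 xmlNode, as quoted above
SIZEOF_XML_ATTR = 96    # libxml2 xmlAttr, as quoted above


def php_object_bytes(num_properties):
    """Minimum size of a PHP object with the given number of declared properties."""
    return num_properties * SIZEOF_ZVAL + SIZEOF_ZEND_OBJECT


if __name__ == "__main__":
    dodo_node = php_object_bytes(13)
    print(f"Dodo Element/Attribute: {dodo_node} bytes")
    print(f"vs xmlNode: {dodo_node / SIZEOF_XML_NODE:.2f}x")
    print(f"vs xmlAttr: {dodo_node / SIZEOF_XML_ATTR:.2f}x")
```

Under these assumptions the ratios are 2.2x for elements and 2.75x for attributes, in line with the "approximately twice" estimate.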

What are sizeof(zval) and sizeof(zend_object)? And for a proper comparison the libxml2 numbers should probably also include a sizeof(zend_object) or sizeof(zval) or two, because they require a PHP wrapper object. Granted, those wrappers are gc'ed aggressively, so probably a wrapper isn't present for every node, just "live" nodes in the tree.

There are some straightforward optimizations to reduce the number of properties (and the number of strings). Assuming sizeof(zval)=16 and sizeof(zend_object)=56, it seems that to match libxml2's sizes we'd need the typical Dodo node to have just 4 properties (4*16+56 = 120), and similar for a typical attribute. That's not impossible: parent/document/next/prev for Nodes (erk, we'd have to add one for the Attribute list for Elements), and name/value/ownerElement for non-namespaced Attributes. A bunch of stuff which is used rarely can either be added as a dynamic property or as a weak map in the document. Hm. (Shrinking the sizeof(zval) would help a lot, too, that's a lot of wasted space.)
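The property budget follows from inverting the size formula. A minimal sketch, under the same assumed sizes (sizeof(zval)=16, sizeof(zend_object)=56) and the libxml2 sizes quoted earlier in the thread; the function names are illustrative only:

```python
SIZEOF_ZVAL = 16          # assumed, as stated in the comment above
SIZEOF_ZEND_OBJECT = 56   # assumed, as stated in the comment above


def object_bytes(num_properties):
    """Minimum PHP object size for a given number of declared properties."""
    return num_properties * SIZEOF_ZVAL + SIZEOF_ZEND_OBJECT


def max_properties(target_bytes):
    """Largest property count whose object still fits within target_bytes."""
    return (target_bytes - SIZEOF_ZEND_OBJECT) // SIZEOF_ZVAL


if __name__ == "__main__":
    # Matching xmlNode (120 bytes) allows 4 properties.
    print("node budget:", max_properties(120))
    # Matching xmlAttr (96 bytes) allows only 2; the 3-property attribute
    # proposed above (name/value/ownerElement) would be slightly larger.
    print("attribute budget:", max_properties(96))
    print("3-property attribute:", object_bytes(3), "bytes")
```

So the four-property node matches xmlNode exactly, while a three-property attribute lands at 104 bytes, slightly above xmlAttr's 96.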