Remex doesn't set ID attributes
Open, Needs Triage, Public

Description

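Remex doesn't call the (PHP-specific) DOMElement#setIdAttribute method, so the id attributes in the documents it builds are never registered as IDs. As a result, getElementById returns null on a Remex-built document, and the fallback lookups (Zest::find or an XPath query) are thousands of times slower than getElementById on a document loaded with PHP's built-in HTML parser, which does set the id attribute appropriately: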

$ psysh
Psy Shell v0.9.9 (PHP 7.3.2-3 — cli) by Justin Hileman
>>> require 'vendor/autoload.php';
=> Composer\Autoload\ClassLoader {#2}
>>> ($html = file_get_contents( 'obama.html' )) || true;
=> true
>>> ($doc = new DOMDocument) || true;
=> true
>>> $doc->loadHTML($html);
PHP Warning:  DOMDocument::loadHTML(): Tag figure-inline invalid in Entity, line: 9 in /home/cananian/Projects/Wikimedia/zest.phpeval()'d code on line 1
>>> timeit -n100 $doc->getElementById('cite_ref-290');       
Command took 0.000003 seconds on average (0.000002 median; 0.000290 total) to complete.
>>> require('./tests/ZestTest.php')
=> 1
>>> $doc2 = \Wikimedia\Zest\Tests\ZestTest::loadHtml("./obama.html"); /* uses remex */
>>> $doc2->getElementById('cite_ref-290');
=> null
>>> timeit -n100 count(\Wikimedia\Zest\Zest::find('#cite_ref-290', $doc2));
=> 1
Command took 0.019840 seconds on average (0.019201 median; 1.984014 total) to complete.
>>> timeit -n100 ( new DOMXPath( $doc2 ) )->query( './/*[@id="cite_ref-290"]', $doc2->documentElement);
=> DOMNodeList {#2327
     +length: 1,
   }
Command took 0.011756 seconds on average (0.011539 median; 1.175620 total) to complete.
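
One possible workaround, until Remex sets ID attributes itself, is to mark them once after parsing. The sketch below is only an illustration: markIdAttributes is a hypothetical helper, not part of Remex or Zest, and it assumes a single XPath pass over the finished document is acceptable. The demo document is built by hand rather than by Remex, to keep the example self-contained.

```php
<?php
// Hypothetical helper (not part of Remex or Zest): flag every "id"
// attribute as an ID so DOMDocument::getElementById() can find it.
// Returns the number of elements that were marked.
function markIdAttributes(DOMDocument $doc): int
{
    $count = 0;
    $xpath = new DOMXPath($doc);
    // Select every element carrying an id attribute, in any namespace.
    foreach ($xpath->query('//*[@id]') as $el) {
        $el->setIdAttribute('id', true);
        $count++;
    }
    return $count;
}

// Build a tiny document node by node, without DOMDocument::loadHTML(),
// so that no ID attributes are registered automatically.
$doc = new DOMDocument();
$html = $doc->appendChild($doc->createElement('html'));
$body = $html->appendChild($doc->createElement('body'));
$sup = $body->appendChild($doc->createElement('sup'));
$sup->setAttribute('id', 'cite_ref-290');

$before = $doc->getElementById('cite_ref-290'); // not registered yet
$marked = markIdAttributes($doc);
$after = $doc->getElementById('cite_ref-290');  // now resolvable
```

The pass costs one full traversal, so it only pays off when several getElementById lookups follow. Setting the flag inside the tree builder as each element is created would avoid the extra traversal.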

Related bug: T215000: Fill gaps in PHP DOM's functionality