Page MenuHomePhabricator

Remex doesn't uppercase tag/node names
Open, Needs TriagePublic


According to the DOM standard (eg and the Node#tagName (and thus also Node#nodeValue should be uppercase:

The tagName attribute’s getter must return the context object’s HTML-uppercased qualified name.

Remex -- and the standard PHP DOMDocument#loadHTML method -- use lowercase tag- and node names:

$ psysh
Psy Shell v0.9.9 (PHP 7.3.2-3 — cli) by Justin Hileman
>>> require 'vendor/autoload.php';
=> Composer\Autoload\ClassLoader {#2}
>>> ($html = file_get_contents( 'obama.html' )) || true;
=> true
>>> ($doc = new DOMDocument) || true;
=> true
>>> $doc->loadHTML($html);
>>> require('./tests/ZestTest.php')
=> 1
>>> $doc2 = \Wikimedia\Zest\Tests\ZestTest::loadHtml("./obama.html"); /* uses remex */
>>> $doc->documentElement->firstChild->tagName;
=> "head"
>>> $doc2->documentElement->firstChild->tagName;
=> "head"
>>> $doc->documentElement->firstChild->nodeName;
=> "head"
>>> $doc2->documentElement->firstChild->nodeName;
=> "head"

The PHP DOM implementation respects case-sensitivity (which it actually shouldn't):

>>> $doc2->createElement('p')->nodeName;
=> "p"
>>> $doc->createElement('p')->nodeName;
=> "p"
>>> $doc2->createElement('P')->nodeName;
=> "P"
>>> $doc->createElement('P')->nodeName;
=> "P"

Compare to JS in the browser:

> document.createElement('p').tagName

Remex should probably:

  1. provide an option to uppercase HTML tag names prior to passing them to createElement(), and/or
  2. allow passing in a different DOMImplementation to RemexHtml\DOM\DOMBuilder to provide proper behavior for html/

Option 1 would, in the short term, allow Parsoid to continue to use uppercase when comparing tagName strings; it would have to take care to always use uppercase when calling createElement though. This would be a bridge to option 2, once we have a proper spec-compliant DOM implementation (T215000).