According to the DOM standard (e.g. https://dom.spec.whatwg.org/#dom-element-tagname and https://developer.mozilla.org/en-US/docs/Web/API/Element/tagName), Element#tagName (and thus also Node#nodeName) should be uppercase for HTML elements:
The tagName attribute’s getter must return the context object’s HTML-uppercased qualified name.
Remex and the standard PHP DOMDocument#loadHTML method both use lowercase tag and node names:
$ psysh
Psy Shell v0.9.9 (PHP 7.3.2-3 — cli) by Justin Hileman
>>> require 'vendor/autoload.php';
=> Composer\Autoload\ClassLoader {#2}
>>> ($html = file_get_contents( 'obama.html' )) || true;
=> true
>>> ($doc = new DOMDocument) || true;
=> true
>>> $doc->loadHTML($html);
>>> require('./tests/ZestTest.php')
=> 1
>>> $doc2 = \Wikimedia\Zest\Tests\ZestTest::loadHtml("./obama.html"); /* uses remex */
>>> $doc->documentElement->firstChild->tagName;
=> "head"
>>> $doc2->documentElement->firstChild->tagName;
=> "head"
>>> $doc->documentElement->firstChild->nodeName;
=> "head"
>>> $doc2->documentElement->firstChild->nodeName;
=> "head"
The PHP DOM implementation preserves the case of the name passed to createElement (which, for an HTML document, it shouldn't):
>>> $doc2->createElement('p')->nodeName;
=> "p"
>>> $doc->createElement('p')->nodeName;
=> "p"
>>> $doc2->createElement('P')->nodeName;
=> "P"
>>> $doc->createElement('P')->nodeName;
=> "P"
Compare to JS in the browser:
> document.createElement('p').tagName
"P"
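To make the gap concrete, here is a minimal sketch of what a spec-style tagName getter could look like on top of PHP's DOM classes. The function name `specTagName` is hypothetical (it is not part of Remex or the PHP DOM extension). The sketch also simplifies the standard: it treats any element with no namespace or the XHTML namespace as an HTML element, and does not check whether the owning document is an HTML document.

```php
<?php
/**
 * Hypothetical helper, not part of Remex or ext/dom: approximates the
 * DOM standard's "HTML-uppercased qualified name".
 */
function specTagName( DOMElement $el ): string {
	$ns = $el->namespaceURI;
	// Assumption: elements without a namespace are HTML elements, since
	// DOMDocument::loadHTML and createElement do not set one.
	$isHtml = $ns === null || $ns === 'http://www.w3.org/1999/xhtml';
	if ( !$isHtml ) {
		// Foreign elements (SVG, MathML, ...) keep their original case.
		return $el->tagName;
	}
	// ASCII-only uppercasing; in older PHP versions strtoupper() depends
	// on the current locale, so strtr() is used here instead.
	return strtr(
		$el->tagName,
		'abcdefghijklmnopqrstuvwxyz',
		'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
	);
}

$doc = new DOMDocument;
echo $doc->createElement( 'p' )->tagName, "\n"; // p
echo specTagName( $doc->createElement( 'p' ) ), "\n"; // P
```

A read-side helper like this leaves the stored names untouched, so it would only help callers that go through it rather than reading tagName or nodeName directly.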
Remex should probably:
- provide an option to uppercase HTML tag names prior to passing them to createElement(), and/or
- allow passing in a different DOMImplementation to RemexHtml\DOM\DOMBuilder to provide proper behavior for html/
Option 1 would, in the short term, allow Parsoid to continue to use uppercase when comparing tagName strings; it would have to take care to always use uppercase when calling createElement though. This would be a bridge to option 2, once we have a proper spec-compliant DOM implementation (T215000).
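A rough sketch of what option 1 amounts to on the caller's side, assuming nothing about Remex's actual API: the names `asciiUpper` and `createHtmlElement` are hypothetical and only illustrate uppercasing a tag name before it reaches createElement().

```php
<?php
/**
 * Hypothetical helper: ASCII-uppercase a tag name, leaving any
 * non-ASCII bytes untouched.
 */
function asciiUpper( string $name ): string {
	return strtr(
		$name,
		'abcdefghijklmnopqrstuvwxyz',
		'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
	);
}

/**
 * Hypothetical wrapper: every HTML element is created through this
 * function, so the stored name is always uppercase.
 */
function createHtmlElement( DOMDocument $doc, string $name ): DOMElement {
	return $doc->createElement( asciiUpper( $name ) );
}

$doc = new DOMDocument;
$el = createHtmlElement( $doc, 'p' );
var_dump( $el->tagName === 'P' ); // bool(true)
var_dump( $el->nodeName === 'P' ); // bool(true)
```

Because PHP's DOM stores whatever case it is given, any element created without going through such a wrapper would still end up lowercase, which is why this can only be a stopgap until a spec-compliant DOM implementation is available.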