PHP's DOM extension is based on DOM Level 1 Core and DOM Level 2 XML, and does not support some DOM HTML functionality that Parsoid relies on. A non-exhaustive list of the gaps:
- querySelector(...) and querySelectorAll(...)
- body property on the DOMDocument
- innerHTML, outerHTML setters / getters on DOMElement
- classList on DOMElement
We need to either provide home-grown utilities for these or extend PHP's DOM implementation to support them.
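As a rough illustration of the "home-grown utilities" route, the sketch below covers a few of the gaps listed above using only calls that PHP's DOM extension already provides (`getElementsByTagName`, `saveHTML`, `DOMXPath::query`). All function names here (`domBody`, `domInnerHTML`, `domOuterHTML`, `domHasClass`, `domQueryByClass`) are hypothetical, and the XPath helper handles only a single class name without quotes, not general CSS selectors.

```php
<?php
// Hypothetical helper names; none of these exist in PHP's DOM extension.

/** Return the <body> element, standing in for document.body. */
function domBody(DOMDocument $doc): ?DOMElement {
    $body = $doc->getElementsByTagName('body')->item(0);
    return $body instanceof DOMElement ? $body : null;
}

/** Serialize the children of $el, standing in for the innerHTML getter. */
function domInnerHTML(DOMElement $el): string {
    $html = '';
    foreach ($el->childNodes as $child) {
        $html .= $el->ownerDocument->saveHTML($child);
    }
    return $html;
}

/** Serialize $el itself, standing in for the outerHTML getter. */
function domOuterHTML(DOMElement $el): string {
    return $el->ownerDocument->saveHTML($el);
}

/** Test for a class name, standing in for classList.contains(). */
function domHasClass(DOMElement $el, string $name): bool {
    $classes = preg_split('/\s+/', $el->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY);
    return in_array($name, $classes, true);
}

/**
 * XPath stand-in for querySelectorAll('.name').
 * Assumes $name is a plain class name with no quote characters.
 */
function domQueryByClass(DOMDocument $doc, string $name): DOMNodeList {
    $xpath = new DOMXPath($doc);
    return $xpath->query(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' $name ')]"
    );
}

// Usage: find an element by class, then inspect it.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><p class="a b">Hi <b>there</b></p></body></html>');
$p = domQueryByClass($doc, 'b')->item(0);
$inner = domInnerHTML($p);
```

Free functions like these leave the built-in classes untouched, at the cost of call sites that no longer look like standard DOM code.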
Notes from our meeting on Jan 27, 2019
- PHP's DOM implementation has received security fixes, spec-compliance work (2015), and other fixes, so it is being maintained
- Five possible strategies for filling the gaps:
  - Update the DOM extension in PHP core to follow modern DOM standards
    - https://github.com/php/php-src/pull/3750 - seems like some folks are trying to do this, at least
  - Wrap a more modern C library (since libxml2 seems to be frozen in time)
    - In Sep 2018 PHP bumped libxml to 2.7.6, but it looks like libxml2 is in bugfix-only mode
  - Write a pure-PHP implementation of the DOM (like domino and/or Remex) -- not happening, but listed here for completeness
  - Subclass the built-in DOM library's objects and override them with new functionality as needed -- most likely
    - This could be a composer library which others could help maintain
    - If/when functionality is added to the core library, the corresponding subclass methods could be removed
    - Open question: document.createElement() etc. need to be overridden to create instances of the subclass
    - This is the most pragmatic option; we'll probably do this (at least initially)
  - Better yet: find *someone else* who is doing one of the above, and use their work!
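The subclassing strategy could look roughly like the sketch below. The class names `ExtDocument` and `ExtElement` and their methods are hypothetical; only the `DOM*` base classes and `DOMDocument::registerNodeClass()` come from PHP itself. `registerNodeClass()` may be one answer to the open question about `createElement()`, since it asks the extension to instantiate the subclass wherever it would otherwise create a `DOMElement`, but whether it covers every case Parsoid needs is an assumption that has not been checked here.

```php
<?php
// Hypothetical subclass names; only the DOM* base classes and
// registerNodeClass() are part of PHP's DOM extension.

class ExtElement extends DOMElement {
    /** Stand-in for the innerHTML getter. */
    public function getInnerHTML(): string {
        $html = '';
        foreach ($this->childNodes as $child) {
            $html .= $this->ownerDocument->saveHTML($child);
        }
        return $html;
    }

    /** Stand-in for classList.contains(). */
    public function hasClass(string $name): bool {
        $classes = preg_split('/\s+/', $this->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY);
        return in_array($name, $classes, true);
    }
}

class ExtDocument extends DOMDocument {
    public function __construct(string $version = '1.0', string $encoding = '') {
        parent::__construct($version, $encoding);
        // Have the extension create ExtElement objects in place of DOMElement.
        $this->registerNodeClass(DOMElement::class, ExtElement::class);
    }

    /** Stand-in for document.body. */
    public function getBody(): ?DOMElement {
        $body = $this->getElementsByTagName('body')->item(0);
        return $body instanceof DOMElement ? $body : null;
    }
}

// Usage: elements created through the document come back as ExtElement.
$doc = new ExtDocument();
$div = $doc->createElement('div');
$div->setAttribute('class', 'note');
$doc->appendChild($div);
$div->appendChild($doc->createElement('span', 'hi'));
$inner = $div->getInnerHTML();
```

If core later gains an equivalent feature, the matching subclass method can be deleted without touching callers that already use the standard name.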