
Replace libxml/xpath in HtmlFormatter with Remex/zest
Closed, Declined (Public)

Description

The HtmlFormatter library is used in only a few places:
https://codesearch.wmflabs.org/deployed/?q=use%20HtmlFormatter%5C%5C&i=nope

It is built on libxml and XPath, with a bunch of hacks to work around bugs, plus a partial CSS-selector-to-XPath translator. We should rebase it on Remex (to parse HTML) and zest.php (to match selectors). This will allow us to reduce our dependence on libxml, increase code coverage and usage of Remex, improve corner-case parsing of HTML and selectors, and generally put our eggs in fewer baskets.

(It's possible we shouldn't use zest, but should instead just use a slightly better version of CSS-selector-to-XPath, which can be shared with Parsoid.)
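To make "partial CSS-selector-to-XPath translator" concrete, here is a minimal sketch of the technique in Python. It is not HtmlFormatter's actual PHP code, and `css_to_xpath` is a hypothetical name. It assumes only tag, `#id` and `.class` selectors, comma-separated lists, and the descendant (whitespace) combinator are needed. Attribute selectors, pseudo-classes, and the `>`, `+`, `~` combinators are rejected. The sketch also illustrates one corner case a naive translation gets wrong: `[@class='x']` does not match `class="x y"`.

```python
import re

# One compound selector: an optional tag name (or *), then any number of
# #id / .class parts. Anything else (e.g. [attr], :first-child) fails to match,
# so this is deliberately only a partial translator.
_COMPOUND = re.compile(r"^([A-Za-z][A-Za-z0-9-]*|\*)?((?:[#.][A-Za-z_][A-Za-z0-9_-]*)*)$")
_PART = re.compile(r"([#.])([A-Za-z_][A-Za-z0-9_-]*)")


def _compound_to_xpath(compound: str) -> str:
    """Translate one compound selector such as 'div.note#main' to an XPath step."""
    m = _COMPOUND.match(compound)
    if not compound or not m:
        raise ValueError(f"unsupported selector: {compound!r}")
    tag = m.group(1) or "*"
    predicates = []
    for kind, name in _PART.findall(m.group(2)):
        if kind == "#":
            predicates.append(f"[@id='{name}']")
        else:
            # Pad the whitespace-normalized class attribute with spaces so that
            # only whole class tokens match ('note' must not match 'footnote').
            predicates.append(
                f"[contains(concat(' ', normalize-space(@class), ' '), ' {name} ')]"
            )
    return tag + "".join(predicates)


def css_to_xpath(selector: str) -> str:
    """Translate a comma-separated list of descendant selectors to XPath 1.0."""
    alternatives = []
    for group in selector.split(","):
        compounds = group.split()  # whitespace = descendant combinator
        if not compounds:
            raise ValueError(f"empty selector in {selector!r}")
        alternatives.append("//" + "//".join(_compound_to_xpath(c) for c in compounds))
    # XPath's union operator plays the role of the CSS selector list.
    return " | ".join(alternatives)


if __name__ == "__main__":
    print(css_to_xpath("table.infobox, #toc"))
    print(css_to_xpath("div p"))
```

Every selector feature beyond this small subset needs more special-casing of this kind. That is the maintenance burden the task proposes to avoid, either by matching selectors directly with zest.php or by sharing a single, better translator with Parsoid.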

Event Timeline

IMO Remex has plenty of documentation debt to pay down before it can be added to an extension without frustrating its maintainers.
Zest, on the other hand, seems pleasant enough to use, and should work just as well with libxml-based parsing.

matmarex subscribed.

If I understand correctly, this task is about replacing the native PHP APIs with Remex inside the HtmlFormatter library. I think that's not a good use of our time; we should instead focus on replacing the whole HtmlFormatter library and using Remex directly. See T255586: Replace HTMLFormatter by Remex.