
Replace libxml/xpath in HtmlFormatter with Remex/zest
Open, Needs Triage · Public

Description

The HtmlFormatter project is used in a few (not that many) places:
https://codesearch.wmflabs.org/deployed/?q=use%20HtmlFormatter%5C%5C&i=nope

It is built on libxml and xpath with a bunch of hacks to avoid bugs, and a partial CSS-selector-to-xpath translator. We should rebase this on Remex (to parse HTML) and zest.php (to match selectors). This will allow us to reduce our dependence on libxml, increase code coverage and usage of Remex, improve corner case parsing of HTML and selectors, and generally put our eggs in fewer baskets.

(It's possible we shouldn't use zest, but should instead just use a slightly better version of CSS-selector-to-xpath, which can be shared with Parsoid.)
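To make the "partial CSS-selector-to-xpath translator" idea concrete, here is a minimal sketch of how such a translator tends to look and why it stays partial. It is written in Python for brevity and is not HtmlFormatter's actual code; the function name `selector_to_xpath` and the supported subset (tag, `#id`, `.class`) are assumptions chosen for illustration.

```python
import re

# Hypothetical subset: an optional tag name followed by any number of
# ".class" or "#id" parts. Combinators, attribute selectors and
# pseudo-classes are deliberately not handled.
_SIMPLE = re.compile(
    r'^(?P<tag>[A-Za-z][A-Za-z0-9]*)?(?P<rest>(?:[.#][A-Za-z_][\w-]*)*)$'
)
_PART = re.compile(r'([.#])([A-Za-z_][\w-]*)')


def selector_to_xpath(selector):
    """Translate a very small subset of CSS selectors to XPath 1.0."""
    match = _SIMPLE.match(selector.strip())
    if not match or not (match.group('tag') or match.group('rest')):
        # Anything outside the subset is rejected rather than guessed at.
        raise ValueError('unsupported selector: %r' % selector)

    tag = match.group('tag') or '*'
    predicates = []
    for kind, name in _PART.findall(match.group('rest')):
        if kind == '#':
            predicates.append("[@id='%s']" % name)
        else:
            # Usual XPath 1.0 idiom for matching one whitespace-separated
            # token inside the class attribute.
            predicates.append(
                "[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]"
                % name
            )
    return '//' + tag + ''.join(predicates)
```

Every selector feature beyond this subset (descendant and child combinators, attribute operators, `:not()`, and so on) needs its own translation rule, which is the maintenance cost a dedicated selector engine such as zest would take over.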

Event Timeline

cscott created this task. · Feb 28 2019, 6:13 PM
Restricted Application added a subscriber: Aklapper. · Feb 28 2019, 6:13 PM
Restricted Application added a project: Discovery-Search. · Feb 28 2019, 6:15 PM
cscott updated the task description. · Mar 5 2019, 4:43 PM
cscott updated the task description.
cscott added subscribers: Parsing-Team, Tgr, Anomie.
Tgr added a comment. · Mar 5 2019, 7:34 PM

IMO Remex has plenty of documentation debt to pay down before it can be added to an extension without frustrating its maintainers.
Zest, on the other hand, seems pleasant enough to use, and should work with libxml-based parsing just as well.