
Line based p-wrapping can't match Remex
Closed, Duplicate (Public)

Description

The PHP parser does p-wrapping in two ways: BlockLevelPass does line-based wrapping, and then Remex does it on SAX events to p-wrap text that the PHP parser left unwrapped because of the idiosyncrasies of the block level pass (see T134469: doBlockLevels() inserts <p> and </p> randomly with no regard for HTML validity).

Parsoid's p-wrapper matches the line-based wrapping pretty faithfully, but it also tries to do Remex's top-level pass on the line, using firstBlockTokenType as a heuristic. The latter should be moved to a DOM pass, where there's better insight into when we're in a block.

See the FIXME added in https://gerrit.wikimedia.org/r/#/c/mediawiki/services/parsoid/+/436847/ for T194806
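
To make the proposed DOM pass concrete, here is a rough Python sketch of wrapping unwrapped top-level content once a tree is available. It is an illustration only, not Parsoid's or Remex's actual logic: the `Element` class, the `BLOCK_TAGS` set, and `p_wrap_children` are all hypothetical names, and the block-tag list is deliberately abbreviated. The point is that on a tree, "are we in a block?" is a direct check on each child rather than a per-line guess.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical, abbreviated set of block-level tags (illustration only).
BLOCK_TAGS = {"div", "p", "ul", "ol", "table", "blockquote", "pre", "h1", "h2"}


@dataclass
class Element:
    tag: str
    children: List["Node"] = field(default_factory=list)


Node = Union[Element, str]  # a plain str stands in for a text node


def is_block(node: Node) -> bool:
    return isinstance(node, Element) and node.tag in BLOCK_TAGS


def p_wrap_children(parent: Element) -> Element:
    """Wrap each run of consecutive non-block children of `parent` in a <p>."""
    wrapped: List[Node] = []
    run: List[Node] = []

    def flush() -> None:
        # Whitespace-only runs are kept as-is rather than wrapped.
        if any(isinstance(n, Element) or n.strip() for n in run):
            wrapped.append(Element("p", list(run)))
        else:
            wrapped.extend(run)
        run.clear()

    for child in parent.children:
        if is_block(child):
            flush()  # a block child ends the current inline run
            wrapped.append(child)
        else:
            run.append(child)
    flush()

    parent.children = wrapped
    return parent


# Usage: inline content on either side of a <div> gets its own <p>.
body = Element("body", ["foo ", Element("b", ["bar"]), Element("div", ["baz"]), "\n", "qux"])
p_wrap_children(body)
```

A real pass would also need to decide which containers are eligible for wrapping and how to handle inline elements that contain blocks; this sketch ignores both.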