Page MenuHomePhabricator

Get rid of useWhitespaceHeuristics
Closed, ResolvedPublic

Description

The code paths in DOMHandler::getTrailingSpace() and DOMHandler::getLeadingSpace() taken when useWhitespaceHeuristics is true seem like they're better placed in Separators::buildSep() where the other logic to restore original separators already exists. This duplication has led to a few bugs (T260960 / T245206).

From the discussion in the review,
https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/628955/3/src/Html2Wt/DOMHandlers/DOMHandler.php

  • buildSep was assumed to restore separators only between adjacent elements, however, it has an asymmetric path of walking back from non-elements nodes to find an element to calculate dsr off of. That should maybe be extended to both directions.
  • Some helpers can be created to help clarify predicates, like,
    • DOMUtils::haveDSR($node) for $node instanceof DOMElement and !empty( .. dsr ... )
    • !isAnOldEltNode( $lc )) for !$lc || !DOMUtils::isElt( $lc ) || WTUtils::isNewElt( $lc )

Event Timeline

Arlolra created this task.Sep 21 2020, 10:14 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptSep 21 2020, 10:15 PM
ssastry triaged this task as Medium priority.Sep 21 2020, 10:46 PM
ssastry moved this task from Needs Triage to Tech Debt / Big changes on the Parsoid board.

Change 628955 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Avoid double dip whitespace reuse when getting trailing space

https://gerrit.wikimedia.org/r/628955

Arlolra updated the task description. (Show Details)Sep 23 2020, 9:32 PM

Change 628955 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Avoid double dip whitespace reuse when getting trailing space

https://gerrit.wikimedia.org/r/628955

Change 630676 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/vendor@master] Bump wikimedia\parsoid to 0.13.0-a11

https://gerrit.wikimedia.org/r/630676

Change 630676 merged by jenkins-bot:
[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.13.0-a11

https://gerrit.wikimedia.org/r/630676

Change 630711 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] WIP: move whitespace heuristics to buildSep

https://gerrit.wikimedia.org/r/630711

ssastry claimed this task.Sep 29 2020, 2:22 AM

Since I am in this code right now to tackle list items & selser issues for discussion tools, might as well clean this up.

Change 630711 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Move whitespace heuristics for trimmable whitespace nodes to buildSep

https://gerrit.wikimedia.org/r/630711

Change 635100 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.13.0-a12

https://gerrit.wikimedia.org/r/635100

Change 635100 merged by jenkins-bot:
[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.13.0-a12

https://gerrit.wikimedia.org/r/635100

ssastry removed ssastry as the assignee of this task.Nov 16 2020, 5:58 PM
ssastry added a subscriber: ssastry.
Arlolra closed this task as Resolved.Nov 16 2020, 6:10 PM
Arlolra assigned this task to ssastry.