Selser restores removed separators if adjacent nodes are both old
Closed, Resolved (Public)

Description

From https://gerrit.wikimedia.org/r/#/c/215646/,

The selser failure for "HTML headers vs TOC" is kind of interesting.
In the src wikitext, the behavior switch and heading have a nl sep.
However, it's edited away by the random change. The wt2wt sees two
old elts on the same line and is happy to serialize them as is.
Selser, for its part, looks for the original separator and
reinserts the nl.
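
The difference between the two paths can be sketched with a toy model. This is only an illustration of the behavior described above, not Parsoid's serializer: the function names `wt2wt_join` and `reported_selser_join` and the "is old" flags are hypothetical, and the wt2wt output is an assumption based on the statement that the two elements are serialized as is.

```python
# Toy model of the reported behavior. Hypothetical helpers, not Parsoid code.

def wt2wt_join(left: str, right: str, edited_sep: str) -> str:
    """Full serialization: emit whatever separator the edited DOM implies."""
    return left + edited_sep + right


def reported_selser_join(left: str, right: str,
                         original_sep: str, edited_sep: str,
                         left_is_old: bool = True,
                         right_is_old: bool = True) -> str:
    """Selective serialization as described in this task: if both
    neighbours are unmodified ("old"), the original separator is reused,
    even when the edit removed it."""
    if left_is_old and right_is_old:
        sep = original_sep
    else:
        sep = edited_sep
    return left + sep + right


# Original source had a newline between the two nodes; the edit removed it.
wt2wt_out = wt2wt_join("__TOC__", "=== hi ===", edited_sep="")
selser_out = reported_selser_join("__TOC__", "=== hi ===",
                                  original_sep="\n", edited_sep="")

print(repr(wt2wt_out))   # '__TOC__=== hi ==='
print(repr(selser_out))  # '__TOC__\n=== hi ==='
```

In this model the two outputs disagree only because the separator itself was edited while both surrounding nodes were left untouched, which is the case the task title describes.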

Try with,

__TOC__
=== hi ===

Parse it, remove the newline from the resulting HTML, then pass the edited HTML to node parse --selser --oldtextfile ... --oldhtmlfile ...

Event Timeline

Arlolra raised the priority of this task to Low.
Arlolra updated the task description.
Arlolra subscribed.
ssastry claimed this task.
ssastry subscribed.
[subbu@earth:~/work/wmf/parsoid] php bin/parserTests.php --selser --no-blacklist --filter 'HTML headers vs TOC' --changetree '[2,0,0,2,[2],4,3,0,4,0,[3],3,0]' tests/parserTests.txt
Loaded blacklist from /home/subbu/work/wmf/parsoid/tests/parserTests-php-blacklist.json. Found 1650 entries!
EXPECTED PASS: HTML headers vs TOC (T25393) (__NOEDITSECTION__ for clearer output, doesn't matter here) [2,0,0,2,[2],4,3,0,4,0,[3],3,0] (selser)