Page MenuHomePhabricator

Stop using autoInserted* flags incorrectly during html2wt when not in round-trip-testing mode
Open, MediumPublic

Description

See session below that illustrates the problem.

[subbu@earth:~/work/wmf/parsoid] echo '<div>foo' > /tmp/wt
[subbu@earth:~/work/wmf/parsoid] php bin/parse.php < /tmp/wt > /tmp/old.html
[subbu@earth:~/work/wmf/parsoid] sed 's/<\/div>/<\/div><p>boo<\/p>/g' < /tmp/old.html > /tmp/new.html
[subbu@earth:~/work/wmf/parsoid] php bin/parse.php --html2wt < /tmp/new.html
<div>fooboo

[subbu@earth:~/work/wmf/parsoid] php bin/parse.php --selser --oldtextfile /tmp/wt --oldhtmlfile /tmp/old.html < /tmp/new.html
<div>fooboo

[subbu@earth:~/work/wmf/parsoid] sed 's/<\/div>/<\/div><p>boo<\/p>/g;s/"autoInsertedEnd":true/"autoInsertedEnd":false/' < /tmp/old.html > /tmp/new.html
[subbu@earth:~/work/wmf/parsoid] php bin/parse.php --html2wt < /tmp/new.html
<div>foo</div>boo

[subbu@earth:~/work/wmf/parsoid] php bin/parse.php --selser --oldtextfile /tmp/wt --oldhtmlfile /tmp/old.html < /tmp/new.html
<div>foo</div>boo

Original report my @matmarex
Parsoid seems to never insert the closing tag for an unclosed element. This is usually good since it avoids dirty diffs, but in some cases adding it is necessary to preserve the layout.

For example, consider this VE edit, where there is a <div ...> element on the page that is missing the closing tag. I typed a new paragraph after the end of that element in VE, but after saving the edit, it appeared inside of it instead (and a paragraph break is missing) – a </div> tag should have been added:

https://en.wikipedia.org/w/index.php?title=User:Matma_Rex/sandbox&diff=956168899&oldid=956168836&diffmode=source

image.png (2×3 px, 427 KB)
image.png (2×3 px, 422 KB)

The same problem occurs with an unclosed table {| ... where the closing |} is not inserted.

Event Timeline

This comment was removed by ssastry.
ssastry renamed this task from Parsoid doesn't insert the closing tag for an unclosed element when it's necessary to preserve the layout to Stop using autoInserted* flags incorrectly during html2wt when not in round-trip-testing mode.EditedMay 12 2020, 9:03 PM
ssastry triaged this task as Medium priority.
ssastry updated the task description. (Show Details)
ssastry added subscribers: Catrope, Jdforrester-WMF, Elitre.

I decided to merge in the older ticket into this one since that already had partial fixes and no reason to wade through all that discussion.

Deleted my previous comment since I updated the description with that.