DSR information from stripped tag is lost when it is a direct child of <body>
Closed, Declined (Public)

Description

In the following wikitext, a Parsoid round-trip results in the closing blockquote tag being lost:

: <blockquote>
foo
: </blockquote>

I haven't seen anything like this happen prior to the resolution of bug #64901, so I wonder if it may be a regression.


Version: unspecified
Severity: normal

Details

Reference
bz71465

Event Timeline

bzimport raised the priority of this task to Low. (Nov 22 2014, 3:50 AM)
bzimport set Reference to bz71465.

No, it is not a regression. It doesn't round-trip in edit mode; see the output below (I've removed a serializer warning to remove clutter). But, selser will preserve it in most cases except probably where the line containing the end blocktag is edited.


[subbu@earth lib] echo ":<blockquote>\nfoo\n:</blockquote>" | node parse --wt2wt --rtTestMode true
:<blockquote>
foo
:</blockquote>
[subbu@earth lib] echo ":<blockquote>\nfoo\n:</blockquote>" | node parse --wt2wt
:<blockquote>
foo

:

In this case, the wikitext is badly nested: the opening and closing blockquote tags end up in separate dl-dd lists. The treebuilder closes the opening tag automatically within the first list and strips the closing tag from the second list. Parsoid recovers information about these fixups from the DOM and records it. In normal wt2wt mode, that information is used, since the DOM could have been edited in the meantime (to fix the errors, for example).

So, the behavior is as expected.
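
To illustrate the auto-close and strip behavior described above, here is a minimal Python sketch. It is a toy model under stated assumptions, not Parsoid's or the HTML5 treebuilder's actual code: `build_scope` and the fixup labels are hypothetical names, and each list item is treated as an isolated scope in which open tags cannot outlive the item.

```python
def build_scope(tokens):
    """Toy model of tree building inside one list item (hypothetical helper).

    Returns (emitted, fixups): the tags/text actually emitted, and a record
    of what the builder had to fix up.
    """
    stack, emitted, fixups = [], [], []
    for tok in tokens:
        if tok.startswith("</"):
            name = tok[2:-1]
            if name not in stack:
                # End tag with no matching open tag in this scope: strip it.
                fixups.append(("strippedTag", tok))
                continue
            # Close everything up to and including the matching open tag.
            while True:
                top = stack.pop()
                emitted.append(f"</{top}>")
                if top == name:
                    break
                fixups.append(("autoInsertedEnd", top))
        elif tok.startswith("<"):
            stack.append(tok[1:-1])
            emitted.append(tok)
        else:
            emitted.append(tok)
    # End of the list item: anything still open is closed automatically.
    while stack:
        top = stack.pop()
        emitted.append(f"</{top}>")
        fixups.append(("autoInsertedEnd", top))
    return emitted, fixups


# The two ":" lines from the report, each modelled as its own scope.
first = build_scope(["<blockquote>"])
second = build_scope(["</blockquote>"])
print(first)
print(second)
```

In this model the first item records an auto-inserted end tag and the second records a stripped tag, which corresponds to the `autoInsertedEnd` flag and the `mw:Placeholder/StrippedTag` meta that appear in Parsoid's output.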

(In reply to ssastry from comment #1)

(I've removed a serializer warning to remove clutter). But, selser will
preserve it in most cases except probably where the line containing the end
blocktag is edited.

In that case, I'm reopening this as an issue with selser, because it isn't handling this correctly. Here's the edit where this was actually a problem:
https://en.wikipedia.org/w/index.php?title=Talk:Neil_deGrasse_Tyson&diff=627662324&oldid=627660338

The blockquote that was removed was 1000 lines away from the actual changes.

$ echo ":<blockquote>\na\n:</blockquote>\nb" | node parse --rtTestMode true
...
<dl data-parsoid='{"dsr":[0,13,0,0]}'><dd data-parsoid='{"dsr":[0,13,1,0]}'><blockquote data-parsoid='{"stx":"html","autoInsertedEnd":true,"dsr":[1,13,12,0]}'></blockquote></dd></dl>
<p data-parsoid='{"dsr":[14,15,0,0]}'>a</p>
<dl data-parsoid='{"dsr":[16,17,0,0]}'><dd data-parsoid='{"dsr":[16,17,1,0]}'></dd></dl><meta typeof="mw:Placeholder/StrippedTag" data-parsoid='{"src":"&lt;/blockquote>","name":"BLOCKQUOTE","dsr":[17,30,null,null]}'/>
<p data-parsoid='{"dsr":[31,32,0,0]}'>b</p>
...

So, the DSR info from the stripped tag is lost. Perhaps we should wrap these stripped tags in a placeholder span so that it carries the DSR information in edit mode. In an editor, it will show up as an empty span (I'm not sure what the editing implications of this solution are). But for the editing issue that needs resolution here, this should be a straightforward fix in the markTreeBuilderFixups pass.
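
As a sanity check on the offsets in the output above, the following Python sketch maps each `dsr` range back onto the source wikitext. It assumes that the first two `dsr` entries are start and end offsets into the wikitext and that `echo` appended a trailing newline. `DsrCollector` is a hypothetical helper written for this illustration and is not part of Parsoid.

```python
import json
from html.parser import HTMLParser

# Wikitext from the command above (echo appends the trailing newline).
WIKITEXT = ":<blockquote>\na\n:</blockquote>\nb\n"

# Parsoid output copied from the comment above.
HTML = """<dl data-parsoid='{"dsr":[0,13,0,0]}'><dd data-parsoid='{"dsr":[0,13,1,0]}'><blockquote data-parsoid='{"stx":"html","autoInsertedEnd":true,"dsr":[1,13,12,0]}'></blockquote></dd></dl>
<p data-parsoid='{"dsr":[14,15,0,0]}'>a</p>
<dl data-parsoid='{"dsr":[16,17,0,0]}'><dd data-parsoid='{"dsr":[16,17,1,0]}'></dd></dl><meta typeof="mw:Placeholder/StrippedTag" data-parsoid='{"src":"&lt;/blockquote>","name":"BLOCKQUOTE","dsr":[17,30,null,null]}'/>
<p data-parsoid='{"dsr":[31,32,0,0]}'>b</p>"""


class DsrCollector(HTMLParser):
    """Collects (tag, start, end, source slice) for every node with a dsr."""

    def __init__(self):
        super().__init__()
        self.spans = []

    def handle_starttag(self, tag, attrs):
        raw = dict(attrs).get("data-parsoid")
        if raw is None:
            return
        dsr = json.loads(raw).get("dsr")
        if dsr:
            # Assumption: dsr[0] and dsr[1] are start/end source offsets.
            start, end = dsr[0], dsr[1]
            self.spans.append((tag, start, end, WIKITEXT[start:end]))


collector = DsrCollector()
collector.feed(HTML)
for tag, start, end, src in collector.spans:
    print(f"{tag:<10} [{start},{end}] {src!r}")
```

Under these assumptions, the range `[17,30]` on the stripped-tag meta covers exactly the `</blockquote>` text in the source, and the blockquote's range `[1,13]` covers only its opening tag.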

I am marking this low priority since this is not a big issue as far as I can tell. Feel free to bump up the priority if there are other scenarios where this might be a problem.

This is acceptable normalization for broken wikitext (as in this example). Selser exists to prevent dirty diffs in the common scenarios. If this becomes a real problem for editors, I expect a similar bug to be opened in the future.