
Block content in links.
Closed, Duplicate (Public)

Description

http://parsoid.wmflabs.org:8001/latestresult/zh/EOS contains [[佳能 EOS 300D|300D<p>Digital Rebel<p>Kiss Digital]]

Block content inside links is problematic, because the HTML parser splits the link around the block element:

> div = document.createElement('div')
<div></div>
> div.innerHTML = '<a href="foo">foo<p>bar</a>'
"<a href="foo">foo<p>bar</a>"
> div.outerHTML
"<div><a href="foo">foo</a><p><a href="foo">bar</a></p></div>"

Details

Reference
bz47963

Event Timeline

bzimport raised the priority of this task to Low. (Nov 22 2014, 1:29 AM)
bzimport added a project: Parsoid-DOM.
bzimport set Reference to bz47963.

This is probably related to bug 47326, but bug 47326 is fixable. I am not sure that this particular bug is fixable, since our DOM fundamentally does not let us represent block content inside an <a> tag.

OTOH, it's interesting that we currently round trip:

[[佳能 EOS 300D|300D<p>Digital Rebel<p>Kiss Digital]]

to

[[佳能 EOS 300D|300D<p>Digital Rebel]]<p>[[佳能 EOS 300D|Kiss Digital]]</p>

i.e., we managed to deal with the first <p> somehow. We might be able to recombine these tags in the html2wt phase.
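
To make the recombination idea concrete, here is a minimal sketch. It assumes a made-up, flattened token list (`link` and `tag` tokens) rather than Parsoid's real DOM or serializer API, and the names `recombineSplitLinks` and `serialize` are hypothetical. It only handles the pattern seen above: a link, an opening <p>, and a second link with the same target.

```javascript
// Hypothetical, simplified token model; NOT Parsoid's real data structures.
// A token is either { kind: 'link', target, text }
// or { kind: 'tag', name, closing }.
function recombineSplitLinks(tokens) {
  const out = [];
  let i = 0;
  while (i < tokens.length) {
    const tok = tokens[i];
    const prev = out[out.length - 1];
    const next = tokens[i + 1];
    // Pattern: link, opening <p>, link with the same target.
    if (tok.kind === 'tag' && tok.name === 'p' && !tok.closing &&
        prev && prev.kind === 'link' &&
        next && next.kind === 'link' && next.target === prev.target) {
      // Fold the paragraph back into the preceding link's content.
      prev.text += '<p>' + next.text;
      i += 2;
      // Drop the closing </p> that the HTML parser synthesized, if present.
      const after = tokens[i];
      if (after && after.kind === 'tag' && after.name === 'p' && after.closing) {
        i += 1;
      }
      continue;
    }
    out.push({ ...tok }); // copy, so the input tokens are never mutated
    i += 1;
  }
  return out;
}

// Turn the simplified tokens back into wikitext-like source.
function serialize(tokens) {
  return tokens
    .map((t) => (t.kind === 'link'
      ? `[[${t.target}|${t.text}]]`
      : `<${t.closing ? '/' : ''}${t.name}>`))
    .join('');
}
```

Whether such a merge is safe once the content has been edited is a separate question; the sketch only shows the shape of the transformation.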

Note that this is not true in production. The <p> tags in the first half are round-tripped with a meta-tag-based trick that is not safe when content can be edited. This trick is mainly used to hide noise in round-trip testing without selective serialization. Until now parse.js has defaulted to this trick; I just submitted a patch for that.

Also, selective serialization is thrown off by overlapping source ranges: it duplicates the source of the nested paragraph. I am not sure whether that can be improved by forbidding range overlaps in the dsr pass.
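
As a rough illustration of what detecting such overlaps could look like, the sketch below takes a list of sibling source ranges and reports the ones that overlap. The `{ start, end }` shape (half-open offsets into the wikitext) and the name `findOverlaps` are assumptions made for this example, not Parsoid's actual dsr representation.

```javascript
// Hypothetical sketch: each range is { start, end }, a half-open [start, end)
// span of source offsets for one sibling node. Not Parsoid's real dsr format.
function findOverlaps(ranges) {
  // Remember each range's original position, then sort by start offset.
  const sorted = ranges
    .map((r, index) => ({ start: r.start, end: r.end, index }))
    .sort((a, b) => a.start - b.start || a.end - b.end);

  const overlaps = [];
  let widest = null; // the earlier range that reaches furthest to the right
  for (const r of sorted) {
    // r overlaps if it starts before the furthest-reaching earlier range ends.
    if (widest && r.start < widest.end) {
      overlaps.push([widest.index, r.index]);
    }
    if (!widest || r.end > widest.end) {
      widest = r;
    }
  }
  // Each entry pairs an overlapping range with the earlier range it collides
  // with; this is not an exhaustive list of every overlapping pair.
  return overlaps;
}
```

A pass that forbids overlaps could use a check like this to decide where a range needs to be clipped or dropped; how to repair the range is left open here.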