Unhandled <pre> tokenizing scenarios in tokenizer
Closed, Resolved; Public

Description

See the test case below. For some reason, the combination of a <pre> inside a <blockquote> and a paragraph before the blockquote (all of these conditions are necessary to reproduce the bug) causes the content after the blockquote not to be wrapped in p-tags. See the output below. This is probably an edge case in the paragraph-wrapping code.

Based on the bug report here:
https://en.wikipedia.org/w/index.php?title=Wikipedia:VisualEditor/Feedback&oldid=575631753#VE_removing_paragraph_gaps

[subbu@earth lib] cat /tmp/x
a

<blockquote><pre>
b
</pre></blockquote>

c

d
[subbu@earth lib] node parse < /tmp/x
<body data-parsoid='{"dsr":[0,49,0,0]}'><p data-parsoid='{"dsr":[0,1,0,0]}'>a</p>

<blockquote data-parsoid='{"stx":"html","dsr":[3,42,12,13]}'><pre data-parsoid='{"stx":"html","autoInsertedEnd":true,"strippedNL":"\n","dsr":[15,29,5,0]}'>

b
&lt;/pre&gt;</pre></blockquote>

c

d
</body>
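
The dsr values in this output can be checked by hand against the input. Below is a minimal Python sketch, assuming that dsr means [start, end, open-tag width, close-tag width] in source character offsets (this layout is an assumption, it is not stated in this report) and that /tmp/x ends with a single trailing newline. The helper name `span` is made up for illustration.

```python
# Reproduction input from /tmp/x (assumed to end with a single newline).
SRC = "a\n\n<blockquote><pre>\nb\n</pre></blockquote>\n\nc\n\nd\n"


def span(src, start, end):
    """Return the source text covered by a dsr range [start, end)."""
    return src[start:end]


# dsr values copied from the parser output above.
BODY_DSR = (0, 49, 0, 0)
BLOCKQUOTE_DSR = (3, 42, 12, 13)
PRE_DSR = (15, 29, 5, 0)

body_text = span(SRC, BODY_DSR[0], BODY_DSR[1])
blockquote_text = span(SRC, BLOCKQUOTE_DSR[0], BLOCKQUOTE_DSR[1])
pre_text = span(SRC, PRE_DSR[0], PRE_DSR[1])

# The <pre> range ends at offset 29 with a close-tag width of 0, so the
# literal "</pre>" in the source is counted as content, not as an end tag.
print(repr(blockquote_text))
print(repr(pre_text))
```

Under these assumptions, the <pre> range covers the source text "</pre>" while reporting no close tag, which is consistent with the autoInsertedEnd flag and the escaped &lt;/pre&gt; in the output.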


Version: unspecified
Severity: normal

bzimport added a project: Parsoid-Tokenizer. Via Conduit, Nov 22 2014, 2:38 AM
bzimport set Reference to bz54946.
ssastry created this task. Via Legacy, Oct 3 2013, 11:06 PM
ssastry added a comment. Via Conduit, Oct 3 2013, 11:10 PM

Changing the blockquote to "<blockquote><pre> b </pre></blockquote>" (content on the same line) does not trigger the bug. So the combination of p-tags before the blockquote and an HTML <pre> in the blockquote with content on a new line puts the p-wrapper in a state where p-tags are not added.

ssastry added a comment. Via Conduit, Oct 4 2013, 4:34 PM

This is actually a tokenizer bug. The closing </pre> is not being recognized as an end tag when an HTML <pre> follows another literal HTML tag on the same line.

Relevant snippet of output for: "a\n\n<span><pre>\nb\n</pre></span>"
...
<span data-parsoid='{"stx":"html","dsr":[3,28,5,6]}'><pre data-parsoid='{"stx":"html","autoInsertedEnd":true,"strippedNL":"\n","dsr":[8,22,5,0]}'>

b
&lt;/pre&gt;</pre></span>
...
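
The symptom in this snippet (a <pre> marked autoInsertedEnd whose content contains an escaped </pre>) can be detected mechanically in parser output. The following is a rough Python sketch, not part of Parsoid; the function name `has_swallowed_pre_end` and the regex-based matching are assumptions made for illustration, and the "fixed" sample is hypothetical rather than real parser output.

```python
import html
import re

# Matches a <pre ...> element and captures its attributes and its content.
# Assumes attribute values contain no ">" character, as in the samples here.
PRE_RE = re.compile(r"<pre\b([^>]*)>(.*?)</pre>", re.DOTALL)


def has_swallowed_pre_end(parsoid_html):
    """Return True if a <pre> with autoInsertedEnd contains a literal </pre>."""
    for attrs, content in PRE_RE.findall(parsoid_html):
        if "autoInsertedEnd" in attrs and "</pre>" in html.unescape(content):
            return True
    return False


# Output snippet copied from the comment above.
BUGGY = r"""<span data-parsoid='{"stx":"html","dsr":[3,28,5,6]}'><pre data-parsoid='{"stx":"html","autoInsertedEnd":true,"strippedNL":"\n","dsr":[8,22,5,0]}'>

b
&lt;/pre&gt;</pre></span>"""

# Hypothetical output in which the end tag was tokenized correctly.
FIXED = "<span><pre>\nb\n</pre></span>"

print(has_swallowed_pre_end(BUGGY))
print(has_swallowed_pre_end(FIXED))
```

A check like this only flags the symptom in serialized output; it says nothing about which tokenizer rule failed to match the end tag.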

gerritbot added a comment. Via Conduit, Oct 4 2013, 10:00 PM

Change 87632 had a related patch set uploaded by Subramanya Sastry:
(Bug 54946) Fix unhandle <pre> tokenizing scenarios

https://gerrit.wikimedia.org/r/87632

gerritbot added a comment. Via Conduit, Oct 29 2013, 12:42 AM

Change 92469 had a related patch set uploaded by GWicke:
WIP Bug 54946: Alternative solution for <pre> tokenization

https://gerrit.wikimedia.org/r/92469

gerritbot added a comment. Via Conduit, Nov 1 2013, 12:34 AM

Change 92469 merged by jenkins-bot:
Bug 54946: Alternative solution for <pre> tokenization

https://gerrit.wikimedia.org/r/92469

gerritbot added a comment. Via Conduit, May 23 2014, 10:23 PM

Change 87632 abandoned by Subramanya Sastry:
(Bug 54946) Fixed unhandled <pre> tokenizing scenarios

Reason:
Old and rusty and I am not going to look at this as I originally thought.

https://gerrit.wikimedia.org/r/87632
