Block element written inline splits multiline paragraphs
OpenPublic

Assigned To
None
Priority
Low
Author
bzimport
Subscribers
He7d3r, MarkAHershberger, cscott and 3 others
Projects
Reference
bz5718
Security
None
Description

Author: omniplex

Description:
<pre>-formatted text at the end of a multiline (source)
paragraph splits the leading text of the last line
before the <pre> into a separate paragraph.

In other words the last line of the source paragraph...
111 foo 111
222 bar 222
333 xyz <pre>abc</pre>
...is rendered as 222</p><p>333 xyz</p><pre>abc</pre>...
instead of 222 333 xyz </p><pre>abc</pre>...

For a demo see w:User:Omniplex/5569b (page deleted) - this was a part of the <pre>-observations in T7569, but it's most probably unrelated to this bug.


Version: 1.16.x
Severity: normal
URL: http://test.wikipedia.org/wiki/Bug_5718
See Also:
T8200: Linebreaks are mishandled in <blockquote> and <li>

bzimport set Reference to bz5718.
bzimport created this task.Via LegacyApr 26 2006, 1:36 AM
bzimport added a comment.Via ConduitApr 26 2006, 12:49 PM

gangleri wrote:

(In reply to comment #0)

... For a demo see [[w:User:Omniplex/5569b]] ...

should read [[User:Omniplex/5569b]]

see also [[rmy:MediaWiki:Monobook.css]]

hashar added a comment.Via ConduitApr 28 2006, 6:28 PM

When parsing '333 xyz <pre>abc</pre>' the parser is in a paragraph
and close it because the line contain a <pre>.

The bug in Parser.php:

if ($this->mLastSection != 'pre') {

$paragraphStack = false;
$output .= $this->closeParagraph().'<pre>';
$this->mLastSection = 'pre';

}

So we have to look at mLastSection extract text in it wich is before '<pre>',
output that text, closeParagraph.

I will try to patch it :)

hashar added a comment.Via ConduitApr 28 2006, 6:47 PM

above code is not really where the bug occurs :p

hashar added a comment.Via ConduitApr 28 2006, 6:52 PM

The bug is appearing whenever we have a tag in a line.
$openmatch / $closematch show the full list of tag.

Another example:

foo bar toto
azoueaz azoeaz
super foo<h2>tooto title</h2>

Will insert a break line (closeParagraph call before the <h2>):

foo bar toto azoueaz azoeaz
super foo
TOOTO TITLE

bzimport added a comment.Via ConduitMay 28 2007, 12:45 PM

pyrios wrote:

I am interested in looking at this.

bzimport added a comment.Via ConduitJun 5 2007, 3:58 AM

pyrios wrote:

Split line which contains end of paragraph and start of a block element

This seems to work for the simple test case where there is the ending part of a paragraph followed by a single starting block tag. I also need to verify that no regressions have been introduced.

attachment bug5718.patch ignored as obsolete

Mormegil added a comment.Via ConduitMar 20 2008, 11:29 PM

Changing summary so that it better suits the more general problem, updating version (this is still a problem in current SVN), changing URL to an existing testcase.

Mormegil added a comment.Via ConduitMar 20 2008, 11:29 PM
  • Bug 8037 has been marked as a duplicate of this bug. ***
Mormegil added a comment.Via ConduitMar 20 2008, 11:58 PM

Created attachment 4739
Small improvement of the previous patch

The previous version of the patch searched for the first < on the line, which might not be the block level element we are breaking at.

Why not just capture the offset from the previous preg_match that found the element?

attachment Parser.php.diff ignored as obsolete

Mormegil added a comment.Via ConduitJul 17 2008, 8:36 AM
  • Bug 14489 has been marked as a duplicate of this bug. ***
bzimport added a comment.Via ConduitDec 7 2009, 2:35 PM

chinchi29 wrote:

Not fixed yet. Nothing since 2008-07-17.

111 foo 111
222 bar 222
333 xyz <pre>abc</pre>

Still renders as:

<p><br />
111 foo 111 222 bar 222</p>
<p>333 xyz</p>
<pre>
abc
</pre>

instead of:

<p>111 foo 111 222 bar 222 333 xyz</p>
<pre>
abc
</pre>

bzimport added a comment.Via ConduitDec 9 2009, 12:54 AM

pyrios wrote:

I will take another look.

bzimport added a comment.Via ConduitDec 29 2009, 2:17 AM

pyrios wrote:

Refresh of previous patch

attachment fix5718.patch ignored as obsolete

bzimport added a comment.Via ConduitDec 29 2009, 2:21 AM

pyrios wrote:

The last patch was stale, but the basic idea was the right way to fix this bug.
I cleaned it up a little and added it in manually.

The fix as it is, breaks 3 previously passing test cases. One of them is
whitespace only. The other two are structural, but the html still looks ok.

I will update the first test case if there are no objections.

The latter two involve paragraphs inside a <div>, where there is a linebreak
after the <div>. I don't totally understand the test cases. Based on the
other div tests, there will be no <p> created if the text immediately follows
the <div> tag, but if there is a linebreak there will be a <p> created, *but*
only for the first paragraph. It seems like all should be broken into
paragraphs or none should be broken into paragraphs.

Added current patch and output from relevant test cases if someone could take a
look.

Thanks!

bzimport added a comment.Via ConduitDec 29 2009, 2:23 AM

pyrios wrote:

Adds testcases mentioned in this bug to parserTests.txt

Attached: testcase5718.txt

bzimport added a comment.Via ConduitDec 29 2009, 2:25 AM

pyrios wrote:

Output from test failure.

Attached: parser_test_out.txt

bzimport added a comment.Via ConduitDec 30 2009, 4:19 AM

pyrios wrote:

fix bug and keep old behavior for paragraphs inside div

Added logic to handle div as a special case to keep the old behavior. I don't think that is correct, but the new parse was not correct either. Div processing has some subtle issues that need to be changed/fixed in outside of this bug.

Attached: fix5718.patch.2

bzimport added a comment.Via ConduitDec 30 2009, 4:29 AM

pyrios wrote:

Changing to fixed. If someone with commit access can apply this patch, that would be great.

bzimport added a comment.Via ConduitDec 30 2009, 5:03 AM

chinchi29 wrote:

Isn't fixed until reviewed and committed!

bzimport added a comment.Via ConduitDec 30 2009, 4:09 PM

pyrios wrote:

That makes sense. In the future, how do I signal that a patch is ready for review?

Platonides added a comment.Via ConduitDec 30 2009, 4:13 PM

The need-review keyword does it. Bugging people also helps ;)

MarkAHershberger added a comment.Via ConduitApr 12 2011, 4:06 PM

Punting this to the new parser Brion has under development.

bzimport added a comment.Via ConduitNov 10 2011, 3:30 PM

wicke wrote:

George, thank you for your patch! It seems to mostly do what it says, but I am quite worried about block-level elements other than div. Those would also need to be treated specially to avoid introducing regressions.

In general, we are currently trying to solve the problems inherent in the current parser architecture more systematically by writing a completely new parser (https://www.mediawiki.org/wiki/Future ). The problem you solved here should be fixed in this new parser, as we are no longer using regexp-based heuristics for block-level formatting.

bzimport added a comment.Via ConduitJan 23 2012, 7:45 PM

sumanah wrote:

(In reply to comment #23)
Marking this reviewed per Gabriel's comments.

bzimport added a comment.Via ConduitOct 20 2014, 1:21 AM

smccandlish wrote:

I've updated http://test.wikipedia.org/wiki/Bug_5718 with additional test cases. They show extra weirdness.

The three cases:

<div>foo</div>

and

<div>foo
</div>

and

<div>
foo</div>

are all sent to the browser as:

<div>foo</div>

Only:

<div>
bar
</div>

is rendered as:

<div>
<p>bar</p>
</div>

It gets even weirder when the content is multiple lines.

Qgil edited the task description. (Show Details)Via WebDec 13 2014, 9:54 PM
Qgil set Security to None.
Qgil added a subscriber: Qgil.EditedVia WebJan 10 2015, 10:00 PM

@cscott, this is one of the oldest tasks assigned. Are you planning to work on it?

Qgil placed this task up for grabs.Via WebFeb 14 2015, 3:27 PM

Add Comment

Column Prototype
This is a very early prototype of a persistent column. It is not expected to work yet, and leaving it open will activate other new features which will break things. Press "\" (backslash) on your keyboard to close it now.