Block element written inline splits multiline paragraphs
Open, LowPublic

Description

Author: omniplex

Description:
<pre>-formatted text at the end of a multiline (source)
paragraph splits the leading text of the last line
before the <pre> into a separate paragraph.

In other words the last line of the source paragraph...
111 foo 111
222 bar 222
333 xyz <pre>abc</pre>
...is rendered as 222</p><p>333 xyz</p><pre>abc</pre>...
instead of 222 333 xyz </p><pre>abc</pre>...

For a demo see w:User:Omniplex/5569b (page deleted) - this was a part of the <pre>-observations in T7569, but it's most probably unrelated to this bug.


Version: 1.16.x
Severity: normal
URL: http://test.wikipedia.org/wiki/Bug_5718
See Also:
T8200: Linebreaks are mishandled in <blockquote> and <li>

Details

Security
None
Reference
bz5718
bzimport set Reference to bz5718.
bzimport created this task.Apr 26 2006, 1:36 AM

gangleri wrote:

(In reply to comment #0)

... For a demo see [[w:User:Omniplex/5569b]] ...

should read [[User:Omniplex/5569b]]

see also [[rmy:MediaWiki:Monobook.css]]

When parsing '333 xyz <pre>abc</pre>' the parser is in a paragraph
and close it because the line contain a <pre>.

The bug in Parser.php:

if ($this->mLastSection != 'pre') {

$paragraphStack = false;
$output .= $this->closeParagraph().'<pre>';
$this->mLastSection = 'pre';

}

So we have to look at mLastSection extract text in it wich is before '<pre>',
output that text, closeParagraph.

I will try to patch it :)

above code is not really where the bug occurs :p

The bug is appearing whenever we have a tag in a line.
$openmatch / $closematch show the full list of tag.

Another example:

foo bar toto
azoueaz azoeaz
super foo<h2>tooto title</h2>

Will insert a break line (closeParagraph call before the <h2>):

foo bar toto azoueaz azoeaz
super foo
TOOTO TITLE

pyrios wrote:

I am interested in looking at this.

pyrios wrote:

Split line which contains end of paragraph and start of a block element

This seems to work for the simple test case where there is the ending part of a paragraph followed by a single starting block tag. I also need to verify that no regressions have been introduced.

attachment bug5718.patch ignored as obsolete

Changing summary so that it better suits the more general problem, updating version (this is still a problem in current SVN), changing URL to an existing testcase.

  • Bug 8037 has been marked as a duplicate of this bug. ***

Created attachment 4739
Small improvement of the previous patch

The previous version of the patch searched for the first < on the line, which might not be the block level element we are breaking at.

Why not just capture the offset from the previous preg_match that found the element?

attachment Parser.php.diff ignored as obsolete

  • Bug 14489 has been marked as a duplicate of this bug. ***

chinchi29 wrote:

Not fixed yet. Nothing since 2008-07-17.

111 foo 111
222 bar 222
333 xyz <pre>abc</pre>

Still renders as:

<p><br />
111 foo 111 222 bar 222</p>
<p>333 xyz</p>
<pre>
abc
</pre>

instead of:

<p>111 foo 111 222 bar 222 333 xyz</p>
<pre>
abc
</pre>

pyrios wrote:

I will take another look.

pyrios wrote:

Refresh of previous patch

attachment fix5718.patch ignored as obsolete

pyrios wrote:

The last patch was stale, but the basic idea was the right way to fix this bug.
I cleaned it up a little and added it in manually.

The fix as it is, breaks 3 previously passing test cases. One of them is
whitespace only. The other two are structural, but the html still looks ok.

I will update the first test case if there are no objections.

The latter two involve paragraphs inside a <div>, where there is a linebreak
after the <div>. I don't totally understand the test cases. Based on the
other div tests, there will be no <p> created if the text immediately follows
the <div> tag, but if there is a linebreak there will be a <p> created, *but*
only for the first paragraph. It seems like all should be broken into
paragraphs or none should be broken into paragraphs.

Added current patch and output from relevant test cases if someone could take a
look.

Thanks!

pyrios wrote:

Adds testcases mentioned in this bug to parserTests.txt

Attached: testcase5718.txt

pyrios wrote:

Output from test failure.

Attached: parser_test_out.txt

pyrios wrote:

fix bug and keep old behavior for paragraphs inside div

Added logic to handle div as a special case to keep the old behavior. I don't think that is correct, but the new parse was not correct either. Div processing has some subtle issues that need to be changed/fixed in outside of this bug.

Attached: fix5718.patch.2

pyrios wrote:

Changing to fixed. If someone with commit access can apply this patch, that would be great.

chinchi29 wrote:

Isn't fixed until reviewed and committed!

pyrios wrote:

That makes sense. In the future, how do I signal that a patch is ready for review?

The need-review keyword does it. Bugging people also helps ;)

Punting this to the new parser Brion has under development.

wicke wrote:

George, thank you for your patch! It seems to mostly do what it says, but I am quite worried about block-level elements other than div. Those would also need to be treated specially to avoid introducing regressions.

In general, we are currently trying to solve the problems inherent in the current parser architecture more systematically by writing a completely new parser (https://www.mediawiki.org/wiki/Future ). The problem you solved here should be fixed in this new parser, as we are no longer using regexp-based heuristics for block-level formatting.

sumanah wrote:

(In reply to comment #23)
Marking this reviewed per Gabriel's comments.

smccandlish wrote:

I've updated http://test.wikipedia.org/wiki/Bug_5718 with additional test cases. They show extra weirdness.

The three cases:

<div>foo</div>

and

<div>foo
</div>

and

<div>
foo</div>

are all sent to the browser as:

<div>foo</div>

Only:

<div>
bar
</div>

is rendered as:

<div>
<p>bar</p>
</div>

It gets even weirder when the content is multiple lines.

Qgil edited the task description. (Show Details)Dec 13 2014, 9:54 PM
Qgil set Security to None.
Qgil added a subscriber: Qgil.EditedJan 10 2015, 10:00 PM

@cscott, this is one of the oldest tasks assigned. Are you planning to work on it?

Qgil removed cscott as the assignee of this task.Feb 14 2015, 3:27 PM

Add Comment