
Multiline tags in lists should be output more intelligently
Open, Low, Public

Description

Parsertest

Related to bug 5497:

*Some enumeration<div style="clear: both; color: red">
This text should be red
</div>

produces
<ul><li>Some enumeration<div style="clear: both; color: red">
</li></ul>
<p>This text should be red
</p>
</div>

"Fixed" by Tidy the wrong way:
<ul><li>Some enumeration<div style="clear: both; color: red"></div>
</li></ul>
<p>This text should be red
</p>

This affects several templates.


Version: unspecified
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=1581


Details

Reference
bz9996

Event Timeline

bzimport raised the priority of this task from to Low. Nov 21 2014, 9:40 PM
bzimport set Reference to bz9996.
bzimport added a subscriber: Unknown Object (MLST).

michaeldaly wrote:

Your first example is not legal HTML. The fix by Tidy is correct.

You cannot overlap tag starts and ends:

<tagA>
<tagB>
</tagB>
</tagA>

is legal.

<tagA>
<tagB>
</tagA>
</tagB>

is not legal.
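To make the nesting rule concrete, here is a small stack-based checker. It is a sketch, not a real HTML validator: it ignores comments, attributes containing `>`, and void elements such as `<br>` unless they are written self-closing, and `is_well_nested` is a hypothetical name. Each start tag is pushed onto a stack; each end tag must match the most recent start tag that is still open.

```python
import re

# Matches a start tag, end tag, or self-closing tag; group 1 is "/" for end
# tags, group 2 is the tag name, group 3 is "/" for self-closing tags.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>")


def is_well_nested(markup: str) -> bool:
    """Return True if every end tag closes the most recently opened tag."""
    stack = []
    for m in TAG.finditer(markup):
        closing, name, self_closing = m.group(1), m.group(2).lower(), m.group(3)
        if self_closing:
            continue  # e.g. <br/> opens and closes in one tag
        if not closing:
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False  # end tag overlaps an element that is still open
    return not stack  # leftover start tags were never closed
```

Run on the two outputs in the description, it rejects MediaWiki's (the `</li>` arrives while the `<div>` is still open) and accepts Tidy's. That is the distinction being drawn here: Tidy's result is well-formed, even though it is not what the wikitext intended.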

ayg wrote:

The illegal behavior was not being requested. See the parser test. It's questionable what behavior is acceptable for this period, given that attributes may include borders or floats or who knows what. The best solution is to not provide illegal markup in the first place.

Note that parser tests are run with Tidy off, and so parser tests for this are probably pointless. Furthermore, this is an upstream issue, so it should probably stay closed anyway.

MediaWiki's output is not legal HTML. Tidy's fix is wrong: the result is legal XHTML, but not the expected output.
Cases that Tidy is able to fix are not so bad, but illegal markup that Tidy mangles is, IMHO, more important to fix in the parser (since Tidy is here to stay).

Of course, what should really be done is not to provide illegal markup, which is the reason for the parser test.
My view is that when a block-level element such as <div> is found on the same line as a list item, the list should be closed before outputting the <div>, i.e. what preg_replace("/(\*|#)(.*)(<div)/i", "\\1\\2\n\\3", $WikiText) would do at the beginning of parsing.
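A rough Python equivalent of that proposed preprocessing step, for illustration only. The function name `split_div_from_list_item` is hypothetical, and the pattern deliberately differs from the PHP one in two assumed ways: it only matches list markers at the start of a line (the PHP pattern would also fire on a `*` or `#` in the middle of a line), and it splits before the first `<div` on the line rather than the last.

```python
import re

# Assumption: list markers are only recognised at the start of a line
# (re.MULTILINE + "^"), and the split happens before the first "<div".
LIST_THEN_DIV = re.compile(r"^([*#]+)(.*?)(<div)", re.IGNORECASE | re.MULTILINE)


def split_div_from_list_item(wikitext: str) -> str:
    """Move a <div> that shares a line with a list item onto its own line."""
    # "\n" in the template inserts a line break between the item text and <div.
    return LIST_THEN_DIV.sub(r"\1\2\n\3", wikitext)
```

Applied to the test case, the first line becomes `*Some enumeration` and the `<div>` starts on its own line, so the list item ends before the block element and the `<div>` sits outside the list.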

Is there any reason to have lists of one-line <div>s?

brion added a comment. May 22 2007, 2:04 PM

The general issue is that wiki lists are line-based markup, so a <div> that spans multiple lines is not considered legal, and the results are undefined.

Combine that with the ugly multi-pass parser, and it doesn't always come out pretty. :)

ayg wrote:

Probably an ideal solution would be to have multiline tags not terminate the list item until the tag is terminated, as expected. I.e., it should produce <li><div>...</div></li>. This is probably not something anyone wants to implement with the current parser, however.
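A minimal sketch of that behavior for single-level `*` lists, assuming only `<div>` counts as a multiline tag. `render_bullets` and `div_balance` are hypothetical names, and nested lists, `#` lists and paragraph wrapping are ignored. The idea is to track how many `<div>`s the current item has opened and keep appending lines to that item until the count returns to zero.

```python
import re

OPEN_DIV = re.compile(r"<div\b[^>]*>", re.IGNORECASE)
CLOSE_DIV = re.compile(r"</div\s*>", re.IGNORECASE)


def div_balance(text: str) -> int:
    """Hypothetical helper: number of <div> tags opened minus closed."""
    return len(OPEN_DIV.findall(text)) - len(CLOSE_DIV.findall(text))


def render_bullets(wikitext: str) -> str:
    out = []
    item = None       # lines of the list item currently being built
    depth = 0         # unclosed <div>s inside the current item
    in_list = False
    for line in wikitext.split("\n"):
        if item is not None and depth > 0:
            # A multiline tag is still open: the line stays in the same item.
            item.append(line)
            depth += div_balance(line)
            continue
        if item is not None:
            # The previous item is complete; emit it.
            out.append("<li>" + "\n".join(item) + "</li>")
            item = None
        if line.startswith("*"):
            if not in_list:
                out.append("<ul>")
                in_list = True
            item = [line[1:]]
            depth = div_balance(line)
        else:
            if in_list:
                out.append("</ul>")
                in_list = False
            out.append(line)
    # Flush whatever is still open at the end of the input.
    if item is not None:
        out.append("<li>" + "\n".join(item) + "</li>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)
```

On the parser test input this produces `<li>Some enumeration<div ...>...</div></li>`, with the `<div>` fully inside the list item. One consequence of this design: an unclosed `<div>` keeps the item open until the end of the input, so a real implementation would need some cut-off.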

Phrased that way, this seems to be a duplicate of bug 1581, which is a special case (and the most important one). It probably makes the most sense to dupe it to that.

Punting this to the new parser Brion has under development.

  • Bug 28691 has been marked as a duplicate of this bug.

*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*

sumanah wrote:

+need-review

Reedy added a comment. Nov 20 2011, 5:34 PM

Removing both the patch and need-review keywords.

It's a diff adding a parser test that demonstrates the failure. We could commit it now, but then we'd have failing parser tests showing up.

So it should only be committed once this bug is actually being fixed.

  • Bug 33918 has been marked as a duplicate of this bug.

(In reply to Aryeh Gregor from comment #5)

Probably an ideal solution would be to have multiline tags not terminate the list item until the tag is terminated, as expected.

This would also help bug 58429, which hits bug 1115 as well.

Danny_B removed a subscriber: wikibugs-l-list.
Izno added a subscriber: Izno.

Remex outputs the same as Tidy did.