Malformed hlist rendering due to missing linebreaks between list items
Closed, ResolvedPublic

Description

As reported on
https://en.wikipedia.org/w/index.php?title=MediaWiki_talk:Common.css&oldid=508829963#Unexpected_behavior
for some reason, MediaWiki mess up the HTML code when the MediaWiki:Recentchangestext page is added to the top of the Special:RecentChanges, and this breaks some CSS formatting.

There should be a line break between "</li>" and "<li>".


Version: unspecified
Severity: normal
URL: https://test.wikipedia.org/w/index.php?diff=prev&oldid=140930&diffonly=1
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=35806
https://bugzilla.wikimedia.org/show_bug.cgi?id=56809

bzimport added projects: MediaWiki-Parser, Parser.Via ConduitNov 22 2014, 1:07 AM
bzimport added a subscriber: Unknown Object (MLST).
bzimport set Reference to bz39617.
He7d3r created this task.Via LegacyAug 24 2012, 1:41 PM
duplicatebug added a comment.Via ConduitAug 26 2012, 11:32 AM

It is normal, that the use of the message shows other than the MediaWiki-page, because the message does not goes trougth HTML Tidy like all wiki pages.

He7d3r added a comment.Via ConduitAug 26 2012, 12:36 PM

Well, my local copy of MW has $wgUseTidy = true; and the problem is visible also in the "Hlist test" page in the main namespace. Therefore Tidy doesn't seems to be the culprit.

Edokter added a comment.Via ConduitSep 2 2012, 1:10 PM

The 'problem' seems to be that Tidy *doesn't* insert linebreaks, because Special: pages don't go through Tidy. The .hlist CSS has been fixed to work around this problem, making it more universally applicable to boot. So closing this as this is not a MediaWiki problem.

Edokter added a comment.Via ConduitSep 3 2012, 10:03 PM

...Unfortunately...

The CSS fix caused .hnum formatting to break. This has been fixed, with the ceveat that .hnum cannot be used in Special: pages.

Since Special: pages often contain payload from MediaWiki: space, including HTML, does that not warrant letting Tidy run it's course on Special: pages as well?

Edokter added a comment.Via ConduitSep 4 2012, 8:20 PM

Or better yet... While </li><li> may not be strictly invalid, it looks rather sloppy. Since lists are block level elements, I feel it is better to have a \n after any </ol>, </ul>, </li>, </dl>, </dt> and </dd> in parser.php. That should be a simple change which would prevent a heap of problems.

Edokter added a comment.Via ConduitSep 30 2012, 9:33 PM

Renamed the bug to indicate the cause and summarizing issue:

Horizontal lists are malformed on Special: pages because Tidy does not process these pages. Specifically, the lists lack a breakable character betweeen list items, causing the browsers to fail to wrap the list properly. While technically valid HTML, hlist depends on something breakable between list items.

I see two solutions:

  1. Have Tidy process any Special: pages that transclude any (part of) user editable pages (such as MediaWiki: messages)
  1. I have a patch ready for gerrit that makes parser.php terminate any closing list element with a linebreak.

Personally, I prefer the second option; it removes hlist's dependancy on Tidy (doing so may even lessen the load on Tidy in general). But I also feel that Tidy should process any (transcluded) user content, even on non-user generated pages.

bzimport added a comment.Via ConduitDec 12 2012, 12:55 AM

dymsza wrote:

Hey, Could you post this patch i would like to apply it on my wiki.

Edokter added a comment.Via ConduitJan 14 2013, 8:18 PM

Still learning Git. But the quick fix (should you want to do it manualy) is to find the following string in parser.php:

</li><li>

and change it to:

</li>\n<li>

My patch (once it's done) will also terminate all other closing tags with a linebreak, but the above should fix any immediate hlist problems.

bzimport added a comment.Via ConduitJul 26 2013, 3:34 PM

kevinph wrote:

Thanks for this great solution Dokter! Just wanted to add a clarification for n00bs like myself that I had to change it from '</li><li>' to '</li>'."\n".'<li>' for it to parse correctly. The original suggestion just gave me \n in my actual text.

Edokter added a comment.Via ConduitJul 28 2013, 1:13 PM

OK, that means my patch can go in the bin. It's not clear to me why '</li>\n<li>' would not work; it seems to be the standard way in PHP strings to include linebreaks. I wish someone would go in and terminate all closing HTML tags with a linebreak. This is such a trivial solution, but it goes beyond my PHP knowledge. ~~~~

Anomie added a comment.Via ConduitSep 4 2013, 7:23 PM

(In reply to comment #10)

It's not clear to me why '</li>\n<li>' would not work

In PHP, escapes such as \n are only interpreted inside double-quoted strings. When you use single quotes, it outputs \n literally. So using double quotes, as in "</li>\n</li>", should work.

Edokter added a comment.Via ConduitOct 19 2013, 10:34 AM

Working on a patch, but my local repository seems to be broken; I get over a year old files when rebasing. I do hate GIT with a passion.

gerritbot added a comment.Via ConduitOct 19 2013, 2:03 PM

Change 90696 had a related patch set uploaded by Gerrit Patch Uploader:
Have list items occupy their own line

https://gerrit.wikimedia.org/r/90696

Edokter added a comment.Via ConduitOct 19 2013, 5:59 PM

Patch now ready for review. Changed parserTest.txt as well to reflect changed output. Thanks to MatmaRex for uploading patches.

gerritbot added a comment.Via ConduitOct 21 2013, 8:24 PM

Change 90696 merged by jenkins-bot:
Have list items occupy their own line

https://gerrit.wikimedia.org/r/90696

Anomie added a comment.Via ConduitOct 21 2013, 8:27 PM

Change merged. It should be deployed to WMF wikis with 1.23wmf1 (unless we decide to have 1.22wmf23 instead), see https://www.mediawiki.org/wiki/MediaWiki_1.23/Roadmap for the schedule.

GWicke added a comment.Via ConduitOct 21 2013, 10:30 PM

Setting the list items to white-space: inline-block (not inline) works without the newline in the DOM.

Support:

Omitting the newline from the DOM lets users choose whether to line break or not with CSS, so seems to be preferable. Inserting a newline sounds problematic for this reason, and might just be a tidy bug.

GWicke added a comment.Via ConduitOct 21 2013, 10:45 PM

Examples:

// force single line, but space cannot be avoided as it is in the DOM
http://jsfiddle.net/dT3Qg/

// Without a space in the DOM, both single-line and break-on-li behavior are possible. Break-on-li with inline-block:
http://jsfiddle.net/dT3Qg/2/

GWicke added a comment.Via ConduitOct 21 2013, 10:54 PM

Reopening this bug to discuss which solution makes sense in the longer term, both in tidy and Parsoid.

ssastry added a comment.Via ConduitOct 21 2013, 11:45 PM

It would be great to find a CSS fix for this without relying on presence/absence of newlines if possible. This is especially so for Parsoid that is a bidirectional converter between wikitext and HTML. To minimize dirty diffs, Parsoid does a good deal of bookkeeping to accurately map wikitext segments to HTML segments that they generate. This newline behavior will add to the complexity of our book-keeping and we are really keen to avoid it, if possible.

Edokter added a comment.Via ConduitOct 22 2013, 4:53 PM

Gabriel, this patch only sanitizes native parser HTML output so that Tidy doesn't have to, in order to have certain CSS (such as .hlist) work as expected in those cases where Tidy is *not* available. I don't know where your issues originate from, but if they are from regular wiki pages, then this should not affect them.

As for Parsoid, there is no CSS fix. Believe me, I've tried them all. This is the only solution left. List items are is essence block level elements and should be treated and generated as such. I'm not sure what Parsoid does exactly (VE?), but if it processes HTML output that went through Tidy, then this should also not affect Parsoid.

If NEW issues arise, please open a new bug.

cscott added a comment.Via ConduitNov 8 2013, 9:54 PM

Erwin: did you try setting the CSS 'whitespace' property, as described in comment 17?

To be clear, this (merged) patch currently is causing Parsoid to create dirty diffs. We would very much like to revert it, and replace it with a CSS-only fix.

GWicke added a comment.Via ConduitNov 8 2013, 10:01 PM

As discussed earlier, we should really use whitespace for this. Reopening.

gerritbot added a comment.Via ConduitNov 8 2013, 10:20 PM

Change 94443 had a related patch set uploaded by Cscott:
Revert "Have list items occupy their own line"

https://gerrit.wikimedia.org/r/94443

GWicke added a comment.Via ConduitNov 8 2013, 11:00 PM

Relevant tidy bug for avoiding newline insertion: https://sourceforge.net/p/tidy/bugs/945/

Edokter added a comment.Via ConduitNov 9 2013, 10:21 AM

"white-space: inline-block" is nonsense; it only causes white-space to revert to 'normal', which is NOT desired. Also, the posted Tidy bug hardly seems relevant.

matmarex added a comment.Via ConduitNov 9 2013, 12:29 PM

The Tidy bug is what caused hlists to work in the first place, according to what you're saying :)

matmarex added a comment.Via ConduitNov 9 2013, 12:30 PM

I am going to mark this resolved to avoid fragmenting discussion any further.
The talk about possibly reverting this change and CSS solutions should go on bug 56809, pretty please.

cscott added a comment.Via ConduitNov 9 2013, 2:28 PM

Matma: I discussed this with gwicke and others. We'd like to use this bug to continue to deal with the hlist issues, since that's where most of the discussion has already occurred (and in our opinion that bug has not been fixed in an acceptable manner yet). The other bug is just to track the current parsertest regression, if patches are needed to Parsoid, ParserFunctions, Scribunto, etc.

I'm copying comment 10 from 56809 here -- Erwin says that:

https://test.wikipedia.org/wiki/MediaWiki:Common.css contain current CSS for
hlist.
https://test.wikipedia.org/wiki/User:Edokter/sandbox contains the test cases.

cscott added a comment.Via ConduitNov 9 2013, 2:36 PM

And just recapping the objections to the current patch:

  1. it changes longstanding parser behavior, possibility of unintended consequences is high
  1. it breaks a number of extensions using the parsertests framework, including Parsoid and potentially others like ParserFunction, Scribunto, etc, by changing parser output in an unexpected way.
  1. as gwicke points out in comment 18, removing the linebreak means that stylesheets which *want* single-line behavior can't obtain it.
  1. It potentially causes dirty diff issues with Parsoid (see bug 56809 for details)

Some possible alternative fixes:

  1. I hope gwicke can demonstrate an improved CSS fix
  1. perhaps Tidy can/should be run for Special pages as well as normal pages.
  1. We suck it up and fix Parsoid, ParserFunction, Scribunto, etc to deal with the new output, evangelize editors etc to let them know the parser has changed, etc.

To be clear, we haven't reverted the existing patch yet. My position is just that it was committed prematurely, in view of the side effects, and that we need to take a harder look at the other alternatives. I apologize for not catching this issue earlier, before it was committed -- I usually watch Parser-related patches, but I happened to be on vacation.

He7d3r added a comment.Via ConduitNov 9 2013, 2:39 PM

(In reply to comment #29)
And the other (important) part from bug 56809 comment 10 is:

Any suggested changes will break one or more testcases.

cscott added a comment.Via ConduitNov 9 2013, 2:49 PM

That's why we write testcases, Helder. Fixes aren't committed until/unless they pass them. I thought that was obvious.

cscott added a comment.Via ConduitNov 9 2013, 2:52 PM

In https://bugzilla.wikimedia.org/show_bug.cgi?id=56809#c14 Erwin writes:

That is a scoping problem; Tidy should simply not touch anything that comes out

of GeSHi (pre & code). This is simply two external extentions clashing.

This seems (to me) to indicate that this might actually be a Tidy integration bug: we need better control of how Tidy is run so that we can (a) run it on Special pages and other places where hlist isn't working, and (b) *not* run it on GeSHi and other extension-generated output. Moving tidy's invocation so that it happens before extensions are expanded might help solve (b). Thoughts? (esp from gwicke, since AIUI he hooked up Tidy originally.)

matmarex added a comment.Via ConduitNov 9 2013, 2:57 PM

(I think thaat comment was about bug 260 and bug 38800.)

Edokter added a comment.Via ConduitNov 9 2013, 3:02 PM

(It was. Tidy/GeSHi conflict is not about hlist. I got a collision when posting that.)

Edokter added a comment.Via ConduitNov 9 2013, 3:17 PM

If extensions rely on "long standing parser output", it means the parser can never be changed or improved. I once got admonished by a bot operator because I dared to change the output of a template, throwing the (scraping) bot into a fit. I can't help but feel this is kind of the same situation.

Having the parser locked in because of extensions depending on its output is a Very Bad Thing. What happens if the parser is rewritten? It happened before. Is the Parsoid Team going to block that as well? Can Parsoind not be made line-break-agnostic in some way?

Having Tidy run on spacial pages *would* solve all hlist problems on the Wikimedia cluster. But it does mean that hlist integration in core is out, because Tidy would be a pre-requisite for other installations.

GWicke added a comment.Via ConduitNov 9 2013, 7:23 PM

(In reply to comment #26)

"white-space: inline-block" is nonsense; it only causes white-space to revert
to 'normal', which is NOT desired. Also, the posted Tidy bug hardly seems
relevant.

Please actually look at the examples I posted.

Example code in http://jsfiddle.net/dT3Qg/2/ linked in #18:
<ul><li style="white-space: nowrap; display:

inline-block">foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo

foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo
foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo </li><li
style="white-space: nowrap; display: inline-block">bar</li></ul>

GWicke added a comment.Via ConduitNov 9 2013, 7:24 PM

(And apologies for the typo in #17 that probably threw you off.)

Edokter added a comment.Via ConduitNov 10 2013, 9:56 AM

I have seen it. I have tested it in my sandbox on test.wikipeida.org. display+inline-block causes malformed lists; nested lists become unwilling to wrap, regargless of white-space setting.

I have tested everything, and this remains the only working solution.

GWicke added a comment.Via ConduitNov 12 2013, 6:16 PM

I have created a simple nested list test case at https://test.wikipedia.org/wiki/User:GWicke/sandbox

The wrapping seems to work as expected, only the margins / padding still needs some tweaking. And the inline rules need to be converted into selectors.

Could you point out which issues you see, and maybe adjust / add a test case to capture those?

Edokter added a comment.Via ConduitNov 12 2013, 7:19 PM

I'm sorry, but it does not wrap as expected. It wraps between foo and bar (beginning of sublist) where it should not. Inline-block somehow overrides white-space:normal, making it act as nowrap.

GWicke added a comment.Via ConduitNov 14 2013, 2:13 AM

So you would like it *not* to wrap between a list and its first sub-list item, but then wrap between additional sub-list items?

GWicke added a comment.Via ConduitNov 14 2013, 4:45 AM

I think I figured out what you meant by creating a sublist with a lot more list items. The break happened first between foo and bar rather than the nested list items, at least in some screen widths.

I have tweaked the css in https://test.wikipedia.org/wiki/User:GWicke/sandbox to make that work too. The change is to set display: inline on the nested list. Also removed the margin on it for uniform display.

Anything else that you see amiss?

Edokter added a comment.Via ConduitNov 14 2013, 8:23 PM

Having the sublist use display:inline produces the same problem that caused me te submit this patch in the first place; the entire sublist won't wrap if the list items are glued together (no linebreak or TIDY).

Anomie added a comment.Via ConduitNov 14 2013, 9:38 PM

gwicke's example linked in Comment 43 produces an odd display here:

Item Item Item Item Item Item Item Item Item Item
Item Item Item Item Item Item
Item

That last Item shouldn't be on a new line. And if you add another <li>item</li> before the one containing the sublist, it gets even worse:

Item
Item Item Item Item Item Item Item Item Item Item
Item Item Item Item Item Item
Item

Aklapper added a comment.Via ConduitDec 7 2013, 2:37 AM

[Bumping TM as MediaWiki 1.22.0 tarball was released today.]

matmarex added a comment.Via ConduitDec 7 2013, 2:50 AM

The fix for this was merged into 1.22 and is still there. We're now discussing possibly reverting it and coming up with a better one. Let's keep the milestone, in case this gets RESOLVED FIXED again with no reverts.

Edokter added a comment.Via ConduitFeb 20 2014, 11:42 AM

Regarding hlist, the wrapping problem has been resolved by means of disabling nowrap alltogether. But it still has the potential of malfunctioning if nowrap is applied locally (ie. in a template).

With that in mind, I recommend that *this* bug be closed and discussion regarding parser/parsoid to continue in bug 56809.

GWicke added a comment.Via ConduitFeb 20 2014, 6:07 PM

Sorry for bringing this up this late, but can you describe the use case for rendering nested lists as a flat structure? Can those lists be a flat DOM structure instead?

Edokter added a comment.Via ConduitFeb 20 2014, 6:21 PM

No, hlist must work just like regular HTML lists because they must be sematically the same. Anything breaking that behaviour negates the entire purpose of hlist.

GWicke added a comment.Via ConduitFeb 20 2014, 6:35 PM

I don't think that this answers my question though. Why can't those lists just be flat if they are intended to render as a flat structure?

Edokter added a comment.Via ConduitFeb 20 2014, 7:12 PM

Accessibility. They only render flat visually, but its structure must be maintained to ensure they are accessable as regular HTML lists for screenreaders.

GWicke added a comment.Via ConduitFeb 20 2014, 7:16 PM

That points out why they should be a list, which we don't disagree on. Can you describe why they have to be *nested* lists?

Edokter added a comment.Via ConduitFeb 20 2014, 7:23 PM

That is entirely up to the user constructing the list. hlist only allows you to have nested lists, because HTML/wikimarkup allows you to have nested lists. I couldn't 'unnest' the lists even if I wanted to.

I don't see how this has become an issue. It was the nowrap issue that caused hlists to misbehave. The nowrap has been remove. What is the problem?

GWicke added a comment.Via ConduitFeb 20 2014, 7:41 PM

I'm still trying to figure out why you have the requirement to render *nested* lists as a flat structure.

So far it seems that the only reason to support this is that lists *can* generally be nested, not that they have to be for some reason. So right now it looks like we are actually trying to solve a problem that doesn't necessarily need to be solved.

The current solution (newline insertion) causes issues with round-tripping in Parsoid and restricts the way we can render things as described in comment 18. We could perhaps find a way to work around the issues in Parsoid, but this involves considerable effort and comes at a cost of an irregularity in the way we convert wikitext to DOM (which makes it harder to understand for our users).

So please let us know if there are real use cases for supporting nested lists in hlist, and why those are worth the costs.

Dinoguy1000 added a comment.Via ConduitFeb 20 2014, 8:29 PM

The use cases are exactly the same as the use cases of nesting lists when just using HTML lists: anywhere you would potentially want to nest HTML lists is where you would also potentially want to nest hlists. Consider for example the following list:

-Cats
--Tabby
--Calico
-Dogs
--Chihuahua
--Great Dane
-Gerbils
-Kangaroos

This is a perfectly semantic structure, and flattening the lists into a single list destroys the semantics. If you convert it to an hlist (by adding an hlist div or the like around it), it displays instead as:

Cats (Tabby - Calico) - Dogs (Chihuahua - Great Dane) - Gerbils - Kangaroos

Again, this is perfectly semantic, and the nesting is still visually communicated, and again, flattening to just one list destroys the semantics. And I can tell you from personal experience, both on and off of Wikipedia/Wikimedia wikis, that it takes almost no imagination at all to find a myriad of applications for such nestings in navboxes. But as Erwin said, I don't see how lists being nested is at all relevant to this bug; the functionality is completely separate from the wrapping functionality, which, as he has stated multiple times, has already been removed from the hlist styles.

GWicke added a comment.Via ConduitFeb 20 2014, 9:20 PM

(In reply to Phillip Patriakeas from comment #56)

The use cases are exactly the same as the use cases of nesting lists when
just using HTML lists: anywhere you would potentially want to nest HTML
lists is where you would also potentially want to nest hlists. Consider for
example the following list:

-Cats
--Tabby
--Calico
-Dogs
--Chihuahua
--Great Dane
-Gerbils
-Kangaroos

This is a perfectly semantic structure, and flattening the lists into a
single list destroys the semantics. If you convert it to an hlist (by adding
an hlist div or the like around it), it displays instead as:

Cats (Tabby - Calico) - Dogs (Chihuahua - Great Dane) - Gerbils - Kangaroos

Again, this is perfectly semantic, and the nesting is still visually
communicated, and again, flattening to just one list destroys the semantics.

I see, thanks for the clear explanation!

And I can tell you from personal experience, both on and off of
Wikipedia/Wikimedia wikis, that it takes almost no imagination at all to
find a myriad of applications for such nestings in navboxes. But as Erwin
said, I don't see how lists being nested is at all relevant to this bug; the
functionality is completely separate from the wrapping functionality, which,
as he has stated multiple times, has already been removed from the hlist
styles.

Nowrap requirements made it hard to find a CSS-only solution. With that gone it might now actually be possible to do away with the newlines.

I copied over the .hlist styles from enwiki common.css to http://jsfiddle.net/dT3Qg/5/ and added a single line (display: inline) to the first rule to prevent a wrap before the nested list. For comparison, http://jsfiddle.net/dT3Qg/6/ is the unmodified CSS. As far as I can tell, both variants work with or without newlines in the DOM.

Erwin mentioned that nowrap could still be applied manually. Is this an important case we need to handle? Would that normally apply to a single list item?

Edokter added a comment.Via ConduitFeb 20 2014, 10:35 PM

Someone might still apply nowrap in separate list items, which *could* misbehave if the newlines were removed. That is why I do not like to see them removed. But that discussion should be continued in bug 56809.

Your jsfiddle examples are not used entirely correctly, as the hlist class always applies to the enclosing div, not directly to the ul tag. That is why you had to add the extra line.

With the wrapping issue resolved, I'm closing this bug.

gerritbot added a comment.Via ConduitMay 20 2014, 4:22 PM

Change 134380 had a related patch set uploaded by GWicke:
Update list item newline handling to follow Parsoid's model

https://gerrit.wikimedia.org/r/134380

gerritbot added a comment.Via ConduitJun 9 2014, 6:13 PM

Change 134380 merged by jenkins-bot:
Update list item newline handling to follow Parsoid's model

https://gerrit.wikimedia.org/r/134380

gerritbot added a comment.Via ConduitJun 13 2014, 5:18 PM

Change 94443 abandoned by Cscott:
Revert "Have list items occupy their own line".

Reason:
Abandoned in favor of Ib7aa9449bbd994cb23b83b3f23cff944b1cddadf

https://gerrit.wikimedia.org/r/94443

MarkAHershberger added a comment.Via ConduitJun 21 2014, 7:57 PM

Removing target milestone that was in the past.

If you want this in a specific release, have a good reason AND you are willing to find resources to fix this bug, feel free to change it to something appropriate.

Edokter added a comment.Via ConduitJun 21 2014, 10:55 PM

The issue has been resolved with https://gerrit.wikimedia.org/r/134380.

He7d3r added a comment.Via ConduitJun 22 2014, 12:50 AM

Thanks!

However, after I updated the code in
https://pt.wikipedia.org/w/index.php?diff=39195928&oldid=37160137
I still noticed differences between [[pt:MediaWiki:Recentchangestext‎]] and [[pt:Special:RecentChanges]]. Specifically, there was a space between the first item of each sublist and the "(" just before it. I had to make this change to fix the issue:
https://pt.wikipedia.org/w/index.php?diff=39195952

I don't know if that is is relevant enough to open a new bug.

Edokter added a comment.Via ConduitJun 22 2014, 9:38 AM

This is also caused by (the absence of) Tidy. Unless the HTML parser (and Parsoid) strip leading and trailing spaces, they will be visible.

You may want to ping the parsoid team to see if they want to fix this, but at least you have found a workaround.

Add Comment