Malformed hlist rendering due to missing linebreaks between list items
Closed, ResolvedPublic

Description

As reported on
https://en.wikipedia.org/w/index.php?title=MediaWiki_talk:Common.css&oldid=508829963#Unexpected_behavior
for some reason, MediaWiki mess up the HTML code when the MediaWiki:Recentchangestext page is added to the top of the Special:RecentChanges, and this breaks some CSS formatting.

There should be a line break between "</li>" and "<li>".


Version: unspecified
Severity: normal
URL: https://test.wikipedia.org/w/index.php?diff=prev&oldid=140930&diffonly=1
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=35806
https://bugzilla.wikimedia.org/show_bug.cgi?id=56809

bzimport added a subscriber: Unknown Object (MLST).
bzimport set Reference to bz39617.
He7d3r created this task.Aug 24 2012, 1:41 PM

It is normal, that the use of the message shows other than the MediaWiki-page, because the message does not goes trougth HTML Tidy like all wiki pages.

Well, my local copy of MW has $wgUseTidy = true; and the problem is visible also in the "Hlist test" page in the main namespace. Therefore Tidy doesn't seems to be the culprit.

The 'problem' seems to be that Tidy *doesn't* insert linebreaks, because Special: pages don't go through Tidy. The .hlist CSS has been fixed to work around this problem, making it more universally applicable to boot. So closing this as this is not a MediaWiki problem.

...Unfortunately...

The CSS fix caused .hnum formatting to break. This has been fixed, with the ceveat that .hnum cannot be used in Special: pages.

Since Special: pages often contain payload from MediaWiki: space, including HTML, does that not warrant letting Tidy run it's course on Special: pages as well?

Or better yet... While </li><li> may not be strictly invalid, it looks rather sloppy. Since lists are block level elements, I feel it is better to have a \n after any </ol>, </ul>, </li>, </dl>, </dt> and </dd> in parser.php. That should be a simple change which would prevent a heap of problems.

Renamed the bug to indicate the cause and summarizing issue:

Horizontal lists are malformed on Special: pages because Tidy does not process these pages. Specifically, the lists lack a breakable character betweeen list items, causing the browsers to fail to wrap the list properly. While technically valid HTML, hlist depends on something breakable between list items.

I see two solutions:

  1. Have Tidy process any Special: pages that transclude any (part of) user editable pages (such as MediaWiki: messages)
  1. I have a patch ready for gerrit that makes parser.php terminate any closing list element with a linebreak.

Personally, I prefer the second option; it removes hlist's dependancy on Tidy (doing so may even lessen the load on Tidy in general). But I also feel that Tidy should process any (transcluded) user content, even on non-user generated pages.

dymsza wrote:

Hey, Could you post this patch i would like to apply it on my wiki.

Still learning Git. But the quick fix (should you want to do it manualy) is to find the following string in parser.php:

</li><li>

and change it to:

</li>\n<li>

My patch (once it's done) will also terminate all other closing tags with a linebreak, but the above should fix any immediate hlist problems.

kevinph wrote:

Thanks for this great solution Dokter! Just wanted to add a clarification for n00bs like myself that I had to change it from '</li><li>' to '</li>'."\n".'<li>' for it to parse correctly. The original suggestion just gave me \n in my actual text.

OK, that means my patch can go in the bin. It's not clear to me why '</li>\n<li>' would not work; it seems to be the standard way in PHP strings to include linebreaks. I wish someone would go in and terminate all closing HTML tags with a linebreak. This is such a trivial solution, but it goes beyond my PHP knowledge. ~~~~

Anomie added a comment.Sep 4 2013, 7:23 PM

(In reply to comment #10)

It's not clear to me why '</li>\n<li>' would not work

In PHP, escapes such as \n are only interpreted inside double-quoted strings. When you use single quotes, it outputs \n literally. So using double quotes, as in "</li>\n</li>", should work.

Working on a patch, but my local repository seems to be broken; I get over a year old files when rebasing. I do hate GIT with a passion.

Change 90696 had a related patch set uploaded by Gerrit Patch Uploader:
Have list items occupy their own line

https://gerrit.wikimedia.org/r/90696

Patch now ready for review. Changed parserTest.txt as well to reflect changed output. Thanks to MatmaRex for uploading patches.

Change 90696 merged by jenkins-bot:
Have list items occupy their own line

https://gerrit.wikimedia.org/r/90696

Change merged. It should be deployed to WMF wikis with 1.23wmf1 (unless we decide to have 1.22wmf23 instead), see https://www.mediawiki.org/wiki/MediaWiki_1.23/Roadmap for the schedule.

Setting the list items to white-space: inline-block (not inline) works without the newline in the DOM.

Support:

Omitting the newline from the DOM lets users choose whether to line break or not with CSS, so seems to be preferable. Inserting a newline sounds problematic for this reason, and might just be a tidy bug.

Examples:

// force single line, but space cannot be avoided as it is in the DOM
http://jsfiddle.net/dT3Qg/

// Without a space in the DOM, both single-line and break-on-li behavior are possible. Break-on-li with inline-block:
http://jsfiddle.net/dT3Qg/2/

Reopening this bug to discuss which solution makes sense in the longer term, both in tidy and Parsoid.

It would be great to find a CSS fix for this without relying on presence/absence of newlines if possible. This is especially so for Parsoid that is a bidirectional converter between wikitext and HTML. To minimize dirty diffs, Parsoid does a good deal of bookkeeping to accurately map wikitext segments to HTML segments that they generate. This newline behavior will add to the complexity of our book-keeping and we are really keen to avoid it, if possible.

Gabriel, this patch only sanitizes native parser HTML output so that Tidy doesn't have to, in order to have certain CSS (such as .hlist) work as expected in those cases where Tidy is *not* available. I don't know where your issues originate from, but if they are from regular wiki pages, then this should not affect them.

As for Parsoid, there is no CSS fix. Believe me, I've tried them all. This is the only solution left. List items are is essence block level elements and should be treated and generated as such. I'm not sure what Parsoid does exactly (VE?), but if it processes HTML output that went through Tidy, then this should also not affect Parsoid.

If NEW issues arise, please open a new bug.

cscott added a comment.Nov 8 2013, 9:54 PM

Erwin: did you try setting the CSS 'whitespace' property, as described in comment 17?

To be clear, this (merged) patch currently is causing Parsoid to create dirty diffs. We would very much like to revert it, and replace it with a CSS-only fix.

As discussed earlier, we should really use whitespace for this. Reopening.

Change 94443 had a related patch set uploaded by Cscott:
Revert "Have list items occupy their own line"

https://gerrit.wikimedia.org/r/94443

Relevant tidy bug for avoiding newline insertion: https://sourceforge.net/p/tidy/bugs/945/

"white-space: inline-block" is nonsense; it only causes white-space to revert to 'normal', which is NOT desired. Also, the posted Tidy bug hardly seems relevant.

The Tidy bug is what caused hlists to work in the first place, according to what you're saying :)

I am going to mark this resolved to avoid fragmenting discussion any further.
The talk about possibly reverting this change and CSS solutions should go on bug 56809, pretty please.

cscott added a comment.Nov 9 2013, 2:28 PM

Matma: I discussed this with gwicke and others. We'd like to use this bug to continue to deal with the hlist issues, since that's where most of the discussion has already occurred (and in our opinion that bug has not been fixed in an acceptable manner yet). The other bug is just to track the current parsertest regression, if patches are needed to Parsoid, ParserFunctions, Scribunto, etc.

I'm copying comment 10 from 56809 here -- Erwin says that:

https://test.wikipedia.org/wiki/MediaWiki:Common.css contain current CSS for
hlist.
https://test.wikipedia.org/wiki/User:Edokter/sandbox contains the test cases.

cscott added a comment.Nov 9 2013, 2:36 PM

And just recapping the objections to the current patch:

  1. it changes longstanding parser behavior, possibility of unintended consequences is high
  1. it breaks a number of extensions using the parsertests framework, including Parsoid and potentially others like ParserFunction, Scribunto, etc, by changing parser output in an unexpected way.
  1. as gwicke points out in comment 18, removing the linebreak means that stylesheets which *want* single-line behavior can't obtain it.
  1. It potentially causes dirty diff issues with Parsoid (see bug 56809 for details)

Some possible alternative fixes:

  1. I hope gwicke can demonstrate an improved CSS fix
  1. perhaps Tidy can/should be run for Special pages as well as normal pages.
  1. We suck it up and fix Parsoid, ParserFunction, Scribunto, etc to deal with the new output, evangelize editors etc to let them know the parser has changed, etc.

To be clear, we haven't reverted the existing patch yet. My position is just that it was committed prematurely, in view of the side effects, and that we need to take a harder look at the other alternatives. I apologize for not catching this issue earlier, before it was committed -- I usually watch Parser-related patches, but I happened to be on vacation.

He7d3r added a comment.Nov 9 2013, 2:39 PM

(In reply to comment #29)
And the other (important) part from bug 56809 comment 10 is:

Any suggested changes will break one or more testcases.

cscott added a comment.Nov 9 2013, 2:49 PM

That's why we write testcases, Helder. Fixes aren't committed until/unless they pass them. I thought that was obvious.

cscott added a comment.Nov 9 2013, 2:52 PM

In https://bugzilla.wikimedia.org/show_bug.cgi?id=56809#c14 Erwin writes:

That is a scoping problem; Tidy should simply not touch anything that comes out

of GeSHi (pre & code). This is simply two external extentions clashing.

This seems (to me) to indicate that this might actually be a Tidy integration bug: we need better control of how Tidy is run so that we can (a) run it on Special pages and other places where hlist isn't working, and (b) *not* run it on GeSHi and other extension-generated output. Moving tidy's invocation so that it happens before extensions are expanded might help solve (b). Thoughts? (esp from gwicke, since AIUI he hooked up Tidy originally.)

(I think thaat comment was about bug 260 and bug 38800.)

(It was. Tidy/GeSHi conflict is not about hlist. I got a collision when posting that.)

If extensions rely on "long standing parser output", it means the parser can never be changed or improved. I once got admonished by a bot operator because I dared to change the output of a template, throwing the (scraping) bot into a fit. I can't help but feel this is kind of the same situation.

Having the parser locked in because of extensions depending on its output is a Very Bad Thing. What happens if the parser is rewritten? It happened before. Is the Parsoid Team going to block that as well? Can Parsoind not be made line-break-agnostic in some way?

Having Tidy run on spacial pages *would* solve all hlist problems on the Wikimedia cluster. But it does mean that hlist integration in core is out, because Tidy would be a pre-requisite for other installations.

GWicke added a comment.Nov 9 2013, 7:23 PM

(In reply to comment #26)

"white-space: inline-block" is nonsense; it only causes white-space to revert
to 'normal', which is NOT desired. Also, the posted Tidy bug hardly seems
relevant.

Please actually look at the examples I posted.

Example code in http://jsfiddle.net/dT3Qg/2/ linked in #18:
<ul><li style="white-space: nowrap; display:

inline-block">foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo

foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo
foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo </li><li
style="white-space: nowrap; display: inline-block">bar</li></ul>

GWicke added a comment.Nov 9 2013, 7:24 PM

(And apologies for the typo in #17 that probably threw you off.)

I have seen it. I have tested it in my sandbox on test.wikipeida.org. display+inline-block causes malformed lists; nested lists become unwilling to wrap, regargless of white-space setting.

I have tested everything, and this remains the only working solution.

I have created a simple nested list test case at https://test.wikipedia.org/wiki/User:GWicke/sandbox

The wrapping seems to work as expected, only the margins / padding still needs some tweaking. And the inline rules need to be converted into selectors.

Could you point out which issues you see, and maybe adjust / add a test case to capture those?

I'm sorry, but it does not wrap as expected. It wraps between foo and bar (beginning of sublist) where it should not. Inline-block somehow overrides white-space:normal, making it act as nowrap.

So you would like it *not* to wrap between a list and its first sub-list item, but then wrap between additional sub-list items?

I think I figured out what you meant by creating a sublist with a lot more list items. The break happened first between foo and bar rather than the nested list items, at least in some screen widths.

I have tweaked the css in https://test.wikipedia.org/wiki/User:GWicke/sandbox to make that work too. The change is to set display: inline on the nested list. Also removed the margin on it for uniform display.

Anything else that you see amiss?

Having the sublist use display:inline produces the same problem that caused me te submit this patch in the first place; the entire sublist won't wrap if the list items are glued together (no linebreak or TIDY).

gwicke's example linked in Comment 43 produces an odd display here:

Item Item Item Item Item Item Item Item Item Item
Item Item Item Item Item Item
Item

That last Item shouldn't be on a new line. And if you add another <li>item</li> before the one containing the sublist, it gets even worse:

Item
Item Item Item Item Item Item Item Item Item Item
Item Item Item Item Item Item
Item

[Bumping TM as MediaWiki 1.22.0 tarball was released today.]

The fix for this was merged into 1.22 and is still there. We're now discussing possibly reverting it and coming up with a better one. Let's keep the milestone, in case this gets RESOLVED FIXED again with no reverts.

Regarding hlist, the wrapping problem has been resolved by means of disabling nowrap alltogether. But it still has the potential of malfunctioning if nowrap is applied locally (ie. in a template).

With that in mind, I recommend that *this* bug be closed and discussion regarding parser/parsoid to continue in bug 56809.

Sorry for bringing this up this late, but can you describe the use case for rendering nested lists as a flat structure? Can those lists be a flat DOM structure instead?

No, hlist must work just like regular HTML lists because they must be sematically the same. Anything breaking that behaviour negates the entire purpose of hlist.

I don't think that this answers my question though. Why can't those lists just be flat if they are intended to render as a flat structure?

Accessibility. They only render flat visually, but its structure must be maintained to ensure they are accessable as regular HTML lists for screenreaders.

That points out why they should be a list, which we don't disagree on. Can you describe why they have to be *nested* lists?

That is entirely up to the user constructing the list. hlist only allows you to have nested lists, because HTML/wikimarkup allows you to have nested lists. I couldn't 'unnest' the lists even if I wanted to.

I don't see how this has become an issue. It was the nowrap issue that caused hlists to misbehave. The nowrap has been remove. What is the problem?

I'm still trying to figure out why you have the requirement to render *nested* lists as a flat structure.

So far it seems that the only reason to support this is that lists *can* generally be nested, not that they have to be for some reason. So right now it looks like we are actually trying to solve a problem that doesn't necessarily need to be solved.

The current solution (newline insertion) causes issues with round-tripping in Parsoid and restricts the way we can render things as described in comment 18. We could perhaps find a way to work around the issues in Parsoid, but this involves considerable effort and comes at a cost of an irregularity in the way we convert wikitext to DOM (which makes it harder to understand for our users).

So please let us know if there are real use cases for supporting nested lists in hlist, and why those are worth the costs.

The use cases are exactly the same as the use cases of nesting lists when just using HTML lists: anywhere you would potentially want to nest HTML lists is where you would also potentially want to nest hlists. Consider for example the following list:

-Cats
--Tabby
--Calico
-Dogs
--Chihuahua
--Great Dane
-Gerbils
-Kangaroos

This is a perfectly semantic structure, and flattening the lists into a single list destroys the semantics. If you convert it to an hlist (by adding an hlist div or the like around it), it displays instead as:

Cats (Tabby - Calico) - Dogs (Chihuahua - Great Dane) - Gerbils - Kangaroos

Again, this is perfectly semantic, and the nesting is still visually communicated, and again, flattening to just one list destroys the semantics. And I can tell you from personal experience, both on and off of Wikipedia/Wikimedia wikis, that it takes almost no imagination at all to find a myriad of applications for such nestings in navboxes. But as Erwin said, I don't see how lists being nested is at all relevant to this bug; the functionality is completely separate from the wrapping functionality, which, as he has stated multiple times, has already been removed from the hlist styles.

(In reply to Phillip Patriakeas from comment #56)

The use cases are exactly the same as the use cases of nesting lists when
just using HTML lists: anywhere you would potentially want to nest HTML
lists is where you would also potentially want to nest hlists. Consider for
example the following list:

-Cats
--Tabby
--Calico
-Dogs
--Chihuahua
--Great Dane
-Gerbils
-Kangaroos

This is a perfectly semantic structure, and flattening the lists into a
single list destroys the semantics. If you convert it to an hlist (by adding
an hlist div or the like around it), it displays instead as:

Cats (Tabby - Calico) - Dogs (Chihuahua - Great Dane) - Gerbils - Kangaroos

Again, this is perfectly semantic, and the nesting is still visually
communicated, and again, flattening to just one list destroys the semantics.

I see, thanks for the clear explanation!

And I can tell you from personal experience, both on and off of
Wikipedia/Wikimedia wikis, that it takes almost no imagination at all to
find a myriad of applications for such nestings in navboxes. But as Erwin
said, I don't see how lists being nested is at all relevant to this bug; the
functionality is completely separate from the wrapping functionality, which,
as he has stated multiple times, has already been removed from the hlist
styles.

Nowrap requirements made it hard to find a CSS-only solution. With that gone it might now actually be possible to do away with the newlines.

I copied over the .hlist styles from enwiki common.css to http://jsfiddle.net/dT3Qg/5/ and added a single line (display: inline) to the first rule to prevent a wrap before the nested list. For comparison, http://jsfiddle.net/dT3Qg/6/ is the unmodified CSS. As far as I can tell, both variants work with or without newlines in the DOM.

Erwin mentioned that nowrap could still be applied manually. Is this an important case we need to handle? Would that normally apply to a single list item?

Someone might still apply nowrap in separate list items, which *could* misbehave if the newlines were removed. That is why I do not like to see them removed. But that discussion should be continued in bug 56809.

Your jsfiddle examples are not used entirely correctly, as the hlist class always applies to the enclosing div, not directly to the ul tag. That is why you had to add the extra line.

With the wrapping issue resolved, I'm closing this bug.

Change 134380 had a related patch set uploaded by GWicke:
Update list item newline handling to follow Parsoid's model

https://gerrit.wikimedia.org/r/134380

Change 134380 merged by jenkins-bot:
Update list item newline handling to follow Parsoid's model

https://gerrit.wikimedia.org/r/134380

Change 94443 abandoned by Cscott:
Revert "Have list items occupy their own line".

Reason:
Abandoned in favor of Ib7aa9449bbd994cb23b83b3f23cff944b1cddadf

https://gerrit.wikimedia.org/r/94443

Removing target milestone that was in the past.

If you want this in a specific release, have a good reason AND you are willing to find resources to fix this bug, feel free to change it to something appropriate.

Thanks!

However, after I updated the code in
https://pt.wikipedia.org/w/index.php?diff=39195928&oldid=37160137
I still noticed differences between [[pt:MediaWiki:Recentchangestext‎]] and [[pt:Special:RecentChanges]]. Specifically, there was a space between the first item of each sublist and the "(" just before it. I had to make this change to fix the issue:
https://pt.wikipedia.org/w/index.php?diff=39195952

I don't know if that is is relevant enough to open a new bug.

This is also caused by (the absence of) Tidy. Unless the HTML parser (and Parsoid) strip leading and trailing spaces, they will be visible.

You may want to ping the parsoid team to see if they want to fix this, but at least you have found a workaround.

Add Comment