
Navbox rendering incorrect, all items in the same line
Open, Normal, Public

Description

Parsoid output renders incorrectly because MediaWiki:Common.css is not included.

For example:
http://parsoid-lb.eqiad.wikimedia.org/frwiki/Guerre_froide?oldid=108085194

Here the original version:
https://fr.wikipedia.org/w/index.php?title=Guerre_froide&oldid=108085194

And with the visual editor:
https://fr.wikipedia.org/w/index.php?title=Guerre_froide&oldid=108085194&veaction=edit

As you can see, the infobox width is huge, probably because there is no rule for the "infobox_v2" CSS class.


Version: unspecified
Severity: normal

Details

Reference
bz72416

Event Timeline

bzimport raised the priority of this task from to High.
bzimport set Reference to bz72416.
Kelson created this task. Oct 23 2014, 8:51 AM

MediaWiki:Common.css is included by the Parsoid-rendered page; you can see this by inspecting the infobox table with Firebug (or similar).

The rendering problem comes from the navboxes at the end, which put all the links in each cell of the 2nd column on the same line instead of wrapping them. If (using Firebug) the .nowrap style (or just "white-space: nowrap;") is disabled, the page returns to a more appropriate width.

The only difference I can see in the HTML is the presence of multiple mw:Entity elements to indicate the white-space present in wikitext, so maybe that's what's interfering with nowrap.

I don't think it is entirely mw:Entity ... I tested an HTML dump of that page in which I replaced each mw:Entity span with a space or with the entity, as in the PHP HTML, and it didn't change the rendering substantially. This requires some more digging into the HTML diffs between the two.

ssastry moved this task from Backlog to In Progress on the Parsoid board. Feb 3 2015, 6:00 PM
ssastry lowered the priority of this task from High to Normal.

Okay, I checked. There is a subtle but important difference that is causing this.

In the PHP output, between the individual links in that column, there is a single whitespace character.

... </span> <span class="nowrap"> ...

In the Parsoid output, there is no whitespace between the individual links:

...</span><span class="nowrap" data-parsoid='{"stx":"html"}'> ...

Without the whitespace, and with the nowrap class on the links, this means that there is no place for the browser to break the line, and hence that column stretches out forever. So, we need to figure out the reason why that single whitespace is being lost in Parsoid output.

I verified this by manually adding that single space between the spans: the navbox table at the bottom then renders properly. The infobox at the top has the same problem and is also fixed when I manually add that single space between the spans.
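The check described above can be sketched in a few lines of Python. This is only an illustration: the helper names `count_missing_breaks` and `insert_breaks` are hypothetical, and the regular expression assumes markup shaped like the snippets quoted in this task (a `</span>` directly followed by a `<span>` whose class list contains `nowrap`). It is not a general HTML parser.

```python
import re

# A closing </span> immediately followed by an opening span carrying the
# "nowrap" class, i.e. two adjacent elements with no whitespace between them.
NO_BREAK_GAP = re.compile(r'</span>(?=<span\b[^>]*\bclass="[^"]*\bnowrap\b)')


def count_missing_breaks(html: str) -> int:
    """Count the places where the browser has no opportunity to wrap."""
    return len(NO_BREAK_GAP.findall(html))


def insert_breaks(html: str) -> str:
    """Add the single space that the PHP output has between the spans."""
    return NO_BREAK_GAP.sub("</span> ", html)


php_like = '<span class="nowrap">a</span> <span class="nowrap">b</span>'
parsoid_like = (
    '<span class="nowrap">a</span>'
    '<span class="nowrap" data-parsoid=\'{"stx":"html"}\'>b</span>'
)

print(count_missing_breaks(php_like))      # PHP-style markup
print(count_missing_breaks(parsoid_like))  # Parsoid-style markup
```

Running `insert_breaks` on the Parsoid-style string mirrors the manual edit described above: it puts one space between the spans, which gives the browser a place to break the line.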

ssastry added a comment. Edited May 29 2015, 11:29 PM

This is interesting.

Compare:
https://fr.wikipedia.org/w/api.php?action=parse&format=json&text={{Palette|Seconde%20Guerre%20mondiale|Guerre%20froide|Communisme|Histoire%20des%20%C3%89tats-Unis}}
vs.
https://fr.wikipedia.org/w/api.php?action=expandtemplates&format=json&text={{Palette|Seconde%20Guerre%20mondiale|Guerre%20froide|Communisme|Histoire%20des%20%C3%89tats-Unis}}

So, action=expandtemplates returns:

... </span>&#32;</span><span class="nowrap" ...

and action=parse returns:

... </span></span> <span class="nowrap" ...

How did the &#32; (which is a space char) move from inside the span to outside because of parsing? Parsoid is of course processing the result of action=expandtemplates and faithfully preserves the whitespace char where it exists .. inside the previous span. Something is goofy with the PHP parser there .. Or is it Tidy that is doing this?

The second part to this puzzle here is: even if the space char entity was *outside*, Parsoid would wrap that entity with a mw:Entity span which would also likely break the CSS rendering in this case.

@tstarling: any clue what is going on there with action=expandtemplates vs action=parse ? Can also dig into code next week.

The second part to this puzzle here is: even if the space char entity was *outside*, Parsoid would wrap that entity with a mw:Entity span which would also likely break the CSS rendering in this case.

So, it looks like I was wrong about that, thankfully. If I manually edit the HTML to:

</span><span typeof="mw:Entity"> </span><span class="nowrap"

the rendering is correct, of course, since the nowrap class doesn't apply to the entity span.

So, if we can figure out the action=parse vs action=expandtemplates difference, we might be on our way to solving this issue. We need to figure out whether one of those API endpoints has a bug and/or the template needs fixing.
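To make the comparison repeatable, the two request URLs can be built from the same wikitext. The sketch below is a minimal example, assuming only the query parameters that appear in the URLs quoted earlier in this task (`action`, `format`, `text`). The helper name `api_url` is hypothetical, and the code only constructs the URLs; it does not fetch them.

```python
from urllib.parse import urlencode

API = "https://fr.wikipedia.org/w/api.php"

# Same template call as in the two URLs compared above.
WIKITEXT = (
    "{{Palette|Seconde Guerre mondiale|Guerre froide|Communisme|"
    "Histoire des États-Unis}}"
)


def api_url(action: str, wikitext: str) -> str:
    """Build an API URL for the given action with URL-encoded wikitext."""
    query = urlencode({"action": action, "format": "json", "text": wikitext})
    return f"{API}?{query}"


parse_url = api_url("parse", WIKITEXT)
expand_url = api_url("expandtemplates", WIKITEXT)

print(parse_url)
print(expand_url)
```

Fetching both URLs and looking at where the space sits relative to `</span>` in each response reproduces the difference described above.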

Tidy decodes and moves the space:

> print MWTidy::tidy('<span class="a"><span class="b">x</span>&#32;</span>y');

<p><span class="a"><span class="b">x</span></span> y</p>

> $opt = new ParserOptions;

> print $wgParser->parse('<span class="a"><span class="b">x</span>&#32;</span>y', Title::newMainPage(), $opt)->getText();

<p><span class="a"><span class="b">x</span>&#32;</span>y
</p>

> $opt->setTidy(true);

> print $wgParser->parse('<span class="a"><span class="b">x</span>&#32;</span>y', Title::newMainPage(), $opt)->getText();

<p><span class="a"><span class="b">x</span></span> y</p>

We'll have to figure out what to do about this Tidy-caused discrepancy. Template edit to not depend on this Tidy effect might be the simplest. For now, moving this out of Q4.
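The effect seen in the session above can be imitated with a small Python sketch. It is a rough emulation of this one observed behaviour (a trailing space or `&#32;` just before `</span>` ends up after the closing tag), not Tidy's actual algorithm. The function name `hoist_trailing_space` is hypothetical, and the `<p>` wrapper that the parser adds is left out.

```python
import re

# A space (literal, or written as the &#32; entity) sitting directly before
# one or more closing </span> tags.
TRAILING_SPACE = re.compile(r"(?:&#32;| )((?:</span>)+)")


def hoist_trailing_space(html: str) -> str:
    """Move a trailing space from inside the span(s) to just after them."""
    return TRAILING_SPACE.sub(r"\1 ", html)


before = '<span class="a"><span class="b">x</span>&#32;</span>y'
after = hoist_trailing_space(before)
print(after)  # <span class="a"><span class="b">x</span></span> y
```

The same move, applied by hand in the template's wikitext (placing the space after the closing `</span>` rather than before it), is the kind of template edit that would no longer depend on Tidy.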

ssastry moved this task from In Progress to Backlog on the Parsoid board. Jun 2 2015, 8:27 PM
ssastry set Security to None.
Pols12 added a subscriber: Pols12. May 18 2016, 5:34 PM
Pols12 added a comment. Edited May 18 2016, 6:14 PM

The same issue occurs with <li> tags.
An example on fr.wiki: with Template:Portail, there is one <li> per line in the HTML code, but all the <li> tags are displayed on the same line in VisualEditor.

Also, see https://www.mediawiki.org/wiki/Parsing/Replacing_Tidy .. especially Sections 2.2 and 2.5 on that page.

As part of T89331, we are working to replace Tidy with a proper HTML5 parser which will require wikitext fixes of the kind that @Od1n did above.

Izno added a subscriber: Izno. Jan 18 2017, 3:45 PM

I've just filed T155634: Tidy strips whitespace after HTML tags AND adds newlines between HTML tags which also has some spaces inserted (which may be relevant).