Investigate P-wrapping oddity that introduces long horizontal no-wrap lines on many navboxes on shwiki
Closed, Resolved, Public

Description

Look at the following wikitext

<div><span>
x </span> <span>
y </span>
</div>

and the corresponding output

[subbu@earth parsoid] echo "<div><span>\nx </span> <span>\ny </span>\n</div>" | parse.js --normalize
<div><span>
<p>x <span> y </span></p>
</span></div>

So, in that Parsoid output, the p-wrapping is broken by the <span> being on the same line as the <div>. If the <span> happened to have a "white-space:nowrap;" style on it, it leads to ugliness like this: https://parsoid-vd-tests.wikimedia.org/visualdiff/pngs/shwiki/Gonfaron.parsoid.png

It turns out that the PHP parser emits similar output on that snippet as well:

vagrant@mediawiki-vagrant:/vagrant/mediawiki/maintenance$ mwscript parse.php --wiki=enwiki < /tmp/wt
parse.php: warning: reading wikitext from STDIN. Press CTRL+D to parse.

<div><span>
<p>x </span> <span>
y </span>
</p>
</div>

But, here is what happens when you tidy it!

vagrant@mediawiki-vagrant:/vagrant/mediawiki/maintenance$ mwscript parse.php --wiki=enwiki --tidy < /tmp/wt
parse.php: warning: reading wikitext from STDIN. Press CTRL+D to parse.

<div>
<p><span>x</span> <span>y</span></p>
</div>

Here are outputs with the various Tidy-replacement solutions:

RemexHTML:

vagrant@mediawiki-vagrant:/vagrant/mediawiki/maintenance$ mwscript parse.php --wiki=enwiki --tidy < /tmp/wt
parse.php: warning: reading wikitext from STDIN. Press CTRL+D to parse.

<div><span>
<p>x  <span>
y </span>
</p>
</span></div>

Balancer:

vagrant@mediawiki-vagrant:/vagrant/mediawiki/maintenance$ mwscript parse.php --wiki=enwiki --tidy < /tmp/wt
parse.php: warning: reading wikitext from STDIN. Press CTRL+D to parse.

<div><span>
<p>x  <span>
y </span>
</p>
</span></div>

Depurate:

vagrant@mediawiki-vagrant:/vagrant/mediawiki/maintenance$ mwscript parse.php --wiki=enwiki --tidy < /tmp/wt
parse.php: warning: reading wikitext from STDIN. Press CTRL+D to parse.

<div><span>
<p>x  <span>
y </span>
</p>
</span></div>

So it looks like this is a difference between Tidy and an HTML5-based parsing solution. The broader problem here seems to be that p-wrapping is done partly on strings/tokens and partly on the DOM. With a purely DOM-based wrapping solution, there would not be any p-wrapper around any of the span content, since it is inside a <div>. That would be okay, since it would not wrap a <span> around a paragraph by splitting it.
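To make the "purely DOM wrapping" idea concrete, here is a toy sketch in Python. It is not Parsoid's or the PHP parser's algorithm: the `Node` class, the `INLINE` tag set, and `p_wrap_top_level` are all hypothetical names invented for illustration. The only rule it encodes is the one described above: runs of inline content at the top level get a <p>, and nothing inside a block element such as <div> is touched.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Toy subset of inline tags; an assumption for this sketch only.
INLINE = {"span", "b", "i", "a"}


@dataclass
class Node:
    tag: str
    children: List[Union["Node", str]] = field(default_factory=list)


def is_inline(child: Union[Node, str]) -> bool:
    """Text and inline elements may be p-wrapped; everything else is a block."""
    return isinstance(child, str) or child.tag in INLINE


def p_wrap_top_level(children: List[Union[Node, str]]) -> List[Union[Node, str]]:
    """Wrap top-level runs of inline content in <p>; never descend into blocks."""
    out: List[Union[Node, str]] = []
    run: List[Union[Node, str]] = []

    def flush() -> None:
        # Only wrap a run that holds an element or non-whitespace text.
        if any(not isinstance(c, str) or c.strip() for c in run):
            out.append(Node("p", list(run)))
        else:
            out.extend(run)
        run.clear()

    for child in children:
        if is_inline(child):
            run.append(child)
        else:
            flush()
            out.append(child)  # block element: left exactly as it was
    flush()
    return out


def serialize(node: Union[Node, str]) -> str:
    if isinstance(node, str):
        return node
    inner = "".join(serialize(c) for c in node.children)
    return f"<{node.tag}>{inner}</{node.tag}>"


# The snippet from the description, as a tree: spans nested in a <div>.
doc = [Node("div", [Node("span", ["\nx "]), " ", Node("span", ["\ny "]), "\n"])]
print("".join(serialize(n) for n in p_wrap_top_level(doc)))
```

Under these assumptions the <div> comes back unchanged, with no <p> inside either <span>, whereas the same two spans placed at the top level would be wrapped in a single <p>.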

One temporary hack / solution would be to edit the template to emit a newline after the <div>, but we need to explore a better p-wrapping solution in the parsers.
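As an illustration of that template edit only, the following Python sketch inserts a newline after any <div ...> opening tag that is immediately followed by a <span>. The function name and the regular expression are assumptions made for this example; it shows the textual change, and does not itself demonstrate what either parser then emits.

```python
import re

# A <div ...> opening tag directly followed by a <span ...> opening tag.
DIV_THEN_SPAN = re.compile(r"(<div\b[^>]*>)(?=<span\b)", re.IGNORECASE)


def add_newline_after_div(wikitext: str) -> str:
    """Put the <span> on its own line by adding a newline after the <div>."""
    return DIV_THEN_SPAN.sub(r"\1\n", wikitext)


snippet = "<div><span>\nx </span> <span>\ny </span>\n</div>"
print(add_newline_after_div(snippet))
```

Text that already has a newline between the <div> and the <span> is left untouched, so the rewrite can be applied repeatedly without piling up blank lines.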

Event Timeline

ssastry triaged this task as Medium priority. Mar 24 2017, 9:01 PM

This bug is very closely related to, and is a special case of, T134469.

ssastry renamed this task from Investigate P-wrapping oddity that introduces long horizontal no-wrap lines on many navboxes on shwiki to Investigate P-wrapping oddity that introduces long horizontal no-wrap lines on many navboxes on many wikis (shwiki, cebwiki, ...). Mar 27 2017, 6:04 PM
ssastry renamed this task from Investigate P-wrapping oddity that introduces long horizontal no-wrap lines on many navboxes on many wikis (shwiki, cebwiki, ...) to Investigate P-wrapping oddity that introduces long horizontal no-wrap lines on many navboxes on shwiki.

Change 347801 had a related patch set uploaded (by Subramanya Sastry):
[mediawiki/services/parsoid@master] WIP: Linter detection for T161306

https://gerrit.wikimedia.org/r/347801

Change 347801 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Linter detection for T161306

https://gerrit.wikimedia.org/r/347801

This is now mostly covered by the pwrap-bug-workaround linter category and should let editors find these pages and templates easily. If there are any instances not covered by that linter category, we can update the detection code.
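Separately from that linter category, whose actual detection code is not reproduced here, a rough symptom check can be run on serialized parser output. The sketch below is an assumption-laden illustration, not the Parsoid linter: it uses Python's standard `html.parser` to track open tags and reports every <p> that opens while a <span> is still open, which is the shape seen in the RemexHTML, Balancer and Depurate outputs in the description. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class SpanWrapsParagraphFinder(HTMLParser):
    """Record the position of each <p> opened inside a still-open <span>."""

    def __init__(self) -> None:
        super().__init__()
        self.open_tags: List[str] = []
        self.hits: List[Tuple[int, int]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and "span" in self.open_tags:
            self.hits.append(self.getpos())  # (line, column) of the <p>
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; stray end tags
        # with no matching open tag are ignored.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def find_span_wrapped_paragraphs(html: str) -> List[Tuple[int, int]]:
    finder = SpanWrapsParagraphFinder()
    finder.feed(html)
    finder.close()
    return finder.hits


remex_like = "<div><span>\n<p>x  <span>\ny </span>\n</p>\n</span></div>"
tidy_like = "<div>\n<p><span>x</span> <span>y</span></p>\n</div>"
print(find_span_wrapped_paragraphs(remex_like))
print(find_span_wrapped_paragraphs(tidy_like))
```

Because `html.parser` only tokenizes and does not repair the tree, the check sees the tags exactly as they were serialized. It says nothing about whether the <span> carries a "white-space:nowrap;" style, so a hit is a candidate to inspect rather than a confirmed rendering problem.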

https://parsoid-vd-tests.wikimedia.org/visualdiff/pngs/shwiki/Gonfaron.parsoid.png is a non-working link. I wanted to show people what the problem is when contacting them about it.

I refreshed the db since then with new titles.

But, https://parsoid-vd-tests.wikimedia.org/diff/shwiki/Balbins has links to different versions. You will see the same problem there.

Opening https://sh.wikipedia.org/wiki/Balbins?action=parsermigration-edit and clicking on the collapsed navboxes at the bottom shows the same problem with Remex.

The Izer one doesn't do that; the Departmani Francuske one does.

@ssastry is this just me? (Specifically, I see the issue in the top template only if I expand it after the bottom one is already expanded.)

Right, the problem is with the bottom one, not the top one.

Per subbu elsewhere:
"shwiki is impacted by T161306 and it would be good to identify someone on that wiki to tackle that category -- it is very likely that adding a newline before the <span> in https://sh.wikipedia.org/w/index.php?title=%C5%A0ablon:nowrap%20begin&action=edit will fix most of the problems identified in the bug report"

ssastry claimed this task.