Investigate P-wrapping oddity that introduces long horizontal no-wrap lines on many navboxes on shwiki
Closed, Resolved · Public

Authored By: ssastry, Mar 24 2017, 1:58 PM

Description

Consider the following wikitext:

<div><span>
x </span> <span>
y </span>
</div>

and the corresponding Parsoid output:

[subbu@earth parsoid] echo "<div><span>\nx </span> <span>\ny </span>\n</div>" | parse.js --normalize
<div><span>
<p>x <span> y </span></p>
</span></div>

In that Parsoid output, p-wrapping goes wrong because the <span> is on the same line as the <div>: the first <span> ends up wrapped around the <p>. If the <span> happens to have a "white-space:nowrap;" style on it, that leads to ugliness like this: https://parsoid-vd-tests.wikimedia.org/visualdiff/pngs/shwiki/Gonfaron.parsoid.png

It turns out that the PHP parser emits similar output on that snippet as well:

vagrant@mediawiki-vagrant:/vagrant/mediawiki/maintenance$ mwscript parse.php --wiki=enwiki < /tmp/wt
parse.php: warning: reading wikitext from STDIN. Press CTRL+D to parse.

<div><span>
<p>x </span> <span>
y </span>
</p>
</div>

But, here is what happens when you tidy it!

vagrant@mediawiki-vagrant:/vagrant/mediawiki/maintenance$ mwscript parse.php --wiki=enwiki --tidy < /tmp/wt
parse.php: warning: reading wikitext from STDIN. Press CTRL+D to parse.

<div>
<p><span>x</span> <span>y</span></p>
</div>

Here are outputs with the various Tidy-replacement solutions:

RemexHTML:

vagrant@mediawiki-vagrant:/vagrant/mediawiki/maintenance$ mwscript parse.php --wiki=enwiki --tidy < /tmp/wt
parse.php: warning: reading wikitext from STDIN. Press CTRL+D to parse.

<div><span>
<p>x  <span>
y </span>
</p>
</span></div>

Balancer:

vagrant@mediawiki-vagrant:/vagrant/mediawiki/maintenance$ mwscript parse.php --wiki=enwiki --tidy < /tmp/wt
parse.php: warning: reading wikitext from STDIN. Press CTRL+D to parse.

<div><span>
<p>x  <span>
y </span>
</p>
</span></div>

Depurate:

vagrant@mediawiki-vagrant:/vagrant/mediawiki/maintenance$ mwscript parse.php --wiki=enwiki --tidy < /tmp/wt
parse.php: warning: reading wikitext from STDIN. Press CTRL+D to parse.

<div><span>
<p>x  <span>
y </span>
</p>
</span></div>

So this looks like a difference between Tidy and an HTML5-based parsing solution. The broader problem seems to be that p-wrapping is done partly on strings/tokens and partly on the DOM. With a purely DOM-based wrapping solution, there would be no p-wrapper around any of the span content, since it is inside a <div>. That would be fine, since nothing would then split a <span> and wrap it around a paragraph.

One temporary workaround would be to edit the template to emit a newline after the <div>, but we need to explore a better p-wrapping solution in the parsers.
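To make the newline workaround concrete, here is a toy line-based p-wrapper in Python. It is only a sketch and not the actual doBlockLevels() code: the function name toy_line_pwrap, the short block-tag list, and the rule "any line that mentions a block tag is left unwrapped and closes an open paragraph" are all assumptions made for illustration. On the snippet from the description it reproduces the untidied output shown above, with the <p> opening inside the first <span>. With a newline after the <div>, the <p> opens before the <span> and the nesting comes out balanced.

```python
import re

# Assumption: a tiny subset of block-level tags is enough for this illustration.
BLOCK_TAG = re.compile(r"</?(?:div|table|ul|ol|p)\b", re.IGNORECASE)


def toy_line_pwrap(wikitext):
    """Wrap runs of non-block lines in <p>...</p>, looking at one line at a time."""
    out = []
    in_paragraph = False
    for line in wikitext.split("\n"):
        if BLOCK_TAG.search(line) or line.strip() == "":
            # Block-tag lines and blank lines are never wrapped, and they
            # close any paragraph that is currently open.
            if in_paragraph:
                out.append("</p>")
                in_paragraph = False
            out.append(line)
        elif in_paragraph:
            # Continuation of the current paragraph.
            out.append(line)
        else:
            # First plain line after a block line: open a paragraph here,
            # with no regard for inline tags that are still open.
            out.append("<p>" + line)
            in_paragraph = True
    if in_paragraph:
        out.append("</p>")
    return "\n".join(out)


original = "<div><span>\nx </span> <span>\ny </span>\n</div>"
workaround = "<div>\n<span>\nx </span> <span>\ny </span>\n</div>"

print(toy_line_pwrap(original))
print(toy_line_pwrap(workaround))
```

The wrapper never looks at which inline tags are open, which is why the position of the first newline decides whether the <p> lands inside or outside the <span>.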

Related Objects

Mentioned In:
T178838: Switch to Remex for de.wp, it.wp, and 170 smaller wikis (at least)
T162919: Disable fostering lint error for category and other transparent tags
T154709: Parsoid does not emit different HTML when the page=# property is set on paged media (PDFs/DjVus/TIFFs)
T89262: Read thumb sizes from siteinfo
T163330: Error in logs: Cannot read property '0' of undefined
T153885: Parsoid doesn't handle templated template names yet

Mentioned Here:
rGPAR55b905111b3e: Fix crasher from a9cfa9c9
T89262: Read thumb sizes from siteinfo
T153885: Parsoid doesn't handle templated template names yet
T154709: Parsoid does not emit different HTML when the page=# property is set on paged media (PDFs/DjVus/TIFFs)
T162919: Disable fostering lint error for category and other transparent tags
T163330: Error in logs: Cannot read property '0' of undefined
T134469: doBlockLevels() inserts <p> and </p> randomly with no regard for HTML validity

Event Timeline

ssastry created this task. Mar 24 2017, 1:58 PM

Restricted Application added a subscriber: Aklapper. Mar 24 2017, 1:58 PM

ssastry updated the task description. Mar 24 2017, 2:03 PM

ssastry updated the task description. Mar 24 2017, 2:25 PM

ssastry triaged this task as Medium priority. Mar 24 2017, 9:01 PM

This bug is very closely related to and is a special case of T134469.

ssastry renamed this task from Investigate P-wrapping oddity that introduces long horizontal no-wrap lines on many navboxes on shwiki to Investigate P-wrapping oddity that introduces long horizontal no-wrap lines on many navboxes on many wikis (shwiki, cebwiki, ...). Mar 27 2017, 6:04 PM

ssastry renamed this task from Investigate P-wrapping oddity that introduces long horizontal no-wrap lines on many navboxes on many wikis (shwiki, cebwiki, ...) to Investigate P-wrapping oddity that introduces long horizontal no-wrap lines on many navboxes on shwiki.

Change 347801 had a related patch set uploaded (by Subramanya Sastry):
[mediawiki/services/parsoid@master] WIP: Linter detection for T161306

https://gerrit.wikimedia.org/r/347801

gerritbot added a project: Patch-For-Review. Apr 12 2017, 3:05 AM

Snaevar subscribed. Apr 15 2017, 2:41 PM

Change 347801 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Linter detection for T161306

https://gerrit.wikimedia.org/r/347801

Mentioned in SAL (#wikimedia-operations) [2017-04-25T17:25:33Z] <arlolra> Updated Parsoid to rGPAR55b905111b3e (T153885, T163330, T89262, T154709, T162919, T161306)

This is now mostly covered by the pwrap-bug-workaround linter category, which should let editors find these pages and templates easily. If there are any instances not covered by that linter category, we can update the detection code.
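For anyone who wants to spot-check rendered HTML by hand, the misnesting can be approximated with a short Python sketch. This is not the Parsoid linter's detection code; the class and function names are made up, and it assumes that the only inline tag worth tracking is <span>. It uses the standard library's html.parser, which reports tags in source order without building a tree, and records every <p> that opens while a <span> is still open.

```python
from html.parser import HTMLParser

# Assumption: tracking <span> alone is enough for this illustration.
INLINE_TAGS = {"span"}


class PInsideInlineFinder(HTMLParser):
    """Record every <p> start tag seen while an inline element is still open."""

    def __init__(self):
        super().__init__()
        self.open_inline = 0
        self.hits = []  # parser positions of the offending <p> tags

    def handle_starttag(self, tag, attrs):
        if tag in INLINE_TAGS:
            self.open_inline += 1
        elif tag == "p" and self.open_inline > 0:
            self.hits.append(self.getpos())

    def handle_endtag(self, tag):
        if tag in INLINE_TAGS and self.open_inline > 0:
            self.open_inline -= 1


def find_p_inside_inline(html):
    """Return the positions of <p> tags that open inside an unclosed inline tag."""
    finder = PInsideInlineFinder()
    finder.feed(html)
    finder.close()
    return finder.hits


untidied = "<div><span>\n<p>x </span> <span>\ny </span>\n</p>\n</div>"
tidied = "<div>\n<p><span>x</span> <span>y</span></p>\n</div>"

print(len(find_p_inside_inline(untidied)))
print(len(find_p_inside_inline(tidied)))
```

On the untidied output from the description this reports one hit, and on the Tidy output it reports none.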

ssastry moved this task from Needs Triage to In Progress on the Parsoid board. May 2 2017, 10:39 PM

ssastry mentioned this in T178838: Switch to Remex for de.wp, it.wp, and 170 smaller wikis (at least). Oct 23 2017, 8:23 PM

Elitre subscribed. Oct 27 2017, 9:27 AM

https://parsoid-vd-tests.wikimedia.org/visualdiff/pngs/shwiki/Gonfaron.parsoid.png is a non working link. I wanted to show people what the problem is, when contacting them about it.

In T161306#3763150, @Elitre wrote:

https://parsoid-vd-tests.wikimedia.org/visualdiff/pngs/shwiki/Gonfaron.parsoid.png is a non working link. I wanted to show people what the problem is, when contacting them about it.

I refreshed the db since then with new titles.

But, https://parsoid-vd-tests.wikimedia.org/diff/shwiki/Balbins has links to different versions. You will see the same problem there.

In T161306#3763159, @ssastry wrote:

I refreshed the db since then with new titles.

But, https://parsoid-vd-tests.wikimedia.org/diff/shwiki/Balbins has links to different versions. You will see the same problem there.

https://sh.wikipedia.org/wiki/Balbins?action=parsermigration-edit and clicking on the collapsed navboxes on the bottom shows the same problem with Remex.

The Izer one doesn't do that: the Departmani Francuske does.

In T161306#3763370, @Elitre wrote:

The Izer one doesn't do that: the Departmani Francuske does.

@ssastry is this just me? (Specifically, I see the issue in the top template only if I expand it after the bottom one is already expanded.)

In T161306#3775311, @Elitre wrote:

@ssastry is this just me? (Specifically, I see the issue in the top template only if I expand it after the bottom one is already expanded.)

Right, the problem is with the bottom one, not the top one.

Per subbu elsewhere:
"shwiki is impacted by T161306 and it would be good to identify someone on that wiki to tackle that category -- it is very likely that adding a newline before the <span> in https://sh.wikipedia.org/w/index.php?title=%C5%A0ablon:nowrap%20begin&action=edit will fix most of the problems identified in the bug report"

And here's my message to the wiki: https://sh.wikipedia.org/wiki/Wikipedia:Pijaca-Пијаца#A_fix_required_at_this_wiki

ssastry moved this task from In Progress to Needs Triage on the Parsoid board. Dec 18 2017, 9:35 PM

ssastry closed this task as Resolved. Jan 11 2018, 9:14 PM

ssastry claimed this task.
