
P-wrapping doesn't match php parser's $inBlockElem
Closed, Resolved (Public)

Description

[subbu@earth:~/work/wmf/parsoid] cat /tmp/wt 
{|
|-

{|
| x
|}

a

b

c
[subbu@earth:~/work/wmf/parsoid] parse.js --normalize=parsoid < /tmp/wt
<table>
<tbody>
<tr class="mw-empty-elt"></tr>
</tbody>
</table>
<table>
<tbody>
<tr>
<td>x</td>
</tr>
</tbody>
</table>
<p>a b c</p>

Unclosed tables are a fairly common category of syntax error. So, we should probably try to fix this.

Event Timeline

ssastry created this task. May 16 2018, 3:16 AM
Restricted Application added a subscriber: Aklapper. May 16 2018, 3:16 AM
ssastry triaged this task as Normal priority. May 16 2018, 3:17 AM
ssastry updated the task description.
ssastry moved this task from Backlog to Read Views on the Parsoid board.
Arlolra claimed this task. May 23 2018, 10:02 PM

So, we should probably try to fix this.

By "this", I'm going to assume you mean the paragraph wrapping of the abc. The rest of the render looks right to me.

Yes, p-wrapping.

The p-wrapper gets its "pseudo-DOM info" from table stacks, and obviously that goes haywire with unclosed tables. So it is unclear how this can be fixed in a reasonable manner, short of moving p-wrapping to the DOM (an idea that keeps popping up every once in a while).

Alternatively, we can throw up our hands and say this is broken wikitext whose behavior is undefined, so we won't try to always do the right thing. But, if possible, we should recover from it.
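To make the table-stack problem concrete, here is a toy sketch. It is not Parsoid's code: `wrapParagraphs` and `tableDepth` are hypothetical names, and the line-based rules are a deliberate simplification. It only illustrates how a depth counter standing in for real DOM context drifts once a table is never closed.

```javascript
// Toy model only: none of these names come from Parsoid or MediaWiki core.
function wrapParagraphs(wikitext) {
  const out = [];
  let tableDepth = 0; // stand-in for a "table stack"
  let para = [];      // text lines of the paragraph being collected

  // Emit the collected paragraph, if any.
  const flush = () => {
    if (para.length > 0) {
      out.push(`<p>${para.join(' ')}</p>`);
      para = [];
    }
  };

  for (const raw of wikitext.split('\n')) {
    const line = raw.trim();
    if (line.startsWith('{|')) {
      flush();
      tableDepth += 1;                          // table opened
    } else if (line.startsWith('|}')) {
      tableDepth = Math.max(0, tableDepth - 1); // table closed
    } else if (line === '') {
      flush();                                  // blank line ends a paragraph
    } else if (tableDepth > 0) {
      out.push(line);                           // assumed table content: no p-wrapping
    } else {
      para.push(line);
    }
  }
  flush();
  return out;
}

const closed = wrapParagraphs('{|\n| x\n|}\n\na\n\nb');
const unclosed = wrapParagraphs('{|\n|-\n\n{|\n| x\n|}\n\na\n\nb\n\nc');
console.log(closed);   // [ '| x', '<p>a</p>', '<p>b</p>' ]
console.log(unclosed); // [ '|-', '| x', 'a', 'b', 'c' ]
```

In the second call the outer table is never closed, so the counter stays at 1 and the trailing text is never wrapped. A pass that runs on the built DOM would not depend on such a counter, since the tree builder has already decided where each table ends.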

I'm going to see if the table stacks actually match up with the categories grouped in core:
https://github.com/wikimedia/mediawiki/commit/1f907d500a257a8af6b7d67072843a1ab4d3becf

Notes to self ...

Parsoid's p-wrapper seems to at least get the "block on a line" part right, but the table stacks / hasOpenHTMLPTag don't exactly mimic the PHP parser's $inBlockElem. For example,

<p>


hi



</p>

parses as,

<p class="mw-empty-elt" data-parsoid='{"stx":"html"}'></p>
<p>
<br/>
hi</p>



<p></p>
Arlolra renamed this task from Edge case parsing bug to P-wrapping doesn't match php parser's $inBlockElem. May 29 2018, 8:31 PM

Change 436847 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] [WIP] Pare down some things in paragraph wrapping

https://gerrit.wikimedia.org/r/436847

Change 436847 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Pare down some things in paragraph wrapping

https://gerrit.wikimedia.org/r/436847

Arlolra closed this task as Resolved. Jun 26 2018, 10:40 PM
Vvjjkkii renamed this task from P-wrapping doesn't match php parser's $inBlockElem to 1vcaaaaaaa. Jul 1 2018, 1:09 AM
Vvjjkkii reopened this task as Open.
Vvjjkkii removed Arlolra as the assignee of this task.
Vvjjkkii raised the priority of this task from Normal to High.
Vvjjkkii updated the task description.
Vvjjkkii edited subscribers, added: Arlolra; removed: gerritbot, Aklapper.
Community_Tech_bot renamed this task from 1vcaaaaaaa to P-wrapping doesn't match php parser's $inBlockElem. Jul 1 2018, 6:17 AM
Community_Tech_bot closed this task as Resolved.
Community_Tech_bot assigned this task to Arlolra.
Community_Tech_bot updated the task description.
Community_Tech_bot edited subscribers, added: gerritbot, Aklapper; removed: Arlolra.
CommunityTechBot lowered the priority of this task from High to Normal. Jul 3 2018, 3:27 AM