Page MenuHomePhabricator

Parsoid should not introduce a stray <br/> tag in wikitext just to preserve all visual newlines (<p></p>) introduced by VE
Closed, ResolvedPublic

Description

Parsoid generates <br> tags in wikitext out of nowhere sometimes.

Event Timeline

That description is not very helpful for us to investigate this :-). Perhaps submit diffs OR test cases OR ways to reproduce?

That description is not very helpful for us to investigate this :-). Perhaps submit diffs OR test cases OR ways to reproduce?

I see you are adding subtasks. Ignore me then.

Sorry, you're just too fast ;) @cscott asked me in a meeting to file this task to group those issues, I was still searching for them.

Multiple empty <p> tags appear *not* to generate <br/> tags, AFAICT the only way Parsoid will generate <br/> tags is if VE gives them to us. So *in theory* these are either bugs in VE or something we can address with scrubWikitext on the Parsoid side. But my naive understanding is probably wrong in various ways, hopefully we can search through the subtasks here and figure out what's going on.

Multiple empty <p> tags appear *not* to generate <br/> tags, AFAICT the only way Parsoid will generate <br/> tags is if VE gives them to us. So *in theory* these are either bugs in VE or something we can address with scrubWikitext on the Parsoid side. But my naive understanding is probably wrong in various ways, hopefully we can search through the subtasks here and figure out what's going on.

I was about to say I agreed with @cscott here because my initial tests didn't reveal a problem, but turns out I forgot to test with the --scrubWikitext option. See T248138#6110071

Looks like I added this normalization code originally because of T184755: Consider not removing multiple blank lines/white space between paragraphs. So, we need to look into that and reconcile expectations.

Looks like I added this normalization code originally because of T184755: Consider not removing multiple blank lines/white space between paragraphs. So, we need to look into that and reconcile expectations.

Ya, I am going to lob this back to VE. In T184755, we agreed to have Parsoid convert empty p-tags to <br/> tags. So, I think it is for VE to inspect why those <p></p> tags are being inserted if they are not desirable in the final output. For example, in T248138, when a category link was moved from top to bottom, why does VE insert a <p></p> in between the category links? So, to me, this looks like VE bug territory.

I think, T215002: New paragraph before section heading becomes line break is a parsoid issue, the VisualEditor is doing everything as expected. So I'm not sure if the task should be in the "Non-Parsoid Tasks" column.

I think, T215002: New paragraph before section heading becomes line break is a parsoid issue, the VisualEditor is doing everything as expected. So I'm not sure if the task should be in the "Non-Parsoid Tasks" column.

We need to figure out what to do about the fact that wikitext doesn't have a way of visually preserving odd-number of newlines when followed by a "block" tag (like headings, lists <div> tags, etc), i.e. in the generated HTML, 2n+1 and 2n+2 consecutive newlines in wikitext visually render identically in the generated HTML (with n <br/> tags). So, I suppose if we agree that it is okay for Parsoid to similarly drop 1 visual newline in HTML, we can convert that back to an equivalent wikitext form without a <br/>. As part of T148755, I implemented a form that preserves all visual newlines so that HTML -> HTML rendering is preserved.

But, given the complaints, perhaps it is logical to drop a visual newline from the HTML since the preservation requires a <br/> which it looks like is considered a worse behavior than preserving the newline?

Edited: Updated the wikitext newline behavior description to be more accurate and clearer.

Change 594967 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] WIP: DOMNormalizer: Don't convert the odd <p></p> "visual newline" to <br/>

https://gerrit.wikimedia.org/r/594967

There, I uploaded a patch that implements the proposed idea -- feedback welcome about which way to round up the odd <p></p> visual newline.

ssastry renamed this task from Parsoid generates <br> tags in wikitext out of nowhere to Parsoid should not introduce a stray <br/> tag in wikitext just to preserve all visual newlines (<p></p) introduced by VE.May 7 2020, 2:43 PM
Honischboy renamed this task from Parsoid should not introduce a stray <br/> tag in wikitext just to preserve all visual newlines (<p></p) introduced by VE to Parsoid should not introduce a stray <br/> tag in wikitext just to preserve all visual newlines (<p></p>) introduced by VE.May 8 2020, 6:54 AM

Change 594967 merged by jenkins-bot:
[mediawiki/services/parsoid@master] DOMNormalizer: Don't convert the odd <p></p> "visual newline" to <br/>

https://gerrit.wikimedia.org/r/594967

Change 608437 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/vendor@master] Bump Parsoid to v0.12.0-a19

https://gerrit.wikimedia.org/r/c/mediawiki/vendor/ /608437

Okay, once we deploy the above patch, many of the subtasks will be addressed. I'm going to pre-emptively resolve tasks that I think will be fixed. Please reopen any if you see the problems next week.

Change 608437 merged by jenkins-bot:
[mediawiki/vendor@master] Bump Parsoid to v0.12.0-a19

https://gerrit.wikimedia.org/r/c/mediawiki/vendor/ /608437