Page MenuHomePhabricator

Consider not removing multiple blank lines/white space between paragraphs
Closed, ResolvedPublic

Description

Some wikis want to use repeated blank lines to create extra white space on a page. This doesn't work in the visual editor (and is discouraged by the Manual of Style at the large Wikipedias), although it does work if you put multiple blank lines in between paragraphs in the wikitext mode.

Please consider keeping the extra blank space.

Related Objects

Mentioned In
T255832: Visual Editor should emit <p><br/></p> for "visual newlines"
T255798: Missing serialization spec
T251920: Parsoid should not introduce a stray <br/> tag in wikitext just to preserve all visual newlines (<p></p>) introduced by VE
T215002: New paragraph before section heading becomes line break
T204477: Create punjabi.wikimedia.org for Punjabi Wikimedians User Group
T205546: Create Wiktionary Cantonese
T205710: Create Wikinews Limburgish
T206777: Create Wikipedia Shan
T207286: Time profiling: Replace millisecond granularity timers with microsecond granularity timers
T208470: Parsoid should nowiki-escape '}' in a table cell or insert a whitespace character, as appropriate
T187142: Deduplicate template styles in Parsoid
T210437: Sanitizer::stripAllTags shouldn't expand legacy "semicolon-less" HTML5 entities
T209236: "&params" URL parameter (used in a link parameter in [[File]] markup) incorrectly parsed as "¶ms" (%C2%B6ms)
Mentioned Here
rGPAR18a98afcd12c: Don't normalize mw-empty-elt empty paragraphs that come from source
T187142: Deduplicate template styles in Parsoid
T204477: Create punjabi.wikimedia.org for Punjabi Wikimedians User Group
T205546: Create Wiktionary Cantonese
T205710: Create Wikinews Limburgish
T206777: Create Wikipedia Shan
T207286: Time profiling: Replace millisecond granularity timers with microsecond granularity timers
T208470: Parsoid should nowiki-escape '}' in a table cell or insert a whitespace character, as appropriate
T209236: "&params" URL parameter (used in a link parameter in [[File]] markup) incorrectly parsed as "¶ms" (%C2%B6ms)
T210437: Sanitizer::stripAllTags shouldn't expand legacy "semicolon-less" HTML5 entities
T175421: Parsoid doesn't insert same spacers when article text starts with two or more newlines.

Event Timeline

This normalisation is done by Parsoid, but seemingly not documented here: https://www.mediawiki.org/wiki/Parsoid/Normalizations nor do I remember it being added.

This normalisation is done by Parsoid, but seemingly not documented here: https://www.mediawiki.org/wiki/Parsoid/Normalizations nor do I remember it being added.

I am confused now ... I didn't think this was Parsoid related. I still don't have a reproducible example of what the complaint is (from https://www.mediawiki.org/wiki/Topic:Tuhfcu8edr2h8xso )

Reproduction should be very easy:

  1. Open the visual editor.
  2. Type something. Press return ten times. Type something else.
  3. Save page.
  4. Discover that your ten blank lines are now just one.

Thanks. Yes, I see that now. So, it is not some edge case behavior.

[subbu@earth tools] echo "<p>foo</p><p></p><p></p><p></p><p>bar</p>" | parse.js --html2wt
foo

bar

I was mistaken about it on the flow thread then. This is a Parsoid bug. Looks like this has been the behavior in Parsoid since the very beginning since there isn't any special normalization we are doing for this.

ssastry triaged this task as Medium priority.Jan 12 2018, 11:38 PM
ssastry moved this task from Needs Triage to html2wt on the Parsoid board.
ssastry raised the priority of this task from Medium to High.Feb 20 2018, 5:21 AM

T175421: Parsoid doesn't insert same spacers when article text starts with two or more newlines. may be related (at least in the sense that it could be fixed at the same time, if not actually due to the same cause).

@Esanders, so, in trying to fix this bug in Parsoid, I was playing around with HTML snippets in the browser and I looked at the following three:

<h1>br tags wrapped in p tags</h1>
<p>foo</p>
<p><br/><br/></p>
<p>boo</p>

<h1>br tags</h1>
<p>foo</p>
<br/><br/>
<p>boo</p>

<h1>Empty p-tags</h1>
<p>foo</p>
<p></p>
<p>boo</p>

When I look at the rendering in the browser, only the first 2 snippets generate the necessary whitespace. The browser suppresses the empty paragraph effectively. Given that, shouldn't VE be generating br tags instead of empty p-tags?

But, one simple solution to tackling this in Parsoid would be to implement a normalization wherein empty p-tags are normalized to <br/> tags. While we can do that relatively easily, is there a reason why VE is generating empty p-tags when editors insert empty lines?

238482n375 lowered the priority of this task from High to Lowest.
238482n375 moved this task from Next Up to In Code Review on the Analytics-Kanban board.
238482n375 edited subscribers, added: 238482n375; removed: Aklapper.

SG9tZVBoYWJyaWNhdG9yCk5vIG1lc3NhZ2VzLiBObyBub3RpZmljYXRpb25zLgoKICAgIFNlYXJjaAoKQ3JlYXRlIFRhc2sKTWFuaXBoZXN0ClQxOTcyODEKRml4IGZhaWxpbmcgd2VicmVxdWVzdCBob3VycyAodXBsb2FkIGFuZCB0ZXh0IDIwMTgtMDYtMTQtMTEpCk9wZW4sIE5lZWRzIFRyaWFnZVB1YmxpYwoKICAgIEVkaXQgVGFzawogICAgRWRpdCBSZWxhdGVkIFRhc2tzLi4uCiAgICBFZGl0IFJlbGF0ZWQgT2JqZWN0cy4uLgogICAgUHJvdGVjdCBhcyBzZWN1cml0eSBpc3N1ZQoKICAgIE11dGUgTm90aWZpY2F0aW9ucwogICAgQXdhcmQgVG9rZW4KICAgIEZsYWcgRm9yIExhdGVyCgpUYWdzCgogICAgQW5hbHl0aWNzLUthbmJhbiAoSW4gUHJvZ3Jlc3MpCgpTdWJzY3JpYmVycwpBa2xhcHBlciwgSkFsbGVtYW5kb3UKQXNzaWduZWQgVG8KSkFsbGVtYW5kb3UKQXV0aG9yZWQgQnkKSkFsbGVtYW5kb3UsIEZyaSwgSnVuIDE1CkRlc2NyaXB0aW9uCgpPb3ppZSBqb2JzIGhhdmUgYmVlbiBmYWlsaW5nIGF0IGxlYXN0IGEgZmV3IHRpbWVzIGVhY2guIE1vcmUgaW52ZXN0aWdhdGlvbiBuZWVkZWQuCkpBbGxlbWFuZG91IGNyZWF0ZWQgdGhpcyB0YXNrLkZyaSwgSnVuIDE1LCA3OjIxIEFNCkhlcmFsZCBhZGRlZCBhIHN1YnNjcmliZXI6IEFrbGFwcGVyLiC3IFZpZXcgSGVyYWxkIFRyYW5zY3JpcHRGcmksIEp1biAxNSwgNzoyMSBBTQpKQWxsZW1hbmRvdSBjbGFpbWVkIHRoaXMgdGFzay5GcmksIEp1biAxNSwgNzoyMiBBTQpKQWxsZW1hbmRvdSB1cGRhdGVkIHRoZSB0YXNrIGRlc2NyaXB0aW9uLiAoU2hvdyBEZXRhaWxzKQpKQWxsZW1hbmRvdSBhZGRlZCBhIHByb2plY3Q6IEFuYWx5dGljcy1LYW5iYW4uCkpBbGxlbWFuZG91IG1vdmVkIHRoaXMgdGFzayBmcm9tIE5leHQgVXAgdG8gSW4gUHJvZ3Jlc3Mgb24gdGhlIEFuYWx5dGljcy1LYW5iYW4gYm9hcmQuCkNoYW5nZSBTdWJzY3JpYmVycwpDaGFuZ2UgUHJpb3JpdHkKQXNzaWduIC8gQ2xhaW0KTW92ZSBvbiBXb3JrYm9hcmQKQ2hhbmdlIFByb2plY3QgVGFncwpBbmFseXRpY3MtS2FuYmFuCtcKU2VjdXJpdHkK1wpXaWtpbWVkaWEtVkUtQ2FtcGFpZ25zIChTMi0yMDE4KQrXClNjYXAK1wpTY2FwIChTY2FwMy1BZG9wdGlvbi1QaGFzZTIpCtcKQWJ1c2VGaWx0ZXIK1wpEYXRhLXJlbGVhc2UK1wpIYXNodGFncwrXCkxhYnNEQi1BdWRpdG9yCtcKTGFkaWVzLVRoYXQtRk9TUy1NZWRpYVdpa2kK1wpMYW5ndWFnZS0yMDE4LUFwci1KdW5lCtcKTGFuZ3VhZ2UtMjAxOC1KYW4tTWFyCtcKSEhWTQrXCkhBV2VsY29tZQrXCkJvbGQKSXRhbGljcwpNb25vc3BhY2VkCkxpbmsKQnVsbGV0ZWQgTGlzdApOdW1iZXJlZCBMaXN0CkNvZGUgQmxvY2sKUXVvdGUKVGFibGUKVXBsb2FkIEZpbGUKTWVtZQpQcmV2aWV3CkhlbHAKRnVsbHNjcmVlbiBNb2RlClBpbiBGb3JtIE9uIFNjcmVlbgoyMzg0ODJuMzc1IGFkZGVkIHByb2plY3RzOiBTZWN1cml0eSwgV2lraW1lZGlhLVZFLUNhbXBhaWducyAoUzItMjAxOCksIFNjYXAgKFNjYXAzLUFkb3B0aW9uLVBoYXNlMiksIEFidXNlRmlsdGVyLCBEYXRhLXJlbGVhc2UsIEhhc2h0YWdzLCBMYWJzREItQXVkaXRvciwgTGFkaWVzLVRoYXQtRk9TUy1NZWRpYVdpa2ksIExhbmd1YWdlLTIwMTgtQXByLUp1bmUsIExhbmd1YWdlLTIwMTgtSmFuLU1hciwgSEhWTSwgSEFXZWxjb21lLlBSRVZJRVcKMjM4NDgybjM3NSBtb3ZlZCB0aGlzIHRhc2sgZnJvbSBJbiBQcm9ncmVzcyB0byBJbiBDb2RlIFJldmlldyBvbiB0aGUgQW5hbHl0aWNzLUthbmJhbiBib2FyZC4KMjM4NDgybjM3NSByZW1vdmVkIEpBbGxlbWFuZG91IGFzIHRoZSBhc3NpZ25lZSBvZiB0aGlzIHRhc2suCjIzODQ4Mm4zNzUgdHJpYWdlZCB0aGlzIHRhc2sgYXMgTG93ZXN0IHByaW9yaXR5LgoyMzg0ODJuMzc1IHJlbW92ZWQgc3Vic2NyaWJlcnM6IEFrbGFwcGVyLCBKQWxsZW1hbmRvdS4KQ29udGVudCBsaWNlbnNlZCB1bmRlciBDcmVhdGl2ZSBDb21tb25zIEF0dHJpYnV0aW9uLVNoYXJlQWxpa2UgMy4wIChDQy1CWS1TQSkgdW5sZXNzIG90aGVyd2lzZSBub3RlZDsgY29kZSBsaWNlbnNlZCB1bmRlciBHTlUgR2VuZXJhbCBQdWJsaWMgTGljZW5zZSAoR1BMKSBvciBvdGhlciBvcGVuIHNvdXJjZSBsaWNlbnNlcy4gQnkgdXNpbmcgdGhpcyBzaXRlLCB5b3UgYWdyZWUgdG8gdGhlIFRlcm1zIG9mIFVzZSwgUHJpdmFjeSBQb2xpY3ksIGFuZCBDb2RlIG9mIENvbmR1Y3QuILcgV2lraW1lZGlhIEZvdW5kYXRpb24gtyBQcml2YWN5IFBvbGljeSC3IENvZGUgb2YgQ29uZHVjdCC3IFRlcm1zIG9mIFVzZSC3IERpc2NsYWltZXIgtyBDQy1CWS1TQSC3IEdQTApZb3VyIGJyb3dzZXIgdGltZXpvbmUgc2V0dGluZyBkaWZmZXJzIGZyb20gdGhlIHRpbWV6b25lIHNldHRpbmcgaW4geW91ciBwcm9maWxlLCBjbGljayB0byByZWNvbmNpbGUu

Aklapper raised the priority of this task from Lowest to High.

Change 465656 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] WIP: Normalize DOM to convert empty <p></p> tags to <p>?<br/></p>?

https://gerrit.wikimedia.org/r/465656

If we need to output <p><br></p> instead of <p></p> in the HTML we send to Parsoid, then I suppose we can do that.

But I'm not sure if VisualEditor or Parsoid is the better place to fix this.

<h1>Empty p-tags</h1>
<p>foo</p>
<p></p>
<p>boo</p>

When I look at the rendering in the browser, only the first 2 snippets generate the necessary whitespace. The browser suppresses the empty paragraph effectively. Given that, shouldn't VE be generating br tags instead of empty p-tags?

I will note that this applies to the following snippets as well. The empty element has no visible rendering (due to margin collapsing):

<h1>foo</h1>
<h1></h1>
<h1>boo</h1>
<pre>foo</pre>
<pre></pre>
<pre>boo</pre>
<blockquote>foo</blockquote>
<blockquote></blockquote>
<blockquote>boo</blockquote>

But, we probably shouldn't be inserting <br> tags inside empty elements other than <p>, because apparently they are not ignored by Parsoid (and serialize to <br> in wikitext)? This behavior is very difficult for me to understand.

Also… when Parsoid parses a bunch of newlines in wikitext and outputs a bunch of <p><br></p> structures for them, it should mark the <br> node somehow so that we can ignore it in VisualEditor. Currently VisualEditor treats it as any other <br> tag and allows inserting content on either side of it (essentially, rendering it as two empty lines instead of one).

Change 465761 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[VisualEditor/VisualEditor@master] [DNM] Output <br> inside empty <p></p> in Parsoid HTML

https://gerrit.wikimedia.org/r/465761

But, we probably shouldn't be inserting <br> tags inside empty elements other than <p>, because apparently they are not ignored by Parsoid (and serialize to <br> in wikitext)? This behavior is very difficult for me to understand.

You are right, my original reasoning on relying on how HTML renders those is flawed since none of those empty tags render in HTML either.

Refreshing my memory about the code, the reason paragraphs seem to be special is because p-tags map to newlines and we normalize newline counts to ensure core semantics are preserved. More specifically, newlines in the wt -> html direction map to either <p><br/></p> or <p><br/>some content here </p>. So, in the html -> wt direction, Parsoid expects <br/> to force extra newlines. The absence of <br/> is causing Parsoid to normalize newline counts. I think this behavior is correct since it is a mirror of what the wt -> html generates.

So, the qn. is if Parsoid should accept <p></p> and normalize it to the expected form (my patch) or should VE return HTML that matches Parsoid's expectations. Given that Parsoid normalizes other empty forms to what the core html -> wt code expects, I suppose it is reasonable to have Parsoid normalize it to either <p><br/> (1 newline) or <p><br/></p> (2 newlines), i.e. be more liberal about how it treats variations from canonical HTML. Thoughts?

Relatedly, but not directly related to this patch, this once again highlights missing documentation / spec about canonical HTML Parsoid expects (for canonical wikitext output) and how it handles variations from that. At our recent offsite, we discussed document a html -> wt spec (not just wt -> html spec that we currently publish) that makes explicit and documents this behavior and identifies inconsistencies that can be fixed.

Change 465656 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Scrub DOM to convert <p></p> sequences to NL-emitting normal forms

https://gerrit.wikimedia.org/r/465656

Change 465761 abandoned by Bartosz Dziewoński:
[DNM] Output <br> inside empty <p></p> in Parsoid HTML

Reason:
No longer needed after https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/ /465656

https://gerrit.wikimedia.org/r/465761