
Exclude outer whitespace from headings and list items
Closed, Duplicate
Public

Description

We currently include purely syntactic whitespace in the DOM, which makes life harder than necessary for VE and other clients. Instead, we should abstract away purely syntactic whitespace and match the PHP parser's output.

Test cases:

== Foo ==

should parse to <h2>Foo</h2> instead of <h2> Foo </h2>

* foo

should parse to <ul><li>foo</li></ul> instead of <ul><li> foo</li></ul>
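The two expectations above can be written down as a small executable check. This is only a sketch: it assumes the test inputs are the wikitext lines `== Foo ==` and `* foo`, and `render_line` is a hypothetical toy converter for one-line level-2 headings and single bullets, not Parsoid's tokenizer.

```python
import re

# Hypothetical toy patterns (assumption): a one-line level-2 heading and a
# single-level bullet item. Real wikitext parsing is far more involved.
HEADING = re.compile(r"^==(.*)==\s*$")
BULLET = re.compile(r"^\*(.*)$")


def render_line(line: str) -> str:
    """Render one wikitext line, dropping purely syntactic outer whitespace."""
    heading = HEADING.match(line)
    if heading:
        # The spaces between "==" and the text are syntax, not content.
        return "<h2>" + heading.group(1).strip() + "</h2>"
    bullet = BULLET.match(line)
    if bullet:
        # Likewise for the space after the "*" marker.
        return "<ul><li>" + bullet.group(1).strip() + "</li></ul>"
    return line
```

With these assumptions, `render_line("== Foo ==")` yields `<h2>Foo</h2>` and `render_line("* foo")` yields `<ul><li>foo</li></ul>`, matching the desired output in both test cases.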


Version: unspecified
Severity: normal

Details

Reference
bz51004

Event Timeline

bzimport raised the priority of this task from to Normal. Nov 22 2014, 2:01 AM
bzimport added a project: Parsoid-DOM.
bzimport set Reference to bz51004.

Isn't this a more generic problem that is not limited to lists and headings? It seems we should trim whitespace from the first and last child text nodes of all non-pre elements. Otherwise, it doesn't really benefit VE, for example, since it would still have to maintain whitespace information and restore it on save.
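A minimal sketch of the generic trimming proposed in this comment, using Python's standard `xml.dom.minidom` as a stand-in for the real DOM. The function name `trim_outer_whitespace` and the `WHITESPACE_SENSITIVE` set are assumptions made for illustration; this is not how Parsoid implements it.

```python
from xml.dom import minidom

# Assumption: only <pre> is treated as whitespace-sensitive here.
WHITESPACE_SENSITIVE = {"pre"}


def trim_outer_whitespace(element):
    """Strip leading/trailing whitespace from the first/last child text node
    of every element except <pre> (and anything nested inside a <pre>)."""
    if element.tagName.lower() in WHITESPACE_SENSITIVE:
        return  # leave the whole subtree untouched
    first = element.firstChild
    if first is not None and first.nodeType == first.TEXT_NODE:
        first.data = first.data.lstrip()
    last = element.lastChild
    if last is not None and last.nodeType == last.TEXT_NODE:
        last.data = last.data.rstrip()
    # Recurse into child elements; text nodes have nothing further to trim.
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            trim_outer_whitespace(child)


doc = minidom.parseString(
    "<div><h2> Foo </h2><ul><li> foo</li></ul><pre> keep </pre></div>"
)
trim_outer_whitespace(doc.documentElement)
result = doc.documentElement.toxml()
```

Here `result` has the heading and list item trimmed while the `<pre>` content keeps its spaces. Note that the trimmed whitespace is simply discarded in this sketch, which is exactly why a client would lose the information needed to restore it on save.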

This normalization will then mean that only selser will be able to reserialize content without introducing dirty diffs. If we want the regular serializer to preserve whitespace, then we have to record details of the normalized whitespace in data-parsoid.
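One way to picture recording the normalized whitespace is sketched below, again with `xml.dom.minidom` as a stand-in DOM. It assumes `data-parsoid` holds a JSON object; the keys `leadingWS` and `trailingWS` and both function names are hypothetical and were chosen only for this example.

```python
import json
from xml.dom import minidom


def strip_and_record(element):
    """Trim the element's outer whitespace and remember what was removed
    in its data-parsoid attribute (hypothetical keys, see lead-in)."""
    removed = {}
    first, last = element.firstChild, element.lastChild
    if first is not None and first.nodeType == first.TEXT_NODE:
        stripped = first.data.lstrip()
        removed["leadingWS"] = first.data[: len(first.data) - len(stripped)]
        first.data = stripped
    if last is not None and last.nodeType == last.TEXT_NODE:
        stripped = last.data.rstrip()
        removed["trailingWS"] = last.data[len(stripped):]
        last.data = stripped
    # Merge into any existing data-parsoid JSON, skipping empty entries.
    info = json.loads(element.getAttribute("data-parsoid") or "{}")
    info.update({key: ws for key, ws in removed.items() if ws})
    if info:
        element.setAttribute("data-parsoid", json.dumps(info, sort_keys=True))


def restore_whitespace(element):
    """Put recorded whitespace back, as a non-selser serializer might."""
    info = json.loads(element.getAttribute("data-parsoid") or "{}")
    doc = element.ownerDocument
    if info.get("leadingWS"):
        # insertBefore with a None reference node appends, so this also
        # works for an element without children.
        element.insertBefore(
            doc.createTextNode(info["leadingWS"]), element.firstChild
        )
    if info.get("trailingWS"):
        element.appendChild(doc.createTextNode(info["trailingWS"]))
    element.normalize()  # merge the re-added text with its neighbours


doc = minidom.parseString("<h2> Foo </h2>")
heading = doc.documentElement
strip_and_record(heading)
clean_text = heading.firstChild.data
recorded = json.loads(heading.getAttribute("data-parsoid"))
restore_whitespace(heading)
round_tripped = heading.firstChild.data
```

In this run the DOM exposes `Foo` to clients while the attribute carries one leading and one trailing space, and restoring gives back ` Foo ` so the original source spacing can be reproduced without a dirty diff.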

https://gerrit.wikimedia.org/r/#/c/96790/ did some related work in the serializer, but did not change the DOM representation yet.

marcoil removed GWicke as the assignee of this task. Nov 25 2014, 6:30 PM
marcoil added a project: Parsoid.
marcoil set Security to None.