html2wt of existing list items, headings, table cells
Closed, Resolved · Public

Description

In T157418 (RFC: Make some aspects of Tidy's whitespace stripping behavior part of wikitext parsing "spec"), we made leading/trailing whitespace in list items, table cells, and headings insignificant, so it no longer makes it out to the HTML output. However, this means that Parsoid no longer has any information about the original whitespace in these wikitext items.

This does not matter for unedited content or newly added content.

  • For unedited content, the selective serialization algorithm preserves original wikitext as is.
  • For newly added content, wikitext norms (as coded in Parsoid's html -> wt code) require that for readability reasons, whitespace be added appropriately.

All good so far. However, for original list items / headings / table cells that got edited, without any additional work, we'll start seeing "dirty diffs" since Parsoid will start trimming leading/trailing whitespace from these.

But, there is no clear solution that works well in all cases. Here are 3 possibilities:

  1. If we leave things as is, this will cause dirty diffs in edited original content as above.
  2. If we always add readable whitespace (for all content, not just newly added content), this will cause dirty diffs in the other direction: for example, list items that didn't have whitespace after bullets will get it.
  3. If we add additional logic to Parsoid to figure out how the original content looked and preserve it, that whitespace will get locked in forever for all edits done via VisualEditor (or other such HTML clients). There is no available mechanism for these clients to tell Parsoid to add/remove that whitespace. Editors wishing to alter whitespace would have to directly edit wikitext in a source editor.
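To make the trade-off concrete, here is a minimal Python sketch of the three options applied to a single edited list item. It is only an illustration, not Parsoid's code: the helper names (split_list_item, serialize_trimmed, serialize_readable, serialize_preserving) are hypothetical, and it assumes a one-line list item whose original source is still available.

```python
import re


def split_list_item(src):
    """Split a one-line wikitext list item into (bullets, leading_ws, content).

    Hypothetical helper for illustration; real list items can be more complex.
    """
    m = re.match(r'^([*#:;]+)([ \t]*)(.*?)[ \t]*$', src)
    if m is None:
        raise ValueError("not a list item: %r" % src)
    return m.group(1), m.group(2), m.group(3)


def serialize_trimmed(bullets, orig_ws, new_content):
    # Option 1: whitespace information is gone, so none is emitted.
    return bullets + new_content


def serialize_readable(bullets, orig_ws, new_content):
    # Option 2: always emit one space after the bullets.
    return bullets + " " + new_content


def serialize_preserving(bullets, orig_ws, new_content):
    # Option 3: reuse exactly the whitespace the original source had.
    return bullets + orig_ws + new_content


if __name__ == "__main__":
    for original in ("*   foo", "*foo"):
        bullets, ws, _ = split_list_item(original)
        for fn in (serialize_trimmed, serialize_readable, serialize_preserving):
            print(repr(original), fn.__name__, repr(fn(bullets, ws, "foo bar")))
```

With the content edited from "foo" to "foo bar", option 1 changes the whitespace of "*   foo", option 2 changes the whitespace of "*foo", and option 3 leaves the whitespace of both untouched but offers no way to change it from an HTML client.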

Thoughts? My hunch is that we'll probably gravitate towards solution 3.

Event Timeline

ssastry created this task. May 24 2018, 1:48 PM
Restricted Application added a subscriber: Aklapper. May 24 2018, 1:48 PM
ssastry triaged this task as High priority. May 24 2018, 1:48 PM
ssastry updated the task description.
ssastry added a project: VisualEditor.
ssastry edited subscribers, added: Deskana, Esanders; removed: VisualEditor.

Change 434961 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] Minimize whitespace dirty diffs in existing headings, tables, lists

https://gerrit.wikimedia.org/r/434961

This implements a limited form of solution 3 -- it normalizes multiple whitespace chars for edited list items, headings, table cells to a single whitespace char. If there are complaints, we can generalize this solution further.
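As a rough illustration of what this normalization means for a heading, here is a small Python sketch. It is not the code from the patch: the function names (collapse, reserialize_heading) are hypothetical, and it assumes that an original run of one or more whitespace characters is emitted as a single space while an absent run stays absent.

```python
import re


def collapse(ws):
    """Reduce a run of original whitespace to at most one space (assumed rule)."""
    return " " if ws else ""


def reserialize_heading(orig_src, new_text):
    """Re-emit an edited heading, keeping only minimized original whitespace.

    Hypothetical helper: orig_src is the original one-line wikitext heading,
    new_text is the edited heading content.
    """
    m = re.match(r'^(=+)([ \t]*)(.*?)([ \t]*)(=+)[ \t]*$', orig_src)
    if m is None:
        raise ValueError("not a heading: %r" % orig_src)
    marker, lead, _old_text, trail, _close = m.groups()
    # Use the opening marker on both sides so the output stays balanced.
    return marker + collapse(lead) + new_text + collapse(trail) + marker


if __name__ == "__main__":
    print(reserialize_heading("==   Title   ==", "New title"))
    print(reserialize_heading("==Title==", "New title"))
```

Under these assumptions, a heading written without padding stays unpadded after an edit, and a heading with any amount of padding comes back with exactly one space on that side, so the whitespace diff is bounded rather than eliminated.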

ssastry claimed this task. May 24 2018, 11:47 PM
ssastry moved this task from Backlog to html2wt on the Parsoid board.

Change 434961 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Minimize whitespace dirty diffs in existing headings, tables, lists

https://gerrit.wikimedia.org/r/434961

Change 436091 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] WS-minimization heuristics only apply to comments & text nodes

https://gerrit.wikimedia.org/r/436091

Change 436091 merged by jenkins-bot:
[mediawiki/services/parsoid@master] WS-minimization heuristics only apply to comments & text nodes

https://gerrit.wikimedia.org/r/436091

ssastry closed this task as Resolved. Jun 5 2018, 4:13 PM

Okay, I just went with a restricted version of solution 3 for now. This has been live since yesterday.

Restricted Application added a project: User-Ryasmeen. Jun 5 2018, 4:13 PM
Vvjjkkii renamed this task from html2wt of existing list items, headings, table cells to 5ccaaaaaaa. Jul 1 2018, 1:08 AM
Vvjjkkii reopened this task as Open.
Vvjjkkii removed ssastry as the assignee of this task.
Vvjjkkii updated the task description.
Vvjjkkii removed subscribers: gerritbot, Aklapper.
CommunityTechBot renamed this task from 5ccaaaaaaa to html2wt of existing list items, headings, table cells. Jul 2 2018, 1:28 PM
CommunityTechBot closed this task as Resolved.
CommunityTechBot assigned this task to ssastry.
CommunityTechBot updated the task description.
CommunityTechBot added subscribers: gerritbot, Aklapper.