Page MenuHomePhabricator

Common newline diff in DiscussionTools edits
Closed, ResolvedPublic

Description

See this session:

$ echo -e 'foo\n\n<!--bar-->\n\n[[Category:Baz]]' > /tmp/wt
$ php bin/parse.php < /tmp/wt > /tmp/old.html
$ sed 's/<\/p>/<\/p>\n<ul><li>x<\/li><\/ul>/g;' < /tmp/old.html > /tmp/new.html
$ php bin/parse.php --selser --html2wt --oldtextfile /tmp/wt --oldhtmlfile /tmp/old.html < /tmp/new.html > /tmp/wt.new
$ diff /tmp/wt /tmp/wt.new
3c3
< <!--bar-->
---
> * x
4a5
> <!--bar-->
$ cat /tmp/wt.new
foo

* x

<!--bar-->
[[Category:Baz]]

This mimics the most common dirty diff now seen on discussion tool edits (hiwiki and ptwiki being the wikis where this seems to be prevalent).

I originally thought this was because of DOMDiff (which marks a large part of the DOM modified) but even after I manually tweaked the diff algorithm, the diffs persisted. It turns out that the separator constraints code forces max-newlines to 2 to prevent empty paragraph-wrappers. However, in this particular case, the trailing newlines, comment, and the category link are not p-wrapped since they are all rendering transparent. So, this is somewhat of an edge case in the logic but something that is being repeated triggered when replies are being added close to the end of a document.

But, more pertinently, the code doesn't have smarts to realize that even though all the content is marked edited, all the content comes from source. I am not convinced we should add that logic since it just adds unnecessary complexity.

So, if there is a fix we should make, it might be to the newline constraints logic, if at all.

Event Timeline

Change 703913 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/services/parsoid@master] Tweak heuristic to trim excess newlines from a separator string

https://gerrit.wikimedia.org/r/703913

Change 703913 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Tweak heuristic to trim excess newlines from a separator string

https://gerrit.wikimedia.org/r/703913

Change 705497 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.14.0-a8

https://gerrit.wikimedia.org/r/705497

Change 705497 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.14.0-a8

https://gerrit.wikimedia.org/r/705497

ssastry claimed this task.