Page MenuHomePhabricator

Edge case: newlines can break talk page use case
Closed, ResolvedPublic8 Estimated Story Points

Description

I'll update this task with more details later, but wanted to report that we seem to incorrectly end up with a "change" diff when double newlines appear in the text.

Base: "A\n\nB"
Other: "A\n\nB\n\nC"
Yours: "A\n\nB\n\nD"

I would have expected (copy: A\n\nB, add: \n\nC) but instead we see (copy: A, change: \n\nB -> \n\nB\n\nC)

Event Timeline

awight created this task.Mar 27 2020, 10:37 AM
Restricted Application added a project: archived--TCB-Team. · View Herald TranscriptMar 27 2020, 10:37 AM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript

I was experiencing something similar:

Base: "A\n\nB"
Other: "A\n:C\n\nB"
Yours: "A\n:D\n\nB"

I expected this to be detected as (copy: A, remove: :C, add: :D, copy: B) but instead got (change: A\n:C → A\n:D, copy: B). Which means the talk page special case I wanted to trigger was not triggered.

My theory is that both examples are effects of the original diff3 algorithm, and nothing we can change.

I expected this to be detected as (copy: A, remove: :C, add: :D, copy: B) but instead got (change: A\n:C → A\n:D, copy: B). Which means the talk page special case I wanted to trigger was not triggered.

Slight correction, I started this thread with bad examples--we don't do a 3-way diff internally, so what we would expect is,

$diffStoredLines = (copy: A, add: :C, copy: B)
$diffYourLines = (copy: A, add: :D, copy: B)

This is what happens in our regular 2col conflict interface,

However, the talk page ResolutionSuggester fails! I dumped the two diffs that we compare, and it confirms the problem symptom,

$diffStoredLines = [{"action":"change","oldhtml":"A","newhtml":"A\n<ins class=\"mw-twocolconflict-diffchange\">:C<\/ins>","oldtext":"A\n","newtext":"A\n:C\n"},{"action":"copy","copytext":"B"}]
$diffYourLines = [{"action":"change","oldhtml":"A","newhtml":"A\n<ins class=\"mw-twocolconflict-diffchange\">:D<\/ins>","oldtext":"A\n","newtext":"A\n:D\n"},{"action":"copy","copytext":"B"}]

Why is the regular workflow able to split this into a nice, line-based diff, but the talk page algorithm cannot?

My theory is that both examples are effects of the original diff3 algorithm, and nothing we can change.

Our diffs don't come from diff3, that's only used for initial conflict detection. The algorithm responsible for the change list we're looking at is MediaWiki core's Diff and DiffEngine, but I agree with your point about it being out of scope.

Now I'm concerned, this issue feels like a big risk to the talk page use case. I think I was wrong about "double newline" being the trigger for the problem. We need some test cases...

Change 589020 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/TwoColConflict@master] Fix edit warning popping up because of irrelevant newlines

https://gerrit.wikimedia.org/r/589020

Change 589028 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/TwoColConflict@master] Add test case for currently undetected talk page use case

https://gerrit.wikimedia.org/r/589028

FYI, I know what's going on here. This is indeed messing up half of the talk page use cases, and should be fixed because of this. I will update you tomorrow.

Change 589088 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/TwoColConflict@master] [WIP] Fix overly complicated newline handling

https://gerrit.wikimedia.org/r/589088

Change 589028 merged by jenkins-bot:
[mediawiki/extensions/TwoColConflict@master] Add test case for currently undetected talk page use case

https://gerrit.wikimedia.org/r/589028

Change 589219 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/TwoColConflict@master] Track linefeeds at the beginning of blocks as well

https://gerrit.wikimedia.org/r/589219

thiemowmde triaged this task as High priority.
thiemowmde moved this task from Doing to Review on the WMDE-QWERTY-Sprint-2020-04-15 board.

I raised the priority of this issue because it interferes quite heavily with our talk page special case handling. A large amount of auto-resolvable situations that should be detected are not detected because of this bug. The main fix is https://gerrit.wikimedia.org/r/589088, including a longer explanation of what's going on.

Essentially, I had to remove a feature to fix this bug, which will leave us with another issue: The conflict resolution screen will now show empty boxes much more often. I created T250402: User-friendly presentation of whitespace-only changes (empty boxes) to track this.

Change 589088 merged by jenkins-bot:
[mediawiki/extensions/TwoColConflict@master] Fix insufficient newline handling in all diff related code

https://gerrit.wikimedia.org/r/589088

Change 589219 merged by jenkins-bot:
[mediawiki/extensions/TwoColConflict@master] Track linefeeds at the beginning of blocks as well

https://gerrit.wikimedia.org/r/589219

thiemowmde set the point value for this task to 8.Apr 23 2020, 9:14 AM

The issue described in the ticket above is fixed I would say. We could rephrase the ticket title and I think this is done then. The issues we just saw in the demo could move to the new line ticket still in doing then. Affirmative? @thiemowmde