Page MenuHomePhabricator

Do not overwrite edits when conflicting with self
Open, Needs TriagePublic

Description

Currently, MediaWiki will ignore conflicts between edits of the same user, causing previous edits to be overwritten, if there were no edits by other users since the edit's base revision (and the current edit is not a section edit). The relevant code is at https://phabricator.wikimedia.org/source/mediawiki/browse/master/includes/EditPage.php;56ade0650d492e01d2689dc918e7db3d56d18147$1997

This behavior seems undesirable: If I have the same page open in two tabs for some reason, and edit in both (possibly after hours or days), the second edit will undo the first edit, unless someone else has edited in the meantime. There seems to be no documented justification for this.

History:
Originally, conflict detection would be disabled if the current user was the creator of the edit's base revision, no matter if other users edited. This behavior can be found the first revision of MediaWiki on record in git, Lee Daniel Crocker's "Initial revision" from 2003: https://phabricator.wikimedia.org/source/mediawiki/browse/master/includes/Article.php;d82c14fb4fbac288b42ca5918b0a72f33ecb1e69$419

Later, brion changed this so merging would work on section edits, but conflicts on whole page edits would still overwrite: https://phabricator.wikimedia.org/rMWc6e870e5c6f85a693e01c0c16ce027b4b7e45480

In 2008, Aaron added a check that made sure the current user was the only one to edit after the base revision: https://phabricator.wikimedia.org/rMW43129ccd4426f3d059b7851cc8acc02360070601

This was rather tricky to track down, since the method was moved from Article to EditPage and then later to Revison (by me).

But the big question is: WHY do we do that? It seems like we are going out of our way to do the wrong thing in an edge case. Am I missing some obvious benefit of this?

Event Timeline

Historical justification, as per Brion on IRC:

[22:20] <brion> So
[22:20] <brion> Back in the day if you used the edit form and saved
[22:21] <brion> And then clicked back and made a further change and save again
[22:21] <brion> It would edit conflict
[22:21] <brion> I think this was a hack for that
[22:21] <brion> But yeah if it fucks you up it's... not pretty
[22:21] <DanielK_WMDE> ah... that would work because "back" took you back to a text field that still had the modified text, which you could then adjust?
[22:22] <DanielK_WMDE
> this was before tabbed browsing, yes?
[22:22] <DanielK_WMDE__> this seems to be doing more harm then good...
[22:22] <brion> Yeah Netscape 4 days
[22:22] <brion> It may be... Unwise now.

I'd be OK with changing this, but it'll likely have some fallout with editing tools relying on this behaviour.

Change 377813 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] Do not ignore edit conflicts with oneself.

https://gerrit.wikimedia.org/r/377813

I remember suffering silly self-conflicts. An additional issue is that ~~~~ magic expansion almost ensures that there will be a conflict, even if it could otherwise be automatically resolved.

aaron explained it quite well on https://phabricator.wikimedia.org/rMW43129ccd4426f3d059b7851cc8acc02360070601

It's not strange at all to notice a typo in what you wrote just after pressing Save (ie. while it is connecting with the server). So what do you do? You press Esc to stop it hoping that you were fast enough. Most probably, you weren't (it was saved), and your amended edit will be a self-conflict.

That seems more common than editing the same article on two different tabs without noticing. Maybe we should try to detect if it was from the same or a different tab?

Related tasks: T2275 T3150

I just recently accidentally overwrote my own changes when using two tabs. I think it would be better to ask confirmation before overwriting on these cases rather than doing it silently.

If we still want to cover the "back button" use case, I think we should try and detect that situation on the client side.

A bit of JS could set a flag in the form just before it is submitted for saving, so the form knows the contents of the text field have already been saved. It can then warn and offer to load the latest version into the editor when the edit field gets focus again.

With JS disabled, you'd get a conflict if you edit the same paragraph again. With JS on, you would be able to do the same thing you do now, with one extra click, and without the danger of accidentally reverting yourself.

Tagging for TechCom review. Would be great to move this forwards somehow.

Change 377813 abandoned by Daniel Kinzler:
Do not ignore edit conflicts with oneself.

Reason:
attic

https://gerrit.wikimedia.org/r/377813

The problem that prompted the "ignore-self-conflict" behavior to be implemented seems to no longer exist.
I just tested this locally (Firefox 73, Chromium 80, with and without JS): when I edit a page, and then hit the "back" button to go back to the edit page, the edit form is apparently re-loaded, so it not only has the new text, but it also has the new values in the editRevId and wpEdittime fields. It doesn't matter whether the same user or a different user edits in the mean time, hitting "back" to gets back to the edit page will load the edit page with the data for the current value, as expected.

This indicates that ignoring self-conflicts is no longer needed to prevent users from experiencing spurious conflicts. So the only thing it does is cause data loss when users edit in multiple tabs.

I'm in favor of dropping the old behavior. Even if it were still possible to get in tho this use case "noticed a typo, cancel form submission, fix typo and resubmit" which justifies ignoring a self-conflict, the conflict interface has improved to a point where it's perfectly reasonable to expect an editor to resolve the conflict explicitly. Assuming that we know the user's intention, without any heuristics to distinguish between the typo use case and accidentally editing in two tabs, seems unwise.

Here's what the typo case will look like to users (simulated by having a second editor make the conflicting edit). The second edit added the text "additional" on a non-overlapping and an overlapping line:

Legacy conflict workflow,

image.png (371×774 px, 19 KB)

Two-column conflict workflow,

image.png (377×759 px, 28 KB)

tstarling moved this task from Inbox to Backlog on the TechCom board.Jul 18 2018, 22:37

Was there a TechCom discussion? I didn't find anything in the IRC logs from June through September, 2018. If there's no dissent beyond the historical notes in this task, it seems uncontroversial to go ahead with the change.

My only suggestion would be to leave the deprecated logic in place for now, but to remove the isConflict = false line and instead log the base revision ID, so we can correlate it with conflict EventLogging.

tstarling moved this task from Inbox to Backlog on the TechCom board.Jul 18 2018, 22:37

Was there a TechCom discussion? I didn't find anything in the IRC logs from June through September, 2018. If there's no dissent beyond the historical notes in this task, it seems uncontroversial to go ahead with the change.

This was not an RFC discussion, just quick internal review during board grooming (note that this is on the committee board, not the RFC board). I just dug the discussion up from the archives, the relevant statement was: "Not a purely technical decision, need input from product/UX perspective from a user point of view". So, blocked on a product decision.

My only suggestion would be to leave the deprecated logic in place for now, but to remove the isConflict = false line and instead log the base revision ID, so we can correlate it with conflict EventLogging.

Doing a potentially expensive database query just for the logging seems overkill. Who would look at these logs, and for what purpose?

Doing a potentially expensive database query just for the logging seems overkill. Who would look at these logs, and for what purpose?

Good points, I mostly wanted to satisfy my own curiosity about whether ignoring self-conflicts is a misfeature or not. What I would have done is, record when these happen, and take diffs showing what was silently reverted or added. This would prove or falsify the theoretical typo use case, or show that these self-conflicts have stopped altogether.

The operational overhead might not be an issue since the code has been in place since the beginning of our epoch, and it's on a lukewarm or stone cold, server-side code path. But I'm assuming you're more concerned about the tech debt, and I completely agree. Just throwing it out is an excellent first step! If anyone asks for a revert, that's the point at which we can add logging.

For the record, this was discussed in the TechCom meeting two weeks ago. No concerns were raised, there are no longer any technical reasons to keep this behavior.

JTannerWMF added a subscriber: JTannerWMF.

It looks like there is no action for the Editing Team.

When exactly bfcache gets used is browser-dependent, so I'm not sure it makes sense to declare to problem nonexistent just because Firefox (current market share: 4%) does not have it anymore.

That said, AIUI this behavior is to support back-button behavior in discussions (ie, section edits on talk pages) where the user goes back to add another comment. It doesn't make any sense to overwrite anything in that case, and the correct behavior should be some kind of forced merge (or in other words, if the edit cannot be quietly merged, it should just conflict). Am I misunderstanding how the feature works?

...I guess the overwrite is needed because the user's previous signature now resolves to a different timestamp?

...I guess the overwrite is needed because the user's previous signature now resolves to a different timestamp?

Yes, that's the case this fixes. But I still think it does more harm than good, especially with people editing in multiple tabs.

Change 644250 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] Do not ignore self-conflicts.

https://gerrit.wikimedia.org/r/644250

I was able to reproduce the edit conflict in Firefox with the following patch:

diff --git a/includes/OutputPage.php b/includes/OutputPage.php
index d3c685ae31..4bdda1d473 100644
--- a/includes/OutputPage.php
+++ b/includes/OutputPage.php
@@ -2494,8 +2494,8 @@ class OutputPage extends ContextSource {
 			# In general, the absence of a last modified header should be enough to prevent
 			# the client from using its cache. We send a few other things just to make sure.
 			$response->header( 'Expires: ' . gmdate( 'D, d M Y H:i:s', 0 ) . ' GMT' );
-			$response->header( 'Cache-Control: no-cache, no-store, max-age=0, must-revalidate' );
-			$response->header( 'Pragma: no-cache' );
+			$response->header( 'Cache-Control: no-cache, max-age=0, must-revalidate' );
+			//$response->header( 'Pragma: no-cache' );
 		}
 	}

According to this very old documentation, if a site is using HTTPS, the response headers Cache-Control: no-store and Pragma: no-cache disable the bfcache. So we probably broke the bfcache and with it the rationale for the feature by switching from HTTP to HTTPS.

With the patch above, I was able to confirm that clicking the back button and re-editing the page causes an edit conflict if the same line is edited. A signature is sufficient to trigger a conflict.

I think the way forward here is to confirm that we are reliably disabling the bfcache in major browsers, and to document this fact as a rationale for including no-store in the response header. Then we can go ahead with removal of the feature. Controlling the cache seems like a better solution than unconditionally suppressing edit conflicts.

It's unfortunate that we are apparently disabling all in-browser caching of logged-in page views, to the detriment of user experience. Maybe CC:no-store should be applied only to the edit page.

I think the way forward here is to confirm that we are reliably disabling the bfcache in major browsers, and to document this fact as a rationale for including no-store in the response header. Then we can go ahead with removal of the feature. Controlling the cache seems like a better solution than unconditionally suppressing edit conflicts.

It's unfortunate that we are apparently disabling all in-browser caching of logged-in page views, to the detriment of user experience. Maybe CC:no-store should be applied only to the edit page.

Interesting, I didn't know about bfcache. From the documentation I could find, it seems like the no-store directive would work at least for firefox. I couldn't find any info on this for other browsers.

An alternative (or complementary?) approach may be to use the pageshow event to re-load the page content if the latest revision changed.

FWIW Chrome DevTools recently added some bfcache diagnostics, so at least in that browser it should be easier to ensure that we support some specific caching behavior.