Page MenuHomePhabricator

Prevent data loss in DiscussionTools caused by MediaWiki not detecting edit conflicts with yourself
Closed, ResolvedPublic

Description

Issue originally reported here: https://www.mediawiki.org/wiki/Topic:Vhgb6y8dqm58yjni

When you open the same discussion page in two browser tabs, and try posting a new reply in each, MediaWiki does not detect the edit conflict (this is filed as T175745). DiscussionTools currently relies on edit conflict detection working correctly to handle that workflow. As a result, when saving the second reply, the first reply is lost (as if you reverted the previous edit).

Note that edit conflicts between different users are detected and automatically resolved (T240643). This problem only happens when the same user makes both edits.

This task is to figure out if we can fix the underlying MediaWiki bug, if we can get someone else to fix it (I've seen Phabricator activity of both WMDE and WMF Core Platform Team working on edit conflicts), or if we have to write our own edit conflict handling instead of relying on MediaWiki.

Event Timeline

The discussion mentions basetimestamp and starttimestamp. While working on Two-Column-Edit-Conflict-Merge I just learned that relying on timestamps is an outdated mechanism. It will especially fail if two edits happen the same second. Instead, I found parentRevId and editRevId, which correspond to the two parameters mentioned above, but are revision IDs instead.

Unfortunately, this is when using EditPage via action=submit. It seems the API does not support an approach that's also based on revision IDs. You might need to change this.

While digging a little deeper into the code I found something very suspicious. The API parameter starttimestamp will be passed to EditPage as wpStarttime. EditPage does not do anything useful with this parameter, at least not in terms of conflict resolution.

The API's basetimestamp becomes wpEdittime when passed to EditPage. In EditPage this is hopefully similar to a mismatching editRevId. Hopefully. To bad the precision of these timestamps is seconds. This means it's impossible to detect conflicts for edits that happened within the same second. Or, in other words: Don't use timestamps. Use revision IDs.

Another line in EditPage mentions "conflict with self". Here is an idea based on that: Isn't the DiscussionTools doing a section edit? I mean, from the users perspective, shouldn't the tool replicate what the user would do? Aren't users typically doing section edits? What happens if you simply add the section parameter to the tool's API request? The API does support this.

Notes from Engineering meeting yesterday:

  • Option 1: We could make that functionality flaggable upstream. (Add an API parameter to disable it.)
  • Option 2: Only fetch the page again if there is a new revision ID from the API
  • Option 3: Move everything to PHP, then we can fetch the latest version almost immediately before saving.

We liked option 1.

The discussion mentions basetimestamp and starttimestamp. While working on Two-Column-Edit-Conflict-Merge I just learned that relying on timestamps is an outdated mechanism. It will especially fail if two edits happen the same second. Instead, I found parentRevId and editRevId, which correspond to the two parameters mentioned above, but are revision IDs instead.

Unfortunately, this is when using EditPage via action=submit. It seems the API does not support an approach that's also based on revision IDs. You might need to change this.

(…)

That's T34037: API: Allow edit conflict detection based on revid.

Another line in EditPage mentions "conflict with self". Here is an idea based on that: Isn't the DiscussionTools doing a section edit? I mean, from the users perspective, shouldn't the tool replicate what the user would do? Aren't users typically doing section edits? What happens if you simply add the section parameter to the tool's API request? The API does support this.

Not really… Parsoid is involved, which handles sections differently, and in the end we submit the entire page's wikitext to the API. The changes we make should be confined to a single section, but I don't think that makes it a "section edit".

So, T34037 has a patch pending that would allow us to replace basetimestamp with baserevid (https://gerrit.wikimedia.org/r/c/mediawiki/core/+/577345). This is nice but not directly relevant to this task.

However! Apparently edit conflict detection using baserevid would not ignore self-conflicts (T34037#5948430), which would be directly relevant here. @daniel Will we be able to rely on that behavior? Or, would you be open to adding an API parameter to enable/disable this behavior?

@daniel Will we be able to rely on that behavior? Or, would you be open to adding an API parameter to enable/disable this behavior?

Since it's introduced as documented behavior, changing it would be a breaking change. So you can rely on it once it's merged.

I personally thing that ignoring self-conflicts as always a bad idea, and we shouldn't do it, which is why I filed T175745. I'd personally also be open to make this an explicit option. However, whether we need this behavior is a product level decision to be made by the editing team (I suppose). I'd want word from product folks about this before fiddling with the behavior further.

Note that MediaWiki doesn't "fail" to detect the conflict if the edit was made by the same user. It explicitly goes and does an expensive check to see if it should ignore the conflict because the edit is by the same user. This was explicitly implemented as a feature. Don't ask me why though ;)

Change 578368 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/extensions/DiscussionTools@master] Use 'baserevid' instead of 'basetimestamp' for edit conflict detection

https://gerrit.wikimedia.org/r/578368

My proposed patch 578368 depends on @daniel's 577345, and seems to solve the issue for us.

You can test it here: http://patchdemo.wmflabs.org/wikis/c53864c0c3f4011ed481fbd1f1aec935/w/index.php/Talk:Main_Page

@ppelberg I think we should just wait for that mediawiki/core patch to be merged, instead of trying to implement some other solution on our own, hence moving to "Blocked".

Change 578368 merged by jenkins-bot:
[mediawiki/extensions/DiscussionTools@master] Use 'baserevid' instead of 'basetimestamp' for edit conflict detection

https://gerrit.wikimedia.org/r/578368

ppelberg claimed this task.

Note that MediaWiki doesn't "fail" to detect the conflict if the edit was made by the same user. It explicitly goes and does an expensive check to see if it should ignore the conflict because the edit is by the same user. This was explicitly implemented as a feature. Don't ask me why though ;)

Because edit conflicts are annoying. Doubly so if it is actually unneeded. Suppose your Saving timed out and you resent. Or you actually pressed to submit a comment, but while it was saving you noticed of a mistake, so you press Esc to cancel the navigation (it may have gone through or not) and change the textarea (but it turned out that it did save, so you get a conflict with yourself). Or you could be going back to the edit page after having saved it, so it loads the previous form, with the old hidden fields (but it's ok, as what is in the form is actually the current page content).

You think I am the only one having done all of this? T19368 describes exactly that scenario. I think you will get a lot of pushback if you now increase the number of edit conflicts shown to the user.

-1 from me.

If you want to do this, you should first count the number of self-editconflicts being suppressed and check if they would seem like an edit conflict page would have been warranted. In most cases, it probably wouldn't have been desired.

Funny fact: this feature has been there since the first phase3 version, d82c14fb4fbac288b42ca5918b0a72f33ecb1e69.

Because edit conflicts are annoying. Doubly so if it is actually unneeded.

Accidentally reverting yourself because you had two edit tabs open for the same page is also annoying.

Suppose your Saving timed out and you resent.

If saving timed out, your edit wasn't saved, so this case wouldn't be triggered.
If the edit did go through but for some reason you didn't get the response and you re-send, the page content will be the same, so it will be treated as a null edit, and nothing happens. Unless perhaps your edit contained a ~~~~ signature, that would trigger a conflict on the timestamp and show a conflict page. Which you can then just ignore.

Or you actually pressed to submit a comment, but while it was saving you noticed of a mistake, so you press Esc to cancel the navigation (it may have gone through or not) and change the textarea (but it turned out that it did save, so you get a conflict with yourself).

If you hit esc and go back, I expect you'll get the current text (see below). If the edit went through, you can now correct your mistake. If it didn't, you may have lost the comment... or you can go forward to get it again? I didn't test this.

Anyway, cancelling a submit in mid-air is not really a feature that we can support in a consistent way. Though I agree that we should avoid losing unsaved content if possible.

Or you could be going back to the edit page after having saved it, so it loads the previous form, with the old hidden fields (but it's ok, as what is in the form is actually the current page content).

That was apparently the primary use case back in the day. I patched out the code and tried and couldn't make this happen. I couldn't. Browsers now just load the correct version into the text field, even with JS turned off.

You think I am the only one having done all of this? T19368 describes exactly that scenario. I think you will get a lot of pushback if you now increase the number of edit conflicts shown to the user.

I have done all those things, sure. And surprisingly, they still work well enough even without this "feature".
Anyway, silently overriding any changes done by the same user seems like asking for trouble. If we really REALLY need to detect this, we should try to detect edits coming from the same tab/form. That would be a bit more tricky, but would cause less collateral damage.

If you want to do this, you should first count the number of self-editconflicts being suppressed and check if they would seem like an edit conflict page would have been warranted. In most cases, it probably wouldn't have been desired.

Counting is easy, deciding whether it "would have been needed" can only be done manually. Also, I'd also want to know in how many cases this was actually doing damage be overwriting something the user didn't want to have overwritten.

Funny fact: this feature has been there since the first phase3 version, d82c14fb4fbac288b42ca5918b0a72f33ecb1e69.

Yes, because with the behavior of the "back" button in browsers at the time, it was actually needed. It no longer is.

Yes, because with the behavior of the "back" button in browsers at the time, it was actually needed. It no longer is.

This is a very good point. I remember, there was a time, when no browser supported that. Then with some UA (firefox?) it kept your previous text, and there was happiness on the land of the wiki. If we can show this is no longer needed server side, then we certainly can remove it. I suspect the ~~~~ means it won't work, though.

That was apparently the primary use case back in the day. I patched out the code and tried and couldn't make this happen. I couldn't. Browsers now just load the correct version into the text field, even with JS turned off.

I wonder if it was patched correctly. If the browser is restoring the previous content that means the basetimestamp is would be the old one, too. Without the override for self-conflicts, I don't see how it couldn't (in some cases where they would be auto-merged, but often it won't). Try this: add on a page ":ello world! ~~~~". Then go back and add the missing H (typical case where the user notices just after saving).

If we really REALLY need to detect this, we should try to detect edits coming from the same tab/form. That would be a bit more tricky, but would cause less collateral damage.

This seems acceptable to me. I don't know how you could identify the tab that was open, though. I am not aware of any html interface for that. Maybe js could hash the resulting wikitext and store it on a hidden field, then ignore the conflict only if it conflicts with a revision with that wikitext (section edits and PST would make it complex).

Counting is easy, deciding whether it "would have been needed" can only be done manually. Also, I'd also want to know in how many cases this was actually doing damage be overwriting something the user didn't want to have overwritten.

Sure. By knowing which "would have been needed" I expected to get as well the number of those where it was harmful.(in reality, we would probably also have a third category of ones we are not sure about, but it's ok).

In fact, just the number of edit conflicts being suppressed would be a useful metric, since it should be proportional to the number of misguided edit conflicts. I guess we could make a few assumptions:

  • Differences on the current timestamp should count as no difference, as they are probably the result of an automatic substitution.
  • If the second edit contains a large part of the content that was added in the first one (how much is "large"?), we may be quite confident that it is an edit built over the first edit, not over the parent edit.
  • If the result is that the second edit is only changing an unrelated paragraph that should have been merged by diff3, it should not show a conflict.

On the other hand, if the second edit of the conflict removes the block of text added by the prior one, we should consider that a legitimate conflict that should not be suppressed.

My suspicion is that, despite the browser support, we will need to add a condition to suppress timestamp-only self-conflicts. I am not sure if that will be enough, though.

A point that is somewhat behind this is that the edit conflict isn't isn't too user friendly, which makes edit conflicts more burdensome.