Page MenuHomePhabricator

Add method to Revision to check if it was a Revert, and whether an edit was Reverted
Closed, ResolvedPublic

Description

There are various ways to revert a revision. It would be useful to be able to detect when revisions have occurred and surface this in things like the recent changes feed. One use case is to detect edit wars in the trending service that is currently be worked on.

It seems we can detect reverts occurring (in Echo) so there must be a way to encapsulate this logic.

Manual reverts are out of scope for the purpose of this task (EDIT: Other teams may want this, though)

Tentative list of non-manual scenarios to handle:
a. Edit E-1 is made, edit U-1 undoes it without intervening changes (no conflict resolution or manual edit). E-1 is Reverted, U-1 is a Revert
b. Edit E-1 is made, edit RO-1 undos it. E-1 is Reverted, RO-1 is a Revert.
c. Edit E-1 is made, unrelated E-2 is made, edit U-1 undoes E-1 (intervening change, conflict resolution, no manual edit). E-1 is Reverted, U-1 is a Revert (even though U-1 does not match any prior revision), E-2 is neither.
d. Edit E-1, E-2, and E-3 are made by the same user, edit RO-1 undoes all. E-1, E-2, and E-3 are all Reverted, RO-1 is a Revert.

Manual scenarios (TBD if we want to handle):
e. Base is B-1. Edit E-1 is made, manual edit M-1 (made without undo or rollback) sets it back to B-1 text. It's an exact revert, but not using tools. E-1 is Reverted, M-1 is a Revert.
f. Base is B-1. Edit E-1 and E-2 are made. Manual edit M-1 sets it back to B-1 text. It's an exact multi-revision revert, not using core functionality (though it might use a user script). E-1 and E-2 are Reverted, M-1 is a Revert.

Related Objects

StatusSubtypeAssignedTask
ResolvedSBisson
ResolvedMMiller_WMF
StalledNone
DeclinedNone
ResolvedOstrzyciel
ResolvedOstrzyciel
ResolvedOstrzyciel
ResolvedOstrzyciel
ResolvedOstrzyciel
ResolvedOstrzyciel
ResolvedOstrzyciel
ResolvedOstrzyciel
ResolvedOstrzyciel
ResolvedOstrzyciel
ResolvedOstrzyciel
ResolvedOstrzyciel
ResolvedOstrzyciel
ResolvedOstrzyciel
ResolvedOstrzyciel

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Except for rollback, the user can arbitrarily edit the text. Should pressing undo + editing be still counted as a revert?

I'd like to have a log item that captures what my Revert objects do.

Does that help with providing some sort of Revision::isRevert() or is that a separate feature request?

Should pressing undo + editing be still counted as a revert?

No, IMHO. In my opinion, if this is tracked formally, it should be whether it's changed to be identical to a previous version, regardless of how it's done (manually, undo without changes just after the edit, rollback, just clicking edit on an oldid then save, etc.). Basically SHA (but only if the index is added).

Note 'undo' has conflict resolution, so even if you don't edit, it may not match any prior revision. (A->B->C, undo B, if conflict resolution succeeds creates D not matching any prior revision)

isUndo would also be useful (separately), but then you have to decide whether to still count it as an undo if they make manual changes.

I believe Echo just triggers off of whether the "undo" link was used at the time the edit is made.

Also catches 'rollback', but correct.

Note that with rMW85cabc80e92c (https://gerrit.wikimedia.org/r/#/c/329651/), we should only need to check the revision's sha1 against the base revision's sha1.

A Revert object is a good idea, I think I'll implement this in a new version of https://gerrit.wikimedia.org/r/#/c/329651/.
This should be kept separate from the Revision object though, a revision shouldn't be aware of the Revert object, a Revision object is standalone and shouldn't depend on the page history (revisions can be moved around for instance). So I don't think we should have a method on Revision to indicate if it's a revert, but the goals underlying this task can be accomplished without.

Whether we want exact reverts only or not depends on the use case, for example in Echo we should still notify of non-exact reverts, but we should abort sending the new links notification for exact reverts only. If we want to make a study of reverts, it depends on the goals.

Echo also has this problem of not being able to detect reverts for the new links notification, see EchoHooks::onLinksUpdateAfterInsert().

Also, the sha1 is not enough to check for reverts since revisions with different content models may have the same text.

If we want a revert table, we would need four fields: revert_id (new revision id), revert_revertedid (reverted revision id), revert_restoredid (restored revision id) and revert_exact (whether the contents of the new revision match the content of the restored revision, which would require loading the restored revision in a deferred update on save).

Correct me if I'm wrong, but this seems like a good fit for the platform team?

It might be a good fit for Edit-Review-Improvements too since they are working on this functionality now. Maybe in collab with MediaWiki-Platform-Team.

Note that with rMW85cabc80e92c (https://gerrit.wikimedia.org/r/#/c/329651/), we should only need to check the revision's sha1 against the base revision's sha1.

If I understand correctly, this wouldn't catch manual revert scenarios, though.

E.g. base revision is B, someone makes edit E1. Someone then makes (without using undo or rollback) edit E2. The text of E2 is identical to B, so this is a revert. The base revision (using editRevId) when making E2 is E1.

In general, this needs to be speced better. I've put some suggestions in the task.

Mattflaschen-WMF renamed this task from Add method to Revision to check if it was a revert to Add method to Revision to check if it was a Revert, and whether an edit was Reverted.Aug 23 2017, 9:14 PM
Mattflaschen-WMF updated the task description. (Show Details)

Are the shortcuts M/U/RO supposed to mean the user clicked edit/undo/rollback respectively? Note that you can click undo and still make arbitrary manual changes. From an edit history point of view it's not really different from a normal edit, but recording what kind of user interaction led to the edit and use it as context information for detecting reverts is always a possibility.

Just an FYI: Analytics folks have succeeded in building logic (that compares SHAs) that detects whether revisions are reverts in the entire MW history: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history (see the *revert* fields in that table).

This isn't useful for the use cases that this feature request would solve, but it is at least relevant to it. :)

Note that this should not be added to the revision class, which is deprecated (cf T246284: Hard deprecate the Revision class)

A great usecase of this would be to allow showing a collapsed view of the history page. Currently, you have some page histories where it almost hasn't changed for years, yet there are a lot of history entries due to edits and reverts/undos.
It would be great to have them collapsed as a single link mentioning there were 53 irrelevant edits, so that only actual changes affecting the current page are shown.

(Paging that efficiently in SQL is probably non-trivial, but just being able to do that with a js gadget, even if it results in history pages of varying entry count would be a big step)

Ostrzyciel claimed this task.
Ostrzyciel subscribed.

I'm marking this as resolved, as the rough goals outlined here have been met.

  • The PageSaveComplete hook passes the EditResult object to extensions that indicates whether the edit was reverted and what kind of revert it was.
  • Manual reverts are detected automatically and marked appropriately.
  • Reverted revisions can be either obtained from the EditResult object of a revert or by looking at change tags in the database. mw-reverted tag indicates that a revision was reverted.

More information in the docs: https://www.mediawiki.org/wiki/Manual:Reverts

Yeah this is great stuff thank youuuuu!

Really great work!

One possible follow-up would IMO be to make EditResultBuilder::findIdenticalRevision operate on a content level instead of a revision level. The current logic works well for now, but once extensions with MCR support become more common, we'll probably want manual revert detection to extend to scenarios where user#1 adds a paragraph to the main article text, then user#2 changes an MCR field (e.g. does an article quality evaluation), then user#3 removes the paragraph added by user#1. That would create a revision that has never existed before, even though one of the content slots is unchanged and the other is reverted to a preexisting value. To detect that as a manual revert, the sha1 check would have to be extended to the content table.