
Add method to Revision to check if it was a Revert, and whether an edit was Reverted
Open, Low · Public

Description

There are various ways to revert a revision. It would be useful to be able to detect when reverts have occurred and surface this in things like the recent changes feed. One use case is to detect edit wars in the trending service that is currently being worked on.

It seems we can detect reverts occurring (in Echo) so there must be a way to encapsulate this logic.

Manual reverts are out of scope for the purpose of this task (EDIT: Other teams may want this, though)

Tentative list of non-manual scenarios to handle:
a. Edit E-1 is made, edit U-1 undoes it without intervening changes (no conflict resolution or manual edit). E-1 is Reverted, U-1 is a Revert
b. Edit E-1 is made, edit RO-1 undoes it. E-1 is Reverted, RO-1 is a Revert.
c. Edit E-1 is made, unrelated E-2 is made, edit U-1 undoes E-1 (intervening change, conflict resolution, no manual edit). E-1 is Reverted, U-1 is a Revert (even though U-1 does not match any prior revision), E-2 is neither.
d. Edits E-1, E-2, and E-3 are made by the same user, edit RO-1 undoes all. E-1, E-2, and E-3 are all Reverted, RO-1 is a Revert.

Manual scenarios (TBD if we want to handle):
e. Base is B-1. Edit E-1 is made, manual edit M-1 (made without undo or rollback) sets it back to B-1 text. It's an exact revert, but not using tools. E-1 is Reverted, M-1 is a Revert.
f. Base is B-1. Edit E-1 and E-2 are made. Manual edit M-1 sets it back to B-1 text. It's an exact multi-revision revert, not using core functionality (though it might use a user script). E-1 and E-2 are Reverted, M-1 is a Revert.

Event Timeline

Restricted Application added a subscriber: Aklapper. Dec 5 2016, 6:42 PM

IIRC we can detect them in Echo because we hook functions running outside the scope of a revision, but that sort of context is lost as soon as the request has finished being handled. There's no good (quick + trustworthy) way to guess whether a given revision ID was a revert, because we don't store the information.

AlexMonk-WMF added a comment. Edited Dec 5 2016, 6:51 PM

Also I'd be careful about trying to invent such a field (to store whether a given revision was a revert). While you could do it for rollbacks, when you press the undo link you can still change the text to whatever you want. Other humans probably usually identify a revision as a revert by looking for the auto-generated (but entirely customisable) revision summary and trusting the user who made it. To find a simple single-revision revert you can compare the text/hash of your revision to the parent of the parent's text/hash, but beyond that things could get complicated.
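The single-revision check described above can be sketched as follows. This is only an illustration: `text_sha1` and `is_simple_revert` are hypothetical names, the history is modelled as a plain oldest-first list of texts, and a hex SHA-1 digest stands in for whatever hash the wiki actually stores.

```python
import hashlib
from typing import Sequence


def text_sha1(text: str) -> str:
    """Hex SHA-1 of the revision text (a stand-in for the stored revision hash)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def is_simple_revert(texts: Sequence[str], index: int) -> bool:
    """True if the revision at `index` restores the text of its parent's parent.

    `texts` is an oldest-first list of revision texts for one page (assumed layout).
    """
    if index < 2:
        return False  # no grandparent to compare against
    current = text_sha1(texts[index])
    parent = text_sha1(texts[index - 1])
    grandparent = text_sha1(texts[index - 2])
    # Matching the grandparent but not the parent means the parent's change was undone.
    return current == grandparent and current != parent
```

For a history `["a", "b", "a", "c"]`, only the revision at index 2 is flagged. As the comment notes, anything beyond this one-step case needs more than a single comparison.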

Tgr added a subscriber: Tgr. Dec 5 2016, 6:59 PM

There are roughly three ways to revert (use the rollback link/API, use the undo link and save without changes, open an old revision for editing and save without changes - see enwiki help page for details). You probably mean rollback?

Anomie added a subscriber: Anomie. Dec 5 2016, 7:01 PM

I believe Echo just triggers off of whether the "undo" link was used at the time the edit is made. To detect reverts after the fact, I can think of a few options:

  • Check the hashes of previous revisions of the article for a match. Disadvantage: Doesn't match what Echo does. It'll detect manual reverts that don't use the "undo" link but not reverts where the reverted-to wikitext was edited before being saved or where the topmost revision wasn't reverted.
  • Add a flag to the database to store it somehow. Disadvantage: Requires a schema change.
  • Add a change tag to reverts. Disadvantage: Communities might see it as clutter, since tags show up in the UI, and want it turned off on some wikis. Won't be present on old revisions.
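A minimal sketch of the first option (checking the hashes of previous revisions for a match), assuming the page history is available as an oldest-first list of hash strings. The function name `find_restored_index` is hypothetical and not part of any existing API.

```python
from typing import Optional, Sequence


def find_restored_index(sha1s: Sequence[str], index: int) -> Optional[int]:
    """Return the index of the most recent earlier revision whose hash equals
    that of the revision at `index`, or None if there is no such revision.

    The immediate parent is skipped: an edit identical to its parent is a
    no-op, not a revert.
    """
    target = sha1s[index]
    if index >= 1 and sha1s[index - 1] == target:
        return None  # nothing changed relative to the parent
    # Walk backwards from the grandparent to the first revision.
    for j in range(index - 2, -1, -1):
        if sha1s[j] == target:
            return j
    return None
```

As noted in the list, a check like this flags manual reverts too, and it misses any "undo" whose text was edited before saving, because the resulting hash matches nothing earlier.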
Jdforrester-WMF triaged this task as Low priority. Dec 5 2016, 7:04 PM
Jdforrester-WMF added a project: Epic.
Tgr added a comment. Dec 5 2016, 7:09 PM

Note that anomalous pages can have tens of thousands of revisions and there is no index on rev_sha1 so real-time checking for SHA1 matches is not an option (without an index change). OTOH it seems unlikely that someone would revert to something other than the last dozen or so revisions.

If you want non-trivial revert detection that does not have to happen in real-time, ORES seems to be the right framework to put it in.

See also T5640: Mark edits that were {reverted/rollbacked} (in the logs history/contribs pages and data dumps) and Research:Revert.

Anomie added a comment. Dec 5 2016, 7:11 PM

Note that anomalous pages can have tens of thousands of revisions and there is no index on rev_sha1 so real-time checking for SHA1 matches is not an option (without an index change).

See also T51138: add index to revisions.rev_sha1.

This is what I mean. I see lots of evidence of working around this problem, and I've written similar scripts myself, so why isn't it formalised? It seems like a core part of a wiki to me.

Agreed. I'd like to have a log item that captures what my Revert objects do.

  • reverting: int -- the revision ID of the reverting edit
  • reverteds: list of int -- the revision IDs with changes removed in the revert
  • reverted_to: int -- the revision ID in the past that exactly matches the new state of the page (if applicable)
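The three fields listed above could be carried by a small record like the one below. This is a sketch under stated assumptions, not the commenter's actual implementation: the `detect_revert` helper, its `radius` parameter (motivated by the earlier remark that reverts rarely reach past the last dozen or so revisions), and the `(rev_id, sha1)` history layout are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Revert:
    reverting: int              # revision ID of the reverting edit
    reverteds: List[int]        # revision IDs whose changes were removed
    reverted_to: Optional[int]  # earlier revision ID matching the new page state


def detect_revert(history: Sequence[Tuple[int, str]], radius: int = 15) -> Optional[Revert]:
    """Check whether the newest entry of an oldest-first list of
    (rev_id, sha1) pairs restores one of the preceding revisions.

    At most `radius` revisions may be reverted in one step (assumed limit).
    """
    if len(history) < 3:
        return None  # need a restored, a reverted and a reverting revision
    new_id, new_sha1 = history[-1]
    earlier = history[:-1]
    if earlier[-1][1] == new_sha1:
        return None  # identical to the parent: a no-op, not a revert
    n = len(earlier)
    lowest = max(0, n - 1 - radius)
    for j in range(n - 2, lowest - 1, -1):
        if earlier[j][1] == new_sha1:
            return Revert(
                reverting=new_id,
                reverteds=[rev_id for rev_id, _ in earlier[j + 1:]],
                reverted_to=earlier[j][0],
            )
    return None
```

With the history `[(100, "a"), (101, "b"), (102, "c"), (103, "a")]`, revision 103 is reported as reverting 101 and 102 and restoring 100, which corresponds to scenario d in the description.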
Tgr added a comment. Dec 14 2016, 11:12 PM

Except for rollback, the user can arbitrarily edit the text. Should pressing undo + editing be still counted as a revert?

I'd like to have a log item that captures what my Revert objects do.

Does that help with providing some sort of Revision::isRevert() or is that a separate feature request?

Should pressing undo + editing be still counted as a revert?

No, IMHO. If this is tracked formally, the criterion should be whether the page is changed to be identical to a previous version, regardless of how it's done (manually, undo without changes just after the edit, rollback, just clicking edit on an oldid then save, etc.). Basically SHA (but only if the index is added).

Note 'undo' has conflict resolution, so even if you don't edit, it may not match any prior revision. (A->B->C: undoing B, if conflict resolution succeeds, creates D, which does not match any prior revision.)

isUndo would also be useful (separately), but then you have to decide whether to still count it as an undo if they make manual changes.

I believe Echo just triggers off of whether the "undo" link was used at the time the edit is made.

Also catches 'rollback', but correct.

Cenarium added a subscriber: Cenarium. Edited Jan 11 2017, 3:20 AM

Note that with rMW85cabc80e92c67918941c903298d667aa34ea4de (https://gerrit.wikimedia.org/r/#/c/329651/), we should only need to check the revision's sha1 against the base revision's sha1.

Jdlrobson moved this task from Backlog to Tracking on the Trending-Service board.

A Revert object is a good idea, I think I'll implement this in a new version of https://gerrit.wikimedia.org/r/#/c/329651/.
This should be kept separate from the Revision object, though. A revision shouldn't be aware of the Revert object; a Revision object is standalone and shouldn't depend on the page history (revisions can be moved around, for instance). So I don't think we should have a method on Revision to indicate whether it's a revert, but the goals underlying this task can be accomplished without one.

Whether we want exact reverts only or not depends on the use case. For example, in Echo we should still notify of non-exact reverts, but we should abort sending the new links notification for exact reverts only. If we want to make a study of reverts, it depends on the goals.

Echo also has this problem of not being able to detect reverts for the new links notification, see EchoHooks::onLinksUpdateAfterInsert().

Also, the sha1 is not enough to check for reverts since revisions with different content models may have the same text.

If we want a revert table, we would need four fields: revert_id (new revision id), revert_revertedid (reverted revision id), revert_restoredid (restored revision id) and revert_exact (whether the contents of the new revision match the content of the restored revision, which would require loading the restored revision in a deferred update on save).
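To make the proposed table concrete, here is a rough sketch using an in-memory SQLite database. Only the four column names come from the comment above; the column types, the absence of keys and indexes, and the helpers `is_exact` and `record_revert` are assumptions for illustration. Following the point about content models, exactness is decided on the pair (content model, sha1) rather than on the hash alone.

```python
import sqlite3
from typing import Tuple

# Column names follow the proposal; types and constraints are assumptions.
SCHEMA = """
CREATE TABLE revert (
    revert_id         INTEGER NOT NULL,  -- new (reverting) revision id
    revert_revertedid INTEGER NOT NULL,  -- reverted revision id
    revert_restoredid INTEGER NOT NULL,  -- restored revision id
    revert_exact      INTEGER NOT NULL   -- 1 if content matches the restored revision
);
"""

Content = Tuple[str, str]  # (content_model, sha1)


def is_exact(new: Content, restored: Content) -> bool:
    """Exact only if both the content model and the hash agree."""
    return new == restored


def record_revert(conn: sqlite3.Connection, revert_id: int, reverted_id: int,
                  restored_id: int, new: Content, restored: Content) -> None:
    """Insert one row describing a revert of a single revision."""
    conn.execute(
        "INSERT INTO revert VALUES (?, ?, ?, ?)",
        (revert_id, reverted_id, restored_id, int(is_exact(new, restored))),
    )
```

A multi-revision revert would need one row per reverted revision under this layout, which is why `revert_id` is not declared unique here.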

Halfak added a comment. Feb 9 2017, 4:39 PM

See also https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(idea_lab)#New_label_for_edits_in_Recent_changes -- a proposal for adding an rv tag for reverted edits in the RecentChanges feed.

Looks like this was missed due to having Developer-Wishlist and not Developer-Wishlist (2017). Sorry :(

Adding a new link to an archived proposal: New label for edits in Recent changes.

Correct me if I'm wrong, but this seems like a good fit for the platform team?

Halfak added a comment. Apr 4 2017, 2:03 PM

It might be a good fit for Edit-Review-Improvements too since they are working on this functionality now. Maybe in collab with MediaWiki-Platform-Team.

He7d3r added a subscriber: He7d3r. Aug 4 2017, 5:53 PM

Note that with rMW85cabc80e92c67918941c903298d667aa34ea4de (https://gerrit.wikimedia.org/r/#/c/329651/), we should only need to check the revision's sha1 against the base revision's sha1.

If I understand correctly, this wouldn't catch manual revert scenarios, though.

E.g. base revision is B, someone makes edit E1. Someone then makes (without using undo or rollback) edit E2. The text of E2 is identical to B, so this is a revert. The base revision (using editRevId) when making E2 is E1.
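The gap can be shown with a toy comparison of the two checks. Everything here is hypothetical (the function names, the string revision labels, and the placeholder hashes), and it assumes the base revision is simply the revision the editor started from, as in the example above.

```python
from typing import Dict, List


def is_revert_by_base(sha1_by_id: Dict[str, str], new_id: str, base_id: str) -> bool:
    """Base-only check: the new revision must hash the same as its base revision."""
    return sha1_by_id[new_id] == sha1_by_id[base_id]


def is_revert_by_history(order: List[str], sha1_by_id: Dict[str, str], new_id: str) -> bool:
    """History check: the new revision matches some revision older than its parent."""
    pos = order.index(new_id)
    if pos < 2:
        return False
    # order[:pos - 1] excludes the parent, which sits at position pos - 1.
    return any(sha1_by_id[r] == sha1_by_id[new_id] for r in order[:pos - 1])


# B is the original text, E1 changes it, E2 manually restores B's text.
order = ["B", "E1", "E2"]
sha1_by_id = {"B": "hash-b", "E1": "hash-e1", "E2": "hash-b"}
```

Here `is_revert_by_base(sha1_by_id, "E2", "E1")` is false because E2 was started from E1, while `is_revert_by_history(order, sha1_by_id, "E2")` is true because E2 matches B.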

In general, this needs to be specced better. I've put some suggestions in the task.

Mattflaschen-WMF renamed this task from Add method to Revision to check if it was a revert to Add method to Revision to check if it was a Revert, and whether an edit was Reverted. Aug 23 2017, 9:14 PM
Mattflaschen-WMF updated the task description.
Tgr added a comment. Aug 23 2017, 9:43 PM

Are the shortcuts M/U/RO supposed to mean the user clicked edit/undo/rollback respectively? Note that you can click undo and still make arbitrary manual changes. From an edit history point of view it's not really different from a normal edit, but recording what kind of user interaction led to the edit and using it as context information for detecting reverts is always a possibility.

Just an FYI: Analytics folks have succeeded in building logic (that compares SHAs) that detects whether revisions are reverts in the entire MW history: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history (see the *revert* fields in that table).

This isn't useful for the use cases that this feature request would solve, but it is at least relevant to it. :)