Create a Edit group extension
Open, Needs Triage, Public

Description

There is an EditGroups tool that tracks sets of changes to Wikidata items which follow a similar pattern and are performed around the same time by a given user. The tool is specific to Wikidata, but similar tooling would clearly be useful for other Wikimedia wikis, especially Commons. We could make it a MediaWiki extension.

We may add a new table for storing edit groups. Each edit group record has a unique ID (auto-increment), a type (a string or a reference to a change tag) used to identify the tool that made the edits, a reference to the performer, and an (optional?) summary. There would be some new API functions:

  • A new API function to create a new edit group.
  • A new API function to view all edits or log actions involved in an edit group
  • (probably) A new API function to view pages involved in an edit group
  • For all functions that create a new revision or log entry, a parameter to specify the edit group it belongs to (there should probably be at most one edit group per edit, but this can be discussed).
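
To make the proposed record and API functions concrete, here is a minimal in-memory sketch. It is not MediaWiki code: the class names, method names, and the `"rev:…"` / `"log:…"` identifier format are all assumptions made for illustration, and the "at most one edit group per edit" rule is implemented only because the description suggests it as a likely default.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional, Tuple


@dataclass
class EditGroup:
    """One row of the proposed edit group table (field names are hypothetical)."""
    group_id: int                  # unique, auto-incremented
    group_type: str                # identifies the tool, e.g. a change tag name
    performer: str                 # user who performs the edits
    summary: Optional[str] = None  # optional free-text summary
    # (action ID, page title) pairs, in the order they were recorded
    members: List[Tuple[str, str]] = field(default_factory=list)


class EditGroupStore:
    """In-memory stand-in for the table and the proposed API functions."""

    def __init__(self) -> None:
        self._next_id = count(1)
        self._groups: Dict[int, EditGroup] = {}
        self._group_of: Dict[str, int] = {}  # action ID -> group ID

    def create_group(self, group_type: str, performer: str,
                     summary: Optional[str] = None) -> EditGroup:
        group = EditGroup(next(self._next_id), group_type, performer, summary)
        self._groups[group.group_id] = group
        return group

    def record_action(self, group_id: int, action_id: str, page: str) -> None:
        """Attach a revision or log entry to a group (at most one group each)."""
        if action_id in self._group_of:
            raise ValueError(f"{action_id} already belongs to an edit group")
        self._groups[group_id].members.append((action_id, page))
        self._group_of[action_id] = group_id

    def actions_in(self, group_id: int) -> List[str]:
        """All edits or log actions involved in an edit group."""
        return [action for action, _ in self._groups[group_id].members]

    def pages_in(self, group_id: int) -> List[str]:
        """Distinct pages involved in an edit group, sorted by title."""
        return sorted({page for _, page in self._groups[group_id].members})
```

A tool would call `create_group` once per batch and then pass the returned ID along with every edit it makes, which is the role the new parameter on edit-creating functions would play.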

In addition we may have a special page to list all edit groups, and another one to list all edits or log actions involved in an edit group.

Bots and existing tools performing batch edits (AWB, Pywikibot, Cat-a-lot, VisualFileChange, various batch uploading tools, various mass rollback tools, probably some specific functions of Twinkle) may be adapted to create edit groups.

Also, we may create a tool (external, to avoid disrupting the server) to mass-revert the edits made in an edit group (even if they are not the newest edits).

Event Timeline

Framawiki renamed this task from edit group mechanism to Create a Edit group extension. Sep 9 2018, 8:47 AM
Framawiki updated the task description.
Framawiki added a subscriber: Framawiki.

The EditGroups external tool now has documentation for developers, explaining the current infrastructure: https://editgroups.readthedocs.io/en/latest/architecture.html.
I would be happy to expand on the points that are unclear or missing.

I agree it would be great to have a MediaWiki extension for this: this sort of feature really ought to be supported by the platform itself, without the need for an external tool. But that sounds like quite a big project, one that is unlikely to happen soon without serious investment from professional dev teams (WMF/WMDE). In the meantime, the current external tool can be used to experiment with possible workflows and features.

Also it would be great to make it possible to run it on other Wikibase instances (at the moment it relies on the Wikimedia EventStream). I would be happy to give guidance and help to anyone who would be interested in working on that.

With the introduction of Structured Data on Commons and the implementation of mass-editing tools there (AC/DC, QuickStatements), the need to generalize EditGroups to other wikis is gathering interest.

My own impression is that in the long term, this functionality would be better implemented in an extension as proposed in this ticket, rather than in an external tool that relies on scanning recent changes with regular expressions (which is more brittle and vulnerable). For this reason I do not plan to dedicate significant development effort to EditGroups in the foreseeable future.

I would be interested to have a conversation with WMF / WMDE about this, to understand if they recognize this as a need that should eventually be catered for closer to MediaWiki:

  • Do you imagine such an extension could be deployed on Wikimedia wikis, if someone develops it to the appropriate quality standards?
  • Would you consider developing such an extension yourself, i.e. would this fall within the scopes of your products/projects?
  • If so, how important do you think it is that mass-edits can be reviewed and reverted easily? In other words, where does this fit in your roadmap compared to other projects?

I think it is important to have that discussion - perhaps the dev team has a different vision about this functionality (having some sort of staging area for mass edits? other community-driven procedures to approve / reject bots? and so on).

Pinging @Lydia_Pintscher for WMDE - who should I ping for WMF?

Lydia_Pintscher added subscribers: Ramsey-WMF, Abit.

Thanks for the ping :) Adding Ramsey and Amanda for the Commons side of things.

I'm definitely not opposed to it. To figure out priorities, and whether it really needs to be an extension or whether there are other ways, I'd love to hear what you hope to get out of making it an extension.

Here are a few issues with the current tool:

  • If the tool goes down, this creates millions of dead links in edit summaries (which cannot be changed). Users who ran batches with the assumption that they could be undone if something goes wrong find themselves having to clean things up manually.
  • If the recent changes listener dies for a long time, for instance for longer than the EventStream or recent changes cover, edits which were missed cannot be recovered easily (this could be solved by reading the corresponding edits from the public dumps, but it is not implemented at the moment).
  • As a user, I can easily game EditGroups by imitating the edit summary of any tool, attributing edits to batches they are not actually part of. There are some protections against this, but they are not fully bullet-proof: for instance, an edit will only be added to a batch if it was made under the account of the user who first created the batch.
  • Bot/tool authors are reluctant to add grouping support for their bots since EditGroups is not officially part of the Wikidata infrastructure. The community is unlikely to systematically enforce edit grouping in requests for bot approvals as long as it relies on an external tool, which does not come with any sustainability guarantees, SLA, etc.
  • Reverting relies on OAuth, and OAuth tokens can expire while a large edit group is being reverted.
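
The summary-matching weakness and the same-user protection described in the list above can be illustrated with a small sketch. The `#batch:TOOL/ID` marker, the `MARKER` pattern, and the `BatchTracker` class are hypothetical and do not reproduce the summary format or code that EditGroups actually uses; the sketch only shows the shape of the check.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical batch marker embedded in an edit summary, e.g. "#batch:QS/abc123".
MARKER = re.compile(r"#batch:(?P<tool>[A-Za-z0-9_-]+)/(?P<batch>[A-Za-z0-9]+)")


class BatchTracker:
    """Attributes edits to batches based only on user name and edit summary."""

    def __init__(self) -> None:
        # (tool, batch ID) -> user who made the first edit seen for that batch
        self._owner: Dict[Tuple[str, str], str] = {}

    def attribute(self, user: str, summary: str) -> Optional[Tuple[str, str]]:
        """Return the batch key for this edit, or None if it is not accepted."""
        match = MARKER.search(summary)
        if match is None:
            return None  # ordinary edit, not part of any batch
        key = (match["tool"], match["batch"])
        owner = self._owner.setdefault(key, user)
        if owner != user:
            return None  # another user imitating an existing batch: rejected
        return key
```

Note what the guard does not cover: a user can still start a new batch under any tool name simply by typing the marker by hand, which is why the protection is described as not fully bullet-proof.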

That being said, I am still not sure if all of EditGroups' functionality can reasonably be hoped to be supported in a MediaWiki extension directly: is it doable to have long-running tasks to undo a batch of edits? We do have Special:Nuke which deletes many pages at once, but as this is done synchronously, the number of pages affected might be capped?

That makes sense. Thanks.
I don't know the answer to your questions unfortunately.

In my task description, the extension only records edits. The rollback feature would be provided by an external tool.

Sure. I hope you don't mind us having this discussion here anyway, since this is still at a very early stage (unless you are already planning to work on this exact architecture?).

From a technical perspective, I wonder what distinguishes edit groups from tags? We already have an API function to create tags (managetags, guarded by the managechangetags right), API functions to view edits / log actions with a certain tag, and a tags parameter on most actions that create new revisions or log entries (not yet complete, see T155109).

Conceptually, edit groups are much more fine-grained than tags: all QuickStatements edits are tagged with the same tag for the one OAuth consumer, but we want to split them into thousands of edit groups. And having thousands of tags, one per edit group, would make Special:Tags rather unwieldy. But I still think we could base much of this on the existing tags architecture.

One option might be to introduce a concept of “sub-tags”, where you’re only allowed to add the tag ACDC/group1234 to edits when the tag ACDC is already being added. (For OAuth consumer tags, that would mean the sub-tag / edit group would be guarded by the usual protection for OAuth CID: * tags, which can’t be applied manually but are automatically added by Extension:OAuth.)
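
As a rough illustration of the sub-tag rule, the check below rejects any sub-tag whose parent tag is not being applied to the same edit. The function name and the use of `/` as a separator are assumptions taken from the `ACDC/group1234` example; nothing like this exists in MediaWiki's tag handling today as far as this thread establishes.

```python
from typing import Iterable, List


def rejected_sub_tags(tags: Iterable[str], separator: str = "/") -> List[str]:
    """Return the sub-tags in `tags` whose parent tag is missing from `tags`.

    A tag without the separator is a plain tag and is always accepted here;
    whether the caller may apply that plain tag is a separate check.
    """
    tag_set = set(tags)
    rejected = []
    for tag in sorted(tag_set):
        parent, sep, _group = tag.rpartition(separator)
        if sep and parent not in tag_set:
            rejected.append(tag)
    return rejected
```

Because the parent must be present on the same edit, any restriction on who can apply the parent tag carries over to its sub-tags, which is the property the OAuth consumer case relies on.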

Great point! I did not think about that in this way. It sounds like a very sensible route to follow.

I thought for a moment that there was an issue with the fact that currently, filtering by tags only works for Special:RecentChanges (which only contains the most recent changes, not all of them). But Lucas pointed out that it is also supported by Special:Contributions, which is uncapped in time (I think?). EditGroups currently assumes that all edits in a given group are made by the same user, which I think makes sense as a constraint. So, in the current situation, if you had an edit group which corresponded to a single tag, it would be possible to retrieve all edits in it via Special:Contributions (assuming you know the user in the first place).
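
The same lookup can be expressed against the Action API, which offers `list=usercontribs` with the `ucuser` and `uctag` parameters. The sketch below only builds the request URL and makes no network call; the helper name and the tag value in the usage example are made up, and continuation of long result sets is not handled.

```python
from urllib.parse import urlencode


def contributions_by_tag_url(api_url: str, user: str, tag: str,
                             limit: int = 50) -> str:
    """Build an Action API URL listing one user's contributions with one tag."""
    params = {
        "action": "query",
        "list": "usercontribs",
        "ucuser": user,    # the user who made the edit group
        "uctag": tag,      # the tag (or hypothetical sub-tag) for the group
        "uclimit": limit,
        "format": "json",
    }
    return f"{api_url}?{urlencode(params)}"
```

For example, `contributions_by_tag_url("https://www.wikidata.org/w/api.php", "ExampleUser", "ACDC/group1234")` yields a URL that could be fetched to list the edits in that hypothetical group, provided the user is known in advance.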

This suggests that an approach along the lines of what Lucas describes above is viable. Perhaps "sub-tags" should be attached to a parent user rather than a parent tag (or possibly both) given this observation.