
Create a Edit group extension
Open, Low | Public | Feature

Description

There is an EditGroups tool, which tracks sets of changes on Wikidata items that follow a similar pattern and are performed around the same time by a given user. The tool is specific to Wikidata, but similar tools would clearly be useful for other Wikimedia wikis, especially Commons. We could make it a MediaWiki extension.

We could add a new table for storing edit groups. Each edit group record would have a unique ID (auto-increment), a type (a string or a reference to a change tag) used to identify the tool that made the edits, a reference to the performer, and an (optional?) summary. There would be some new API functions:

  • A new API function to create a new edit group.
  • A new API function to view all edits or log actions involved in an edit group.
  • (probably) A new API function to view the pages involved in an edit group.
  • For all functions that create a new revision or log entry, a way to specify the edit group it belongs to (there should probably be at most one edit group per edit, but this can be discussed).
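
To make the proposal above more concrete, here is a minimal in-memory sketch in Python of the record and of the create / record / view operations. It is only an illustration under assumptions: the names `EditGroup`, `EditGroupStore`, `create_group`, `record_edit` and `edits_in_group` are hypothetical and do not exist in MediaWiki, and a real extension would use a database table rather than dictionaries.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional


@dataclass
class EditGroup:
    """One record of the proposed edit group table (field names are hypothetical)."""
    group_id: int                  # unique, auto-incremented ID
    group_type: str                # identifies the tool, e.g. a change tag name
    performer: str                 # reference to the user performing the batch
    summary: Optional[str] = None  # optional description of the batch
    revision_ids: List[int] = field(default_factory=list)


class EditGroupStore:
    """In-memory stand-in for the table and the proposed API functions."""

    def __init__(self) -> None:
        self._next_id = count(1)
        self._groups: Dict[int, EditGroup] = {}
        self._group_of_revision: Dict[int, int] = {}

    def create_group(self, group_type: str, performer: str,
                     summary: Optional[str] = None) -> EditGroup:
        group = EditGroup(next(self._next_id), group_type, performer, summary)
        self._groups[group.group_id] = group
        return group

    def record_edit(self, group_id: int, revision_id: int) -> None:
        # Assumption taken from the description: at most one edit group per edit.
        if revision_id in self._group_of_revision:
            raise ValueError(f"revision {revision_id} already belongs to a group")
        self._groups[group_id].revision_ids.append(revision_id)
        self._group_of_revision[revision_id] = group_id

    def edits_in_group(self, group_id: int) -> List[int]:
        return list(self._groups[group_id].revision_ids)
```

A tool would call `create_group` once per batch and then pass the returned ID along with every edit it makes.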

In addition we may have a special page to list all edit groups, and another one to list all edits or log actions involved in an edit group.

Bots and existing tools performing batch edits (AWB, Pywikibot, Cat-a-lot, VisualFileChange, various batch uploading tools, various mass rollback tools, probably some specific functions of Twinkle) could be adapted to create edit groups.

Also, we could create an external tool (external, to avoid disrupting the server) to mass revert the edits made in an edit group (even if they are not the newest edits).

Event Timeline

Framawiki renamed this task from edit group mechanism to Create a Edit group extension. Sep 9 2018, 8:47 AM
Framawiki updated the task description.
Framawiki subscribed.

The EditGroups external tool now has documentation for developers, explaining the current infrastructure: https://editgroups.readthedocs.io/en/latest/architecture.html.
I would be happy to expand on the points that are unclear or missing.

I agree it would be great to have a MediaWiki extension for this: this sort of feature really ought to be supported by the platform itself, without the need for an external tool. But that sounds like quite a big project, which is unlikely to happen soon without some serious investment from professional dev teams (WMF/WMDE). The current external tool can be used to play around and experiment with the possible workflows / features.

Also it would be great to make it possible to run it on other Wikibase instances (at the moment it relies on the Wikimedia EventStream). I would be happy to give guidance and help to anyone who would be interested in working on that.

With the introduction of Structured Data on Commons and the implementation of mass-editing tools there (AC/DC, QuickStatements), the need to generalize EditGroups to other wikis is gathering interest.

My own impression is that in the long term, this functionality would be better implemented in an extension as proposed in this ticket, rather than in an external tool relying on recent changes scanning with regular expressions (which is more brittle and vulnerable). For this reason I do not have plans to dedicate significant development efforts to EditGroups in the foreseeable future.

I would be interested to have a conversation with WMF / WMDE about this, to understand if they recognize this as a need that should eventually be catered for closer to MediaWiki:

  • Do you imagine such an extension could be deployed on Wikimedia wikis, if someone develops it to the appropriate quality standards?
  • Would you consider developing such an extension yourself, i.e. would this fall within the scopes of your products/projects?
  • If so, how important do you think it is that mass-edits can be reviewed and reverted easily? In other words, where does this fit in your roadmap compared to other projects?

I think it is important to have that discussion - perhaps the dev team has a different vision about this functionality (having some sort of staging area for mass edits? other community-driven procedures to approve / reject bots? and so on).

Pinging @Lydia_Pintscher for WMDE. Who should I ping for WMF?

Lydia_Pintscher added subscribers: Ramsey-WMF, Abit.

Thanks for the ping :) Adding Ramsey and Amanda for the Commons side of things.

I'm definitely not opposed to it. To figure out priorities, and whether it really needs to be an extension or whether there are other ways, I'd love to hear what you hope to get out of making it an extension.

Here are a few issues with the current tool:

  • If the tool goes down, this creates millions of dead links in edit summaries (which cannot be changed). Users who ran batches with the assumption that they could be undone if something goes wrong find themselves having to clean things up manually.
  • If the recent changes listener dies for a long time, for instance for longer than the EventStream or recent changes cover, edits which were missed cannot be recovered easily (this could be solved by reading the corresponding edits from the public dumps, but it is not implemented at the moment).
  • As a user, I can easily game EditGroups by imitating the edit summary of any tool, attributing edits to batches they are not actually part of. (There are some protections against this, but they are not fully bullet-proof. For instance, an edit will only be added to a batch if it was made under the account of the user who first created the batch.)
  • Bot/tool authors are reluctant to add grouping support for their bots since EditGroups is not officially part of the Wikidata infrastructure. The community is unlikely to systematically enforce edit grouping in requests for bot approvals as long as it relies on an external tool, which does not come with any sustainability guarantees, SLA, etc.
  • Reverting relies on OAuth, and OAuth tokens can expire while a large edit group is being reverted.
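
The summary-based grouping and the same-user protection mentioned in the list above can be sketched roughly as follows. This is not the actual EditGroups code: the summary format matched by `BATCH_PATTERN` and the function names are hypothetical, chosen only to show why matching on edit summaries is easy to imitate and how the same-user check limits the damage.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical marker that a tool appends to its edit summaries.
# Real tools each use their own format; this one is made up for illustration.
BATCH_PATTERN = re.compile(r"\(batch #(?P<batch_id>[0-9a-f]+) by (?P<tool>\w+)\)")

BatchKey = Tuple[str, str]  # (tool, batch_id)


def extract_batch(summary: str) -> Optional[BatchKey]:
    """Return (tool, batch_id) if the summary carries a batch marker."""
    match = BATCH_PATTERN.search(summary)
    if match is None:
        return None
    return match.group("tool"), match.group("batch_id")


def assign_edit(owners: Dict[BatchKey, str], user: str,
                summary: str) -> Optional[BatchKey]:
    """Attach an edit to a batch only if it comes from the batch's first user."""
    key = extract_batch(summary)
    if key is None:
        return None
    # The first user seen with this marker becomes the owner of the batch.
    owner = owners.setdefault(key, user)
    return key if owner == user else None
```

Anyone can still write a summary that matches the pattern under their own account, which is the kind of gaming that only server-side support can rule out.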

That being said, I am still not sure if all of EditGroups' functionality can reasonably be hoped to be supported in a MediaWiki extension directly: is it doable to have long-running tasks to undo a batch of edits? We do have Special:Nuke which deletes many pages at once, but as this is done synchronously, the number of pages affected might be capped?

That makes sense. Thanks.
I don't know the answer to your questions unfortunately.

In my task description, the extension only records edits. The rollback feature would be provided by an external tool.

Sure, I hope you don't mind us having this discussion here anyway, since this is still at a very early stage (unless you are already planning to work on this exact architecture?).

From a technical perspective, I wonder what distinguishes edit groups from tags? We already have an API function to create tags (managetags, guarded by the managechangetags right), API functions to view edits / log actions with a certain tag, and a tags parameter on most actions that create new revisions or log entries (not yet complete, see T155109).

Conceptually, edit groups are much more fine-grained than tags: all QuickStatements edits are tagged with the same tag for the one OAuth consumer, but we want to split them into thousands of edit groups. And having thousands of tags, one per edit group, would make Special:Tags rather unwieldy. But I still think we could base much of this on the existing tags architecture.

One option might be to introduce a concept of “sub-tags”, where you’re only allowed to add the tag ACDC/group1234 to edits when the tag ACDC is already being added. (For OAuth consumer tags, that would mean the sub-tag / edit group would be guarded by the usual protection for OAuth CID: * tags, which can’t be applied manually but are automatically added by Extension:OAuth.)
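
The sub-tag rule could be checked along these lines. This is only a sketch of the idea, not existing MediaWiki behaviour: the function name `invalid_sub_tags` and the choice of `/` as separator (taken from the `ACDC/group1234` example) are assumptions.

```python
from typing import Iterable, List


def invalid_sub_tags(tags: Iterable[str], separator: str = "/") -> List[str]:
    """Return the sub-tags whose parent tag is not applied to the same edit.

    A tag such as "ACDC/group1234" is treated as a sub-tag of "ACDC" and is
    only acceptable when "ACDC" is also in the set of tags being added.
    """
    tag_set = set(tags)
    rejected = []
    for tag in sorted(tag_set):
        parent, found, _rest = tag.partition(separator)
        if found and parent not in tag_set:
            rejected.append(tag)
    return rejected
```

With this rule, whatever protects the parent tag (for instance the automatic handling of OAuth consumer tags) would also protect every edit group below it.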

Great point! I did not think about that in this way. It sounds like a very sensible route to follow.

I thought for a moment that there was an issue with the fact that currently, filtering by tags only works for Special:RecentChanges (which only contains the most recent changes, not all of them). But Lucas pointed out that it is also supported by Special:Contributions, which is uncapped in time (I think?). EditGroups currently assumes that all edits in a given group are made by the same user, which I think makes sense as a constraint. So, in the current situation, if you had an edit group which corresponded to a single tag, it would be possible to retrieve all edits in it via Special:Contributions (assuming you know the user in the first place).

This suggests that an approach along the lines of what Lucas describes above is viable. Perhaps "sub-tags" should be attached to a parent user rather than a parent tag (or possibly both) given this observation.
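
The same retrieval is available through the Action API's `list=usercontribs` module, which accepts a user (`ucuser`) and a tag filter (`uctag`). Below is a minimal sketch that only builds the request parameters and sends nothing. The function name is made up, and the tag `ACDC/group1234` is the hypothetical sub-tag from the discussion above, not a tag that exists.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def usercontribs_query(user: str, tag: str,
                       continue_from: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build parameters listing one user's contributions carrying one tag."""
    params = {
        "action": "query",
        "format": "json",
        "list": "usercontribs",
        "ucuser": user,
        "uctag": tag,
        "uclimit": "max",
        "ucprop": "ids|title|timestamp|tags",
    }
    # To fetch the next page, merge in the "continue" object of the last response.
    if continue_from:
        params.update(continue_from)
    return params


if __name__ == "__main__":
    query = usercontribs_query("ExampleUser", "ACDC/group1234")
    print("https://commons.wikimedia.org/w/api.php?" + urlencode(query))
```

A client would repeat the request with the returned continuation values until the response no longer contains a `continue` object.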

A feature request came in: handling changes of username on the wiki. I suspect this is a feature that would likely come "for free" in any reimplementation of the current tool as a MediaWiki extension, because it would rely on the existing SQL tables in MediaWiki to represent users.

This is a good occasion for an update about the status of the current tool. In short, I am not working on it anymore but am available to hand over the maintainership to others.

The most pressing issues are (in my opinion):

  • in the current tool, support ingesting edits via other sources than the Wikimedia Event Stream, so that the tool can be deployed on non-Wikimedia MediaWiki instances;
  • investigate the instability of the Wikimedia Commons instance of this tool;
  • build a MediaWiki extension to replace it, to solve the problems mentioned above.
RPI2026F1 changed the subtype of this task from "Task" to "Feature Request".
RPI2026F1 subscribed.

Quick update on this: undoing edit batches on Wikidata via the current EditGroups has been broken for about two months now and the few hours I have spent trying to fix it have not been sufficient so far.

On the surface it seems related to the instability of the shared Redis instance on Toolforge: T318479 (the timing seems to match, but not sure if it's really the root cause). Discussion about the outage has been happening at Wikidata_talk:Edit_groups and in Project chat.

Arguably it's a pretty bad situation because it leaves bad edits in the wiki (unless people find other ways to undo, but I am not aware of any alternative so far). Since March 25th (approximate start of the outage), people have tried undoing batches 105 times (for 74 distinct batches totaling 93,735 edits) in EditGroups and it only succeeded for three of those batches.

Beyond the long-term approach proposed by this ticket (which I wholeheartedly support), there would be room for other approaches:

  • one could try to fix the current tool, for instance by migrating it to a Cloud VPS instance (as sketched in Wikidata_talk:Edit_groups)
  • one could develop a separate tool (user script? other toolforge tool? command-line tool?) to undo a batch given its id. Given that the current EditGroups tool is still able to track edits and list them via an API, it could perhaps still be used for this purpose, with the auxiliary tool only having to issue a series of undo API requests for each edit in the batch.
  • or perhaps a more generic tool which would be able to undo all edits matching certain conditions (not necessarily coupled to the batch ids)?
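
As a rough illustration of the second option, an auxiliary tool could turn the list of edits in a batch into a series of `action=edit` requests using the `undo` parameter of the MediaWiki Action API. The sketch below only builds the request parameters. How the list of edits is obtained from EditGroups, as well as the names `BatchEdit` and `undo_requests`, are assumptions made for illustration, and a real tool would also need authentication, rate limiting and error handling.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class BatchEdit(NamedTuple):
    """One edit of a batch, as it might be listed by a tracking tool (hypothetical shape)."""
    title: str
    revision_id: int


def undo_requests(edits: Iterable[BatchEdit], batch_id: str,
                  csrf_token: str) -> Iterator[Dict[str, str]]:
    """Yield action=edit parameters undoing each edit of the batch.

    Edits are processed newest first, so that several edits to the same page
    are undone in reverse order of how they were made.
    """
    newest_first = sorted(edits, key=lambda e: e.revision_id, reverse=True)
    for edit in newest_first:
        yield {
            "action": "edit",
            "format": "json",
            "title": edit.title,
            "undo": str(edit.revision_id),
            "summary": f"Undo edit from batch {batch_id}",
            "token": csrf_token,
        }
```

Each dictionary would be sent as a POST request to the wiki's `api.php`. An undo can fail when later edits conflict with it, so the response to each request would have to be checked.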

Given that the presence of the tool gives the false impression that batches can be undone, I also wonder if any steps should be taken to advertise the problem to the wider community, adding notices in batch editing tools to warn people that their imports cannot be undone easily anymore, and so on? I am not sure it would encourage anyone to change anything in what they are doing though.

I would welcome any opinion on this problem, whether or not it is accompanied by more concrete help.

one could try to fix the current tool, for instance by migrating it to a Cloud VPS instance (as sketched in Wikidata_talk:Edit_groups)

A simpler alternative to get a dedicated Redis instance could be T360378: Provide a Redis container for use within a tool's namespace.