
Create edit groups when running Wikidata-related scripts
Open, Low, Public

Description

See https://www.wikidata.org/wiki/Wikidata:Edit_groups. It requires adding a string in edit summaries.


Wikidata is the database behind Wikipedia. Pywikibot has scripts like scripts/newitem.py that create and edit items in this database. Most of them use (or should use) the WikidataBot class present in pywikibot/bot.py.

This task is to

  1. add a config parameter in config2.py with a small explanation in comment
  2. if the bot runs on wikidata (self.site == pywikibot.Site('wikidata', 'wikidata')), generate a random hexadecimal 10 char string at its initialization (in __init__)
  3. if summary is defined (if 'summary' in kwargs), use this value as a suffix of kwargs['summary'] value in WikidataBot.user_edit_entity()

After processing, the kwargs['summary'] value should look like my very informative edit summary ([[:toollabs:editgroups/b/CB/89ead4fe|details]]), where the my very informative edit summary part is the old kwargs['summary'] and ([[:toollabs:editgroups/b/CB/89ead4fe|details]]) is the added suffix containing the random string. Note that the [[ ]] will create a link to an external tool.
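Steps 2 and 3 could be sketched roughly as follows. This is only an illustration, not the pywikibot implementation: EditGroupBotSketch, new_edit_group_id, add_edit_group_suffix, prepare_kwargs and the on_wikidata flag are hypothetical names standing in for WikidataBot and its site check. Note that step 2 asks for 10 characters while the example identifier above has 8, so the length is left as a parameter.

```python
import secrets


def new_edit_group_id(length=10):
    """Return a random lowercase hexadecimal string with `length` characters."""
    # secrets.token_hex(n) returns 2 * n hex characters, so round up and trim.
    return secrets.token_hex((length + 1) // 2)[:length]


def add_edit_group_suffix(summary, group_id, tool_code='CB'):
    """Append the EditGroups link from the task description to a summary."""
    return '{} ([[:toollabs:editgroups/b/{}/{}|details]])'.format(
        summary, tool_code, group_id)


class EditGroupBotSketch:
    """Hypothetical stand-in for a WikidataBot-like class (not the real API)."""

    def __init__(self, on_wikidata):
        # Step 2: only generate an identifier when the bot targets Wikidata.
        self.edit_group_id = new_edit_group_id() if on_wikidata else None

    def prepare_kwargs(self, **kwargs):
        # Step 3: only touch the summary if the caller supplied one.
        if self.edit_group_id and 'summary' in kwargs:
            kwargs['summary'] = add_edit_group_suffix(
                kwargs['summary'], self.edit_group_id)
        return kwargs
```

In a real patch the body of prepare_kwargs would presumably live inside WikidataBot.user_edit_entity(), just before the kwargs are passed on to the save call.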

This should add the suffix to the summary of item creation and claim addition. More work will be needed to add it to qualifier and reference edits (will wait for T112577, out of scope for now).
See the "For custom bots" section of https://www.wikidata.org/wiki/Wikidata:Edit_groups/Adding_a_tool.

Event Timeline

Xqt triaged this task as Low priority. Jul 29 2018, 9:44 AM
Framawiki updated the task description.

Will mentor this task for Google-Code-in-2018 with whoever wishes.

Please review my step-by-step added in the task desc.

What is this?

editentity can support having a complete JSON blob being saved. There is no need to do multiple edits, and it is rather bad that any script does multiple edits to the same item. Lazy programming.

Removing from GCI until it is very clear how this should proceed.

Perhaps a different script might be more sensible for this task.

newitem.py should do one edit only, unless there is some very good reason for splitting the creation into separate commits to Wikibase.

Also, if edit groups are now a thing that the community wants, they should be supported by functionality in core that can be reused by any script.

@jayvdb mmm I think that there is a misunderstanding here. Let me quote the introduction of https://www.wikidata.org/wiki/Wikidata:Edit_groups :

"Edit groups" are sets of changes on Wikidata items which follow a similar pattern and are performed around the same time by a given user. They are typically produced by bots or humans performing semi-automated changes across many items. MediaWiki (the software that powers Wikidata) does not have such a notion of edit groups, but external tools such as EditGroups can be used to track these changes. (..) When you perform automated edits with tools such as HarvestTemplates, QuickStatements or OpenRefine, each batch of edits gets a unique identifier and a page on the EditGroups tool.

So this task is to make pywikibot compatible with the https://tools.wmflabs.org/editgroups tool. The tool shows all the work done in a set of edits on a single page and, if needed, lets you undo the grouped edits made on multiple items in one click.
Technically, this tool works by scanning all edit summaries with a regex to detect a unique value that identifies a group.
So this task is to add this unique string in the summary of edits made in the same batch.
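To illustrate the detection side, here is a minimal sketch of how such a regex could pull the tool code and group identifier out of a summary. The pattern is an assumption modelled on the example summary in the task description; it is not the expression the EditGroups tool actually uses, and EDIT_GROUP_RE and find_edit_group are hypothetical names.

```python
import re

# Assumed pattern, derived from the example summary in the task description.
EDIT_GROUP_RE = re.compile(
    r'\(\[\[:toollabs:editgroups/b/'
    r'(?P<tool>[^/|\]]+)/(?P<group>[0-9a-f]+)\|details\]\]\)')


def find_edit_group(summary):
    """Return (tool_code, group_id) found in `summary`, or None if absent."""
    match = EDIT_GROUP_RE.search(summary)
    if match is None:
        return None
    return match.group('tool'), match.group('group')
```

Every edit in a batch carries the same identifier, so grouping edits then amounts to collecting the summaries that yield the same pair.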

It might be worth giving the bot author some control over this feature:

  • there should be some opt-in / opt-out mechanism
  • there should be some control over what constitutes a batch. Some users might want to create multiple logical batches during the same run of a bot, or share the same batch id across consecutive runs of the same python script (for instance if it is called by a bash script…
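One possible shape for these two controls, as a sketch only: an enabled flag for opt-in / opt-out, plus a batch identifier that can be passed explicitly or shared across runs through an environment variable. The function name resolve_batch_id and the variable name EDIT_GROUP_ID are hypothetical and not existing pywikibot settings.

```python
import os
import secrets


def resolve_batch_id(enabled, explicit_id=None, env=None,
                     env_var='EDIT_GROUP_ID'):
    """Pick the batch identifier to use for the next edits, or None.

    Order of precedence (an assumption of this sketch):
    explicit id, then environment variable, then a fresh random id.
    """
    if not enabled:
        # Opt-out: no identifier means no suffix is added to summaries.
        return None
    if explicit_id:
        # The bot author starts a new logical batch by passing a new id.
        return explicit_id
    env = os.environ if env is None else env
    # A wrapper script can export the variable to share one batch
    # across consecutive runs; otherwise fall back to a random id.
    return env.get(env_var) or secrets.token_hex(5)
```

A bot that wants several logical batches in one run would call this again with a different explicit_id whenever a new batch starts.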

Note that this EditGroups tool is not part of MediaWiki and can break in various ways. For instance, if it is down, the links in the summaries will be dead.
Also, this tool is specific to Wikidata - it will not make any sense to add these links if the MediaWiki instance is not Wikidata.
Generalizing the tool to other Wikibase instances is possible but this will induce a different syntax for the edit summaries, so this should probably be configurable in some way.
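A configurable syntax could be as simple as a mapping from site to summary template, with sites that have no template left untouched. The sketch below assumes that approach; SUMMARY_TEMPLATES, format_summary and the 'wikidata' key are hypothetical names, and only the Wikidata template comes from the task description.

```python
# Only the Wikidata entry is taken from the task description; entries for
# other Wikibase instances would have to be supplied by the bot author.
SUMMARY_TEMPLATES = {
    'wikidata':
        '{summary} ([[:toollabs:editgroups/b/{tool}/{group}|details]])',
}


def format_summary(site_key, summary, group, tool='CB',
                   templates=SUMMARY_TEMPLATES):
    """Apply the edit group template for `site_key`, if one is configured."""
    template = templates.get(site_key)
    if template is None:
        # No EditGroups syntax known for this site: leave the summary as is.
        return summary
    return template.format(summary=summary, tool=tool, group=group)
```

With this layout, running the same bot against a wiki that has no entry in the mapping never produces links to the tool.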

As a long-term solution I propose to build a similar tool in MediaWiki, see T203557: Create a Edit group extension.

@Pintoch I've updated the task description according to your comment, thank you. Note that the GCI part of this task is only the minimum implementation.

Would it be possible to ask the user of the tool to enter a one line natural language text describing the edit group? That would help others understand what the edit group did when looking here: https://tools.wmflabs.org/editgroups/