
[L] Remove coordinate location from Structured Data without putting the coordinates in the edit summary
Closed, Resolved · Public · Feature

Description

Feature summary (what you would like to be able to do and where):
I would like to be able to remove location information from Structured Data without including that information in the edit summary.

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):
As a Commons administrator, I often have to remove and delete location information that uploaders inadvertently included in their uploaded photo metadata. This location information is usually copied to Structured Data by a bot, so I also have to remove that statement and revdel the previous versions.

Both the normal interface and wbremoveclaims automatically insert the removed coordinates in the edit summary, with no way to override this. Using wbremoveclaims I can append text to the edit summary, but cannot remove the automatic part.
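For reference, a wbremoveclaims request can be sketched roughly as below. The `claim`, `summary` and `token` parameters belong to the API module; the helper name, the statement GUID and the token value are placeholders made up for illustration. The sketch only builds the POST body and does not send anything.

```python
from urllib.parse import urlencode


def build_remove_claims_request(statement_guids, csrf_token, extra_summary=""):
    """Build the POST fields for a wbremoveclaims call (hypothetical helper)."""
    fields = {
        "action": "wbremoveclaims",
        "format": "json",
        # Several statement GUIDs can be passed, separated by "|".
        "claim": "|".join(statement_guids),
        "token": csrf_token,
    }
    if extra_summary:
        # This text is added to the automatic summary; it does not replace it.
        fields["summary"] = extra_summary
    return fields


fields = build_remove_claims_request(
    ["M123$00000000-0000-0000-0000-000000000000"],  # placeholder GUID
    csrf_token="PLACEHOLDER+\\",                     # placeholder token
    extra_summary="privacy request",
)
body = urlencode(fields)
```

Nothing in these fields controls the automatic part of the summary, which is the gap this task describes.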

Benefits (why should this be implemented?):
Currently, I have to remove the coordinate statement from Structured Data, revdel the previous revision text, and then also revdel the summary of my edit. This pretty much doubles the time I spend deleting revisions, and also increases the chances for error.

Uploaders who are unfamiliar with Commons (who are also the most likely to accidentally include coordinate information) also often attempt to remove the coordinate statement themselves. This increases the number of edit summaries I have to revdel, especially if they end up edit warring with a bot.

Event Timeline

AFAICT, Wikibase's AutoCommentFormatter.php automatically adds that summary.
I haven't dug any deeper yet. I guess there may be a way to override what it does (or we could create one), but we'll need some more detail in order to know how to implement this:

  • should this not also apply to Wikidata & other Wikibase wikis?
  • if not, why (if at all) should Commons be different?
  • should we simply remove all coordinates from summaries? always? any cases in which we wouldn't want it removed?

if not, why (if at all) should Commons be different?

On Wikidata, coordinates are usually applied to notable places or landmarks. There is no privacy problem there, and it's useful to have the coordinates in the summary so that other editors can quickly see the change made in the edit.

On Commons, coordinates represent the camera location at the time of a photograph. The author and the time the photograph was taken are also included, which combined can be sufficient to de-anonymize the author. Location information is often included inadvertently, and automatically copied to structured data by bots. This is useful when the coordinates represent a landmark, but it is a privacy problem when they represent someone's house.

should we simply remove all coordinates from summaries? always? any cases in which we wouldn't want it removed?

Putting the coordinates in the edit summary when removing is most useful when there were multiple statements for that property. Perhaps the simplest way to implement this would be to say "Removed claim: coordinate location (P625)" when removing the last claim for a property. That would be similar to how removing multiple claims is treated in WBMI.
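The rule suggested here could be sketched as follows. This is only an illustration of the proposal, not how Wikibase or WBMI builds summaries; the function name and its arguments are hypothetical.

```python
def removal_summary(property_label, removed_value, remaining_count):
    """Summary text for removing one statement (hypothetical helper).

    remaining_count is the number of statements still present for the
    property after the removal.
    """
    if remaining_count == 0:
        # Last claim for the property: the value is not needed to tell
        # statements apart, so it is left out of the summary.
        return f"Removed claim: {property_label}"
    # Other statements remain: keep the value so editors can see which
    # one was removed.
    return f"Removed claim: {property_label}: {removed_value}"
```

With this rule, a file that has a single coordinate statement would never get the coordinates in the removal summary.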

Here's how it looks now, for reference:

image.png (270×728 px, 107 KB)

CBogen renamed this task from Remove coordinate location from Structured Data without putting the coordinates in the edit summary to [L] Remove coordinate location from Structured Data without putting the coordinates in the edit summary. (Jan 26 2022, 5:54 PM)
  1. The relevant part of this summary (the part we want removed) comes from Wikibase\Repo\ChangeOp\ChangeOpRemoveStatement::getSummaryArgs. Either that method must be prevented from returning the summary args, or its direct caller, Wikibase\Repo\ChangeOp\ChangeOpRemoveStatement::apply, must be prevented from using them; code further down the line lacks the information about this data that would make this possible without becoming an unstable hack.

Unfortunately, the path down there is rather long:

  • WikibaseRepo.ServiceWiring.php constructs Wikibase\Repo\ChangeOp\ChangeOpFactoryProvider (in "WikibaseRepo.ChangeOpFactoryProvider")
  • ChangeOpFactoryProvider ends up creating and returning a new instance of Wikibase\Repo\ChangeOp\StatementChangeOpFactory (in getStatementChangeOpFactory), but uses a myriad of private properties in order to do so
  • StatementChangeOpFactory creates & returns a new Wikibase\Repo\ChangeOp\ChangeOpRemoveStatement instance (in newRemoveStatementOp)
  • Said ChangeOpRemoveStatement is the one where we could adjust the kind of data that it returns, but actually getting there would require extending from, and in some parts largely duplicating, 3 classes in between. It's also not inherently isolated to MediaInfo entities so would need additional checks to ensure things continue to work when both extensions run side by side.

In short: while not entirely impossible to override that way, this would be a hacky job, and a significant maintenance burden for both projects.

  2. Before being stored, the data generated as described above ends up being processed in Wikibase\Repo\SummaryFormatter::formatArg (via Wikibase\Repo\SummaryFormatter::formatAutoSummary). It wouldn't be the ideal place to deal with this, but it's a shorter chain of direct object creation: we could "simply" override whatever ends up being created in WikibaseRepo.ServiceWiring.php's "WikibaseRepo.SummaryFormatter", either directly in that class, or by passing it a custom SnakFormatter that blanks out the data in this case. We have no custom SnakFormatter ATM, and it would probably also be "not trivial" to extend Wikibase's default one and produce a different result in (only) this case. Still, that's a pretty massive initialization, and not really a class that lends itself well to being extended.
  3. Changing the summary on display is not an option; AFAICT, core doesn't offer a way to transform the summary beyond the "autocomment" part (the stuff that gets turned into the grey "Removed claim:" part).
  4. MediaWiki runs another hook (onMultiContentSave) pre-save that exposes the summary & allows it to be manipulated. At this point, we have only the formatted comment available without further context, but thanks to the autoformat summary & the coordinates format, we can quite reliably detect both the action (remove statement) and value (something that resembles a coordinate) and update it as needed (remove the coordinate). This is a little hacky, but it's very contained & has minimal risk: the worst case is that the comment format changes at some point and coordinates simply stop being removed from the comment, which is no worse than what we have today. Given the challenges with the other options, this is a good enough solution as far as I'm concerned.
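To make the last option concrete, here is a minimal sketch of the detect-and-strip idea in Python. It is not the code from the patch. The summary shape, both patterns and the function name are assumptions made for illustration; the real autocomment and coordinate formats may differ.

```python
import re

# Assumed shape of the formatted summary (an assumption, not taken from the patch):
#   /* wbremoveclaims-remove:1| */ [[d:Special:EntityPage/P625]]: 52°31'12"N, 13°24'36"E
REMOVE_AUTOCOMMENT = re.compile(r"^/\*\s*wbremoveclaims-remove:[^*]*\*/")

# "Something that resembles a coordinate": degrees with optional minutes and
# seconds, a hemisphere letter, a comma, then the same again for longitude.
COORDINATE = re.compile(
    r""":?\s*                            # separator after the property link
        \d{1,3}(?:\.\d+)?°               # latitude degrees
        (?:\s*\d{1,2}(?:\.\d+)?['′])?    # optional minutes
        (?:\s*\d{1,2}(?:\.\d+)?["″])?    # optional seconds
        \s*[NS]\s*,\s*
        \d{1,3}(?:\.\d+)?°               # longitude degrees
        (?:\s*\d{1,2}(?:\.\d+)?['′])?    # optional minutes
        (?:\s*\d{1,2}(?:\.\d+)?["″])?    # optional seconds
        \s*[EW]
    """,
    re.VERBOSE,
)


def strip_coordinates(summary: str) -> str:
    """Drop coordinate-like values from 'remove statement' summaries only."""
    if not REMOVE_AUTOCOMMENT.match(summary):
        # Not a statement removal: leave the summary untouched.
        return summary
    return COORDINATE.sub("", summary).rstrip()
```

If the summary format ever changes, neither pattern matches and the summary passes through unchanged, which is the failure mode described in the last option.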

Change 765237 had a related patch set uploaded (by Matthias Mullie; author: Matthias Mullie):

[mediawiki/extensions/WikibaseMediaInfo@master] Remove coordinates from edit summaries when deleting location statements

https://gerrit.wikimedia.org/r/765237

Change 765237 merged by jenkins-bot:

[mediawiki/extensions/WikibaseMediaInfo@master] Remove coordinates from edit summaries when deleting location statements

https://gerrit.wikimedia.org/r/765237

Etonkovidova subscribed.

Checked on commons wmf.7 - when coordinates are removed, the edit summary doesn't include the removed values:

Screen Shot 2022-04-19 at 3.48.21 PM.png (222×2 px, 124 KB)