Determine workflow to selectively purge potentially privacy-sensitive EXIF fields, such as geocoordinates, from a Wikimedia Commons file
Open, Needs Triage, Public

Description

On another forum, a user reported the following chain of events:

  • A freely licensed photo is found online
  • The photo includes EXIF data with GPS coordinates
  • User uploads it to Wikimedia Commons
  • A bot (presumably DschwenBot) extracts the GPS coordinates from the EXIF data and adds a {{Location}} template
  • This information makes it possible to trace the original author
  • Uploader removes {{Location}} template
  • Bot adds {{Location}} again
  • Uploader strips EXIF location data, using a third-party tool (“an app”) − unclear whether mobile, desktop, or web-based − and (presumably) reuploads
  • (Unclear whether the older version of the file was revision deleted by a sysop afterwards)

Reactions from other users on that forum suggest that this issue should be solved with tooling.

It is unclear to me whether the better workflow would be:

  • preventive − offering to strip location data at upload time (presumably in UploadWizard, although it is unconfirmed whether this was the upload tool used in this particular case). The drawback is that UploadWizard is not the only upload method out there.
  • corrective − on an already uploaded file, a one-click "purge geoloc from EXIF" action that reuploads the cleaned file as a new version. This could easily be implemented as a web-based Toolforge tool.

As EXIF data is generally appreciated on Wikimedia Commons, such a workflow would likely be considered open to abuse. This should be factored into the solution (e.g. through access restrictions).

The goal of this task is to determine which of these two workflows (or others I did not think of) is preferable. As this issue is not specific to Wikimedia Commons, it would be helpful to know how other popular image-uploading platforms (Flickr, Unsplash, 500px, etc.) tackle it, if at all.

Event Timeline

Some preliminary notes regarding an implementation:

  • swapICCProfile in includes/media/JpegHandler.php already invokes exiftool, so there is precedent for shelling out to it
  • exiftool supports removing all GPS tags via -gps:all= (see https://www.exiftool.org/geotag.html)
  • to be figured out: how to invoke JpegHandler from UploadBase (there's also TransformationalImageHandler.doTransform, but it seems to be related to thumbnail generation)
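
For the corrective (Toolforge-style) variant, the exiftool call noted above could be wrapped roughly as follows. This is only a sketch under assumptions: the function names build_strip_gps_command and strip_gps are made up for illustration, exiftool is assumed to be installed and on PATH, and the cleaned copy is written to a separate file via exiftool's -o option so the original stays untouched. Downloading from and reuploading to Commons is left out entirely. Note that -gps:all= targets the GPS group; location data stored elsewhere (for example in XMP) may need separate handling.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_strip_gps_command(source: Path, destination: Path) -> list[str]:
    """Build the exiftool argument list that writes a GPS-free copy of `source`.

    Hypothetical helper: kept separate from the subprocess call so the
    command can be inspected without exiftool being installed.
    """
    return [
        "exiftool",
        "-gps:all=",             # delete every tag in the GPS group
        "-o", str(destination),  # write a new file instead of modifying the source
        str(source),
    ]


def strip_gps(source: Path, destination: Path) -> Path:
    """Run exiftool to produce `destination` without GPS tags (assumes exiftool on PATH)."""
    if shutil.which("exiftool") is None:
        raise RuntimeError("exiftool is not installed or not on PATH")
    # check=True raises CalledProcessError if exiftool exits with a non-zero status
    subprocess.run(build_strip_gps_command(source, destination), check=True)
    return destination
```

A caller would then do something like strip_gps(Path("original.jpg"), Path("cleaned.jpg")) and upload cleaned.jpg as a new file version. Whether the older version then needs revision deletion is a separate question, as raised in the description.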