
Copyvio tools for Commons
Open, Needs Triage, Public

Description

Copyright on Commons is a topic that is, at times, subject to abuse. Some images are left up for long periods because people are afraid to tag them; at other times, images are subject to the copyright whims of admins. It would be helpful to have tools that both help detect possible violations (perhaps there are databases we can match images against?) and provide a better method for screening images once they are detected.

This card tracks a proposal from the 2015 Community Wishlist Survey: https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey

This proposal received 38 support votes, and was ranked #26 out of 107 proposals. https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey/Commons#Copyvio_tools_for_Commons

Related Objects

Status      Assigned
Resolved    matthiasmullie
Open        None
Open        None
Open        None
Open        None
Resolved    Sn1per
Resolved    Sn1per
Resolved    Prtksxna
Declined    None
Resolved    matthiasmullie
Resolved    matthiasmullie
Resolved    matmarex
Resolved    matmarex
Resolved    matmarex
Resolved    matmarex
Resolved    matmarex
Resolved    matmarex
Resolved    Cenarium
Resolved    matthiasmullie
Resolved    Marostegui
Open        None
Resolved    matthiasmullie
Resolved    matmarex
Open        None
Resolved    Cenarium
Resolved    Cenarium
Open        None
Open        MarkTraceur
Open        eranroz
Resolved    Samwilson
Open        None

Event Timeline

DannyH created this task. Dec 4 2015, 10:25 PM
DannyH set the priority of this task to Needs Triage.
DannyH updated the task description.
DannyH moved this task to Wishlist 51-on on the Community-Wishlist-Survey-2015 board.
DannyH added a subscriber: DannyH.
Restricted Application added subscribers: StudiesWorld, Steinsplitter, Aklapper. Dec 4 2015, 10:25 PM
Reedy set Security to None.
Steinsplitter moved this task from Incoming to Backlog on the Commons board. Dec 8 2015, 8:59 AM

Some specific ideas from the wishlist page:

I'm wondering whether using Google Image Search at the time of upload would be useful for building an automatic detection method. We would also need to build a whitelist to reduce false positives. --@Doc_James
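Doc_James's whitelist idea can be sketched as a post-filter on reverse-image-search results. This is a minimal sketch, assuming a search step (not shown, and not tied to any particular search API) that returns the URLs where a matching image was found. The whitelist contents and the names `WHITELIST`, `host_is_whitelisted` and `suspicious_matches` are hypothetical; only matches on non-whitelisted hosts would count toward a possible copyvio.

```python
from urllib.parse import urlparse

# Hypothetical whitelist: hosts whose copies of an image should not count as
# evidence of a copyright violation (e.g. Wikimedia sites reusing Commons files).
WHITELIST = frozenset({"wikimedia.org", "wikipedia.org", "wikidata.org"})


def host_is_whitelisted(url: str, whitelist: frozenset[str] = WHITELIST) -> bool:
    """Return True if the URL's host is a whitelisted domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in whitelist)


def suspicious_matches(match_urls: list[str],
                       whitelist: frozenset[str] = WHITELIST) -> list[str]:
    """Keep only reverse-image-search matches whose host is not whitelisted."""
    return [u for u in match_urls if not host_is_whitelisted(u, whitelist)]


# Hypothetical search results for one upload.
matches = [
    "https://upload.wikimedia.org/wikipedia/commons/a/ab/Example.jpg",
    "https://en.wikipedia.org/wiki/File:Example.jpg",
    "https://www.example-stock-photos.com/photo/12345",
]
print(suspicious_matches(matches))
# → ['https://www.example-stock-photos.com/photo/12345']
```

Matching on the parsed hostname (rather than a substring of the whole URL) keeps a look-alike host such as `notwikimedia.org` from slipping through the whitelist.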

I think a good first step would be to use perceptual hashing to see whether a similar image was previously deleted. I imagine lots of copyvios are uploaded again and again. --@Bawolff
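A minimal sketch of Bawolff's suggestion, using a difference hash (dHash) as one possible perceptual hash; the proposal does not name a specific algorithm. It assumes the image has already been decoded to a grayscale grid of pixel values (for example by an imaging library) and that `deleted_hashes` is a hypothetical store of hashes computed from previously deleted files. The `max_distance` threshold of 10 bits is an arbitrary illustrative value.

```python
def _shrink(pixels, width=9, height=8):
    """Downscale a grayscale image (list of rows) to width x height by block averaging."""
    h, w = len(pixels), len(pixels[0])
    if h < height or w < width:
        raise ValueError("image too small to hash")
    small = []
    for r in range(height):
        r0, r1 = r * h // height, (r + 1) * h // height
        row = []
        for c in range(width):
            c0, c1 = c * w // width, (c + 1) * w // width
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def dhash(pixels):
    """64-bit difference hash: one bit per left/right comparison in a 9x8 thumbnail."""
    value = 0
    for row in _shrink(pixels):
        for left, right in zip(row, row[1:]):
            value = (value << 1) | (1 if left > right else 0)
    return value


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def matches_deleted(pixels, deleted_hashes, max_distance=10):
    """True if the image is perceptually close to any previously deleted file."""
    h = dhash(pixels)
    return any(hamming(h, d) <= max_distance for d in deleted_hashes)


# Synthetic 32x32 gradient standing in for a deleted file, plus two re-uploads.
original = [[x * 8 for x in range(32)] for _ in range(32)]
brighter = [[v + 10 for v in row] for row in original]
mirrored = [row[::-1] for row in original]

deleted_hashes = {dhash(original)}
print(matches_deleted(brighter, deleted_hashes))  # → True
print(matches_deleted(mirrored, deleted_hashes))  # → False
```

A difference hash tolerates uniform brightness changes and resizing, since it only records whether each cell is brighter than its right-hand neighbour, but a mirrored or heavily cropped copy will hash differently.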

We could add new filters to the [[c:Special:NewFiles]] tool so Commons users can browse for new uploads that:

  • come from new users (we already have the newbie-upload tool),
  • come from users with a lot of recent deletions,
  • do not have EXIF data,
  • are small,
  • are known to Google Image Search,
  • were deleted before,
  • do or do not claim {{own}} ("own work"),
  • do or do not use [[c:template:Custom license]]

etc. All those factors increase the chance that an image is a copyvio, and it would be nice if we could add and remove those filters in any combination. --@Jarekt
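Jarekt's combinable filters could be modelled as independent predicates that are ANDed together. The sketch below is hypothetical: the `Upload` fields, filter names and thresholds are illustrative and do not correspond to any existing Special:NewFiles option or MediaWiki API. Each chosen filter maps to `True` (must match) or `False` (must not match), which covers the "do or do not" cases.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Upload:
    """Hypothetical per-file metadata; field names are illustrative only."""
    title: str
    uploader_is_new: bool
    uploader_recent_deletions: int
    has_exif: bool
    width: int
    height: int
    known_to_image_search: bool
    deleted_before: bool
    claims_own_work: bool
    uses_custom_license: bool


Filter = Callable[[Upload], bool]

# One predicate per factor from the list above; thresholds are arbitrary.
FILTERS: dict[str, Filter] = {
    "new_user": lambda u: u.uploader_is_new,
    "many_deletions": lambda u: u.uploader_recent_deletions >= 3,
    "no_exif": lambda u: not u.has_exif,
    "small": lambda u: u.width * u.height < 640 * 480,
    "image_search_hit": lambda u: u.known_to_image_search,
    "deleted_before": lambda u: u.deleted_before,
    "own_work": lambda u: u.claims_own_work,
    "custom_license": lambda u: u.uses_custom_license,
}


def select(uploads, wanted: dict[str, bool]):
    """Return uploads matching every chosen filter; False means 'must NOT match'."""
    return [
        u for u in uploads
        if all(FILTERS[name](u) == expected for name, expected in wanted.items())
    ]


uploads = [
    Upload(title="File:A.jpg", uploader_is_new=True, uploader_recent_deletions=5,
           has_exif=False, width=320, height=240, known_to_image_search=True,
           deleted_before=False, claims_own_work=True, uses_custom_license=False),
    Upload(title="File:B.jpg", uploader_is_new=False, uploader_recent_deletions=0,
           has_exif=True, width=4000, height=3000, known_to_image_search=False,
           deleted_before=False, claims_own_work=True, uses_custom_license=False),
]

print([u.title for u in select(uploads, {"no_exif": True, "own_work": True})])
# → ['File:A.jpg']
print([u.title for u in select(uploads, {"own_work": True, "small": False})])
# → ['File:B.jpg']
```

Because each factor is a separate entry, a new filter can be added without touching the others, and any subset can be switched on or off.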

See T31793 for an update; it is all theoretical right now.

IMPORTANT: If you are a community developer interested in working on this task, note that the Wikimedia Hackathon 2016 (Jerusalem, March 31 to April 3) focuses on #Community-Wishlist-Survey projects. There is some budget for sponsoring volunteer developers. THE DEADLINE TO REQUEST TRAVEL SPONSORSHIP IS TODAY, JANUARY 21. Exceptions can be made for developers focusing on Community Wishlist projects until the end of Sunday, January 24, but not beyond. If you or someone you know is interested, please REGISTER NOW.

DannyH updated the task description. Feb 5 2016, 11:57 PM
Poyekhali triaged this task as Normal priority. Apr 13 2016, 5:06 AM
Steinsplitter added a comment (edited). Apr 13 2016, 11:49 AM

Pokefan95 triaged this task as "Normal" priority.

@Pokefan95 Are you planning to code the tool? If not, why did you set the priority?

Steinsplitter raised the priority of this task from Normal to Needs Triage. Apr 16 2016, 10:49 AM
Gunnex added a subscriber: Gunnex. Apr 18 2016, 4:18 PM
Restricted Application added a project: Community-Tech. Jun 20 2018, 2:37 PM
MusikAnimal added a subscriber: MusikAnimal.

I don't think Earwig's Copyvios tool was built to search imagery and other media. Nor was CopyPatrol, so I'm removing the tag.