Automatically check Commons uploads for possible copyright violations
Open, High, Public

Description

We are exploring some options for automatic detection of files uploaded from somewhere else. The bot will run only on Commons, on files whose source is "own work" and whose uploader is not "trusted" (what exactly that means is still to be determined).
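
A minimal sketch of that selection rule, under stated assumptions: `UploadInfo`, `needs_check`, `select_for_check` and the `is_trusted` callback are hypothetical names, not part of any existing bot or API. Matching the source against the literal string "own work" is also an assumption, and because "trusted" is still undefined it is left as a pluggable predicate.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class UploadInfo:
    """Hypothetical record for one new upload; the fields are illustrative."""
    title: str
    uploader: str
    source: str  # declared source, e.g. "own work" or a URL


def needs_check(upload: UploadInfo, is_trusted: Callable[[str], bool]) -> bool:
    """True when the upload claims "own work" and the uploader is not trusted."""
    claims_own_work = upload.source.strip().lower() == "own work"
    return claims_own_work and not is_trusted(upload.uploader)


def select_for_check(uploads: Iterable[UploadInfo],
                     is_trusted: Callable[[str], bool]) -> List[UploadInfo]:
    """Keep only the uploads that should go through the copyvio check."""
    return [u for u in uploads if needs_check(u, is_trusted)]
```

Passing the trust rule in as a function keeps the filter unchanged whichever definition of "trusted" is eventually chosen.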

Approach to detect possible copyright violations:

  • T31793 - Google search for newly uploaded files (or other providers?)
  • T230561 - create a model to score files that require additional attention for copyvio aspects

Event Timeline

MarkTraceur claimed this task.
MarkTraceur raised the priority of this task from to Normal.
MarkTraceur updated the task description.
MarkTraceur added a subscriber: MarkTraceur.
Restricted Application added subscribers: Steinsplitter, Aklapper. Jan 13 2016, 4:25 PM
Restricted Application added a subscriber: Matanya. Jan 13 2016, 4:25 PM
JEumerus added a subscriber: JEumerus.
JEumerus removed a subscriber: JEumerus.
JEumerus added a subscriber: JEumerus.
Steinsplitter moved this task from Incoming to Uploading on the Commons board. Jan 22 2016, 6:01 PM
Steinsplitter awarded a token.
Gunnex added a subscriber: Gunnex. Jan 25 2016, 9:33 AM
Yann added a subscriber: Yann. Feb 8 2016, 4:51 PM

We are exploring some options for automatic detection of files uploaded from somewhere else.

@MarkTraceur: Has that exploration happened in the last five months, and what was the outcome?
If not, does anyone plan to work on this soon (or should the priority be changed)?
Thanks!

MarkTraceur lowered the priority of this task from Normal to Low. Jun 14 2016, 1:42 PM

@Aklapper Sorry about that, yeah, this is on the back burner because we have no good leads on partnerships with services that could provide image checking for us. Google has said in no uncertain terms that they don't have an open API for this, and I think partnerships are being negotiated with some other system, but are not finalized. In any case, it's blocked internally and low priority for us currently.

Noted on AN: https://commons.wikimedia.org/w/index.php?title=Commons:Administrators%27_noticeboard&diff=198996743&oldid=198967192

I suggest a different approach that is independent of any external service (this may be a complementary way):

  • Train an ML classifier for "copyright violation"
    • Define features relevant to guessing whether a file is at high risk of copyright violation (metadata: user properties/history, image description, EXIF, a logo-like mark in the corner, etc.)
    • Get data from Commons on many files to gather statistics on those features and train the classifier
  • Predict/score new uploads
    • Score new uploads based solely on the upload itself (rather than on an external service)
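
The two steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not how ORES or any existing model works: the feature names in `UploadFeatures` are hypothetical, the toy data is made up, and a hand-rolled logistic regression stands in for whatever classifier would actually be chosen.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class UploadFeatures:
    """Hypothetical per-upload signals; the field names are illustrative."""
    uploader_edit_count: int       # user history
    uploader_prior_deletions: int  # earlier uploads deleted as copyvio
    has_exif: bool                 # camera metadata present in the file
    description_length: int        # characters in the file description


def to_vector(u: UploadFeatures) -> List[float]:
    """Numeric feature vector; the leading 1.0 is the bias term."""
    return [
        1.0,
        math.log1p(u.uploader_edit_count),
        math.log1p(u.uploader_prior_deletions),
        1.0 if u.has_exif else 0.0,
        math.log1p(u.description_length),
    ]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights: List[float], u: UploadFeatures) -> float:
    """Estimated probability in [0, 1] that the upload needs review."""
    return sigmoid(sum(w * x for w, x in zip(weights, to_vector(u))))


def train(examples: List[Tuple[UploadFeatures, int]],
          epochs: int = 2000, lr: float = 0.01) -> List[float]:
    """Fit logistic-regression weights with plain stochastic gradient descent."""
    weights = [0.0] * 5
    for _ in range(epochs):
        for u, label in examples:
            x = to_vector(u)
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - label
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights


# Made-up toy data: label 1 = deleted as copyvio, 0 = kept.
TOY_EXAMPLES = [
    (UploadFeatures(2, 3, False, 10), 1),
    (UploadFeatures(0, 1, False, 5), 1),
    (UploadFeatures(5, 2, False, 0), 1),
    (UploadFeatures(5000, 0, True, 200), 0),
    (UploadFeatures(300, 0, True, 120), 0),
    (UploadFeatures(1200, 0, True, 80), 0),
]

if __name__ == "__main__":
    weights = train(TOY_EXAMPLES)
    new_upload = UploadFeatures(0, 4, False, 3)
    print(f"review score: {score(weights, new_upload):.2f}")
```

The score depends only on the upload and its uploader, so no external image-search service is involved; image-based signals such as a logo in the corner would need a separate feature extractor that is not shown here.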

@Halfak @Ladsgroup is this something that is already planned in ORES?

ABorbaWMF moved this task from Desired epics to Needs QA on the Multimedia board. Aug 30 2017, 12:42 AM
ABorbaWMF moved this task from Needs QA to Desired epics on the Multimedia board.
ABorbaWMF added a subscriber: ABorbaWMF.

Dragged and dropped by mistake. Putting this ticket back.

eranroz renamed this task from Create a bot to automatically check Commons uploads for possible copyright violations to Automatically check Commons uploads for possible copyright violations. Aug 15 2019, 3:49 PM
eranroz raised the priority of this task from Low to High.
eranroz updated the task description.
Restricted Application added a project: Community-Tech. Aug 15 2019, 3:49 PM

Updated the description to fit the other subtasks (and aligned the priority with T31793).