Make limited information from filearchive available to everyone
Open, Public

Description

Original bug title:
Make limited information from filearchive available to everyone

Reasoning:
When it comes to identifying copyright violations and [[WP:Sock puppetry]], it is especially helpful to be able to check whether a file has been uploaded before without having to upload the file to the stash yourself.

Demand:
Title and size, filterable by SHA-1:
(fasha1=HEXHASH&faprop=title|size)
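For illustration, the requested lookup could be assembled as below. This is a minimal sketch: the parameter names follow the demand above, `faprop=title` in particular is part of the proposal and may not match what the API actually accepts, the Commons endpoint is just an example target, and `build_filearchive_query` is a hypothetical helper. It also assumes the query would be allowed for users without special rights, which is what this task asks for.

```python
from urllib.parse import urlencode

# Example target wiki; any MediaWiki api.php endpoint would work the same way.
API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def build_filearchive_query(sha1_hex: str) -> str:
    """Build the proposed list=filearchive lookup URL for one hex SHA-1 digest."""
    sha1_hex = sha1_hex.lower()
    if len(sha1_hex) != 40 or any(c not in "0123456789abcdef" for c in sha1_hex):
        raise ValueError("expected a 40-character hexadecimal SHA-1 digest")
    params = {
        "action": "query",
        "list": "filearchive",
        "fasha1": sha1_hex,
        "faprop": "title|size",  # property set requested in this task (assumed)
        "format": "json",
    }
    # urlencode percent-encodes the "|" separator as %7C.
    return API_ENDPOINT + "?" + urlencode(params)
```

The hash is normalised to lower case and validated up front, so a malformed digest fails locally instead of producing a confusing API response.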

What about privacy?
Not an issue. If you upload a file to the stash, you can obtain this information anyway.


Version: 1.23.0
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=57697

bzimport added a project: MediaWiki-API. Via Conduit, Nov 22 2014, 2:28 AM
bzimport added a subscriber: wikibugs-l.
bzimport set Reference to bz58993.
Rillke created this task. Via Legacy, Dec 27 2013, 3:31 PM
LuisV_WMF added a comment. Via Conduit, Dec 30 2013, 9:59 PM

Dumb question: what's the use case for this? (Ideally I'd also like to understand the use case for the existing functionality, but one thing at a time...)

Rillke added a comment. Via Conduit, Dec 31 2013, 2:43 AM

(In reply to comment #1)

Dumb question: what's the use case for this?

See Reasoning. Let me give you three examples:

User uploads a copyright violation. Patroller marks the file for deletion. Admin deletes the file. User uploads the same file again. The patroller can now do a SHA-1 lookup at https://commons.wikimedia.org/w/index.php?title=Commons:User_scripts/File_Analyzer&withJS=MediaWiki:FileAnalyzer.js to check whether an identical file existed before,
and identify the user(s) who uploaded that file.

Bot coder and bot are not administrators. The bot uploads a batch of very large files, but some were previously deleted and should not be uploaded again. The bot could check the SHA-1 before uploading to save bandwidth.

File is marked for transfer from en.wikipedia to Commons. Bot/Tool could check whether this file was previously deleted at Commons and refuse the transfer.

...
Please let me know whether this is convincing enough or whether you would like more feedback from Commons users. Or are you asking for a technical explanation of SHA1 and that kind of stuff? Sorry, here at Bugzilla it's always a bit difficult to get it right, because I never know to whom I am talking without googling.
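The second example (a bot checking the SHA-1 before uploading) could look roughly like this. It is a minimal sketch under stated assumptions: `sha1_hex` and `should_upload` are hypothetical helper names, and `previously_deleted` stands in for a set of hex digests that would come from the filearchive lookup requested in this task.

```python
import hashlib


def sha1_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-1 of a file, read in chunks so very large files need not fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def should_upload(path: str, previously_deleted: set) -> bool:
    """Return False if an identical file (same SHA-1) was deleted before.

    previously_deleted is a hypothetical set of hex SHA-1 digests, e.g. gathered
    from the proposed filearchive lookup.
    """
    return sha1_hex(path) not in previously_deleted
```

Hashing locally only costs disk reads, so a small API request can replace a potentially huge upload that would be deleted again anyway.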

Aklapper added a comment. Via Conduit, Dec 31 2013, 10:32 AM

(In reply to comment #2)

I never know to whom I am talking without googling.

I've bookmarked https://wikimediafoundation.org/wiki/Staff?showall=1 for that :)

LuisV_WMF added a comment. Via Conduit, Dec 31 2013, 6:46 PM

Examples were perfect, thanks - understand the use case much better now.

I'm fine with this from a privacy perspective, as long as it respects suppression of titles (which should also be respected when you do a full file upload; I understand that isn't currently the case, and have filed bug 59167 for that).

[Also, I've tweaked my settings to say a little bit about who I am; I hope that helps (though I suppose that might make you *more* likely to explain SHA1, which I definitely don't need!)]

Fae added a comment. Via Conduit, May 6 2014, 10:53 AM

From a (non-sysop) bot-writing perspective, it would be great to be able to get an array of previous deletions for a queried SHA-1. At the moment, pywikipediabot returns the name of one matching file, but not all matches.

I suggest that the deleted file names are returned (incredibly useful when they contain reference numbers from the original source, such as Flickr photo IDs) *unless* there is a reason to suppress the filename from the deletion log. Other basic information (dates, uploader, editors) would be great for a bot to act on or make decisions about. Scenarios include a bot taking different actions depending on whether it sees its own name as a past uploader, or whether upload dates fall within the dates of a recent batch upload project.

There may be privacy issues with some data elements (such as listing all past editors or uploaders); however, I think we should be able to automatically distinguish between ordinary deleted material (such as copyvios) and files that were deleted due to respect/privacy concerns.
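A rough sketch of the bot decisions described in this comment. Everything here is assumed for illustration: the response shape (a `query.filearchive` list whose entries carry `name`, `user` and `timestamp`), the `suppressed` flag used to hide entries whose names should not be exposed, and all sample data, bot names and dates are hypothetical rather than taken from the live API.

```python
from datetime import datetime, timezone

# Hypothetical response for one queried SHA-1; shape and "suppressed" flag are assumptions.
SAMPLE_RESPONSE = {
    "query": {
        "filearchive": [
            {"name": "Example_photo_123.jpg", "user": "ExampleBot",
             "timestamp": "2014-04-02T10:15:00Z"},
            {"name": "Example_photo_123_copy.jpg", "user": "SomeUser",
             "timestamp": "2013-11-20T08:00:00Z"},
            {"name": "Hidden_name.jpg", "user": "OtherUser",
             "timestamp": "2013-10-01T00:00:00Z", "suppressed": True},
        ]
    }
}


def parse_ts(ts: str) -> datetime:
    """Parse a UTC timestamp of the form 2014-04-02T10:15:00Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def visible_entries(response: dict) -> list:
    """All matching deleted files, minus any whose name is marked as suppressed."""
    entries = response.get("query", {}).get("filearchive", [])
    return [e for e in entries if not e.get("suppressed", False)]


def deleted_names(response: dict) -> list:
    """Every visible match for the queried hash, not just the first one."""
    return [e["name"] for e in visible_entries(response)]


def decide(response: dict, bot_name: str, batch_start: datetime, batch_end: datetime) -> str:
    """Pick an action based on past uploaders and upload dates."""
    entries = visible_entries(response)
    if not entries:
        return "upload"
    if any(e.get("user") == bot_name for e in entries):
        return "skip: previously uploaded by this bot"
    if any("timestamp" in e and batch_start <= parse_ts(e["timestamp"]) <= batch_end
           for e in entries):
        return "flag: deleted copy dates from the batch upload window"
    return "flag: previously deleted"
```

Filtering suppressed entries before any decision is made keeps hidden names out of both the returned list and the bot's logic.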

silke added a comment. Via Conduit, Jun 11 2014, 5:33 PM

Not related to 58791, removing dependency.

Rillke claimed this task. Via Web, Feb 13 2015, 7:06 PM

Simply claiming this task to get some kind of to-do list; if someone else beats me to it, please just take this task!

Anomie moved this task to Needs Code on the MediaWiki-API workboard. Via Web, Feb 19 2015, 6:46 PM
Rillke added a project: Commons. Via Web, Mar 10 2015, 5:30 PM
Rillke set Security to None.
Steinsplitter moved this task to Backlog on the Commons workboard. Via Web, Mar 11 2015, 12:52 PM
silke removed a subscriber: silke. Via Web, Apr 13 2015, 6:41 AM
