Add more variables to AbuseFilter file uploading evaluation
OpenPublic

Description

as title.


Version: unspecified
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=14417

bzimport added a subscriber: Unknown Object (MLST).
bzimport set Reference to bz19565.
liangent created this task.Via LegacyJul 7 2009, 2:37 PM
werdna added a comment.Via ConduitJul 16 2009, 5:04 PM

(Batch change)

These are low-priority miniprojects that I can mop up at some point when I'm doing general code work as opposed to working on a specific projects.

werdna added a comment.Via ConduitAug 14 2009, 12:37 PM

[Batch change]

Removing Dave McCabe from CC, who I somehow managed to add to the CC list of 12 bugs assigned to me.

Rillke added a comment.Via ConduitNov 9 2011, 1:23 PM

Some checks are possible:

  • Title, user, action and SHA1 of file

Reference:
http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/AbuseFilter/AbuseFilter.hooks.php?view=markup#l280

Especially for Commons it would be desirable if we could check the *content* (wikitext) of the new file for "source= google,..." and other invalid sources to automatically tag them.

Thank you.

hoo added a comment.Via ConduitFeb 28 2013, 1:40 AM

(In reply to comment #3)

Especially for Commons it would be desirable if we could check the *content*
(wikitext) of the new file for "source= google,..." and other invalid sources
to automatically tag them.

Thank you.

I tried to lay the foundation for this in https://gerrit.wikimedia.org/r/51326 (not yet merged) where I changed the extension to use a hook which provides more data than just the file names... but this still requires some changes to MediaWiki itself and then a follow up to the linked patch.

I'll keep you updated.

liangent added a comment.Via ConduitOct 8 2013, 8:33 AM

Just found that AbuseFilter already checks file uploads, but there aren't many variables available.

Currently those variables include:

Edit count of user (user_editcount)
Name of user account (user_name)
Groups (including implicit) user is in (user_groups)
Whether or not a user is editing through the mobile interface (user_mobile)
file_articleid = page id of the image page, or "0" for new uploads
file_namespace = "6"
file_text = unprefixed page name of the image page
file_prefixedtext = full page name of the image page
Action (action) = "upload"
SHA1 hash of file contents (file_sha1)
Whether or not the change was made through a tor exit node (tor_exit_node)
Unix timestamp of change (timestamp)

Missing:

upload comment
wikitext of the image page (for new uploads)

Mahitgar added a comment.Via ConduitFeb 27 2014, 12:54 PM

Any updates about progress on this bug Please . We prefere this getting fixed. Rgds

Aklapper added a comment.Via ConduitFeb 27 2014, 1:05 PM

This report has low priority, but if you have a high interest in getting a specific problem fixed, you could provide a patch to speed up the process.
See https://www.mediawiki.org/wiki/How_to_become_a_MediaWiki_hacker for more information.

hoo added a comment.Via ConduitFeb 27 2014, 1:14 PM

This is still blocked by changes to how MediaWiki handles uploads (and especially the page creations linked to them)... we need more context from that side (right now AbuseFilter has no way to know about the wikitext used as description).

FDMS added a comment.Via ConduitSep 8 2014, 8:28 PM

This bug enables bad-faith editors to upload media files with an invalid {{permissionOTRS}} tag without anyone noticing. Therefore, in my opinion, this is not a low priority enhancement.

Aklapper added a comment.Via ConduitSep 8 2014, 10:56 PM

[As long as nobody works on this ticket it de-facto will be low priority, no matter what the Priority field will be set to.]

werdna removed a subscriber: werdna.Via WebDec 10 2014, 5:24 PM
Legoktm added a subscriber: Legoktm.Via WebJan 8 2015, 5:29 AM

https://gerrit.wikimedia.org/r/183430 added a file_size variable.

Steinsplitter added subscribers: Rjd0060, pajz, Krd.Via WebJan 8 2015, 6:08 PM
Steinsplitter set Security to None.
DFoy added a subscriber: DFoy.Via WebJan 8 2015, 6:57 PM
Fae added a subscriber: Fae.Via WebFeb 11 2015, 3:59 PM
Fae added a comment.Via WebFeb 11 2015, 4:03 PM

This is a great loophole for copyright abuse and uploading professional photographs to Commons without permission. Once an OTRS ticket gets "recycled", an image would be highly unlikely to get noticed as a copyright problem or that the ticket was not added by anyone with OTRS access.

If this cannot be fixed, we really should consider if an efficient bot can be designed to do the same job.

Se4598 added a subscriber: Se4598.Via WebFeb 11 2015, 9:36 PM
In T21565#1031373, @Fae wrote:

This is a great loophole for copyright abuse and uploading professional photographs to Commons without permission. Once an OTRS ticket gets "recycled", an image would be highly unlikely to get noticed as a copyright problem or that the ticket was not added by anyone with OTRS access.

If this cannot be fixed, we really should consider if an efficient bot can be designed to do the same job.

What exactly is your point? What variables are missing for OTRS-checking?
If AF doesn't pick up a upload it should, then it's a bug, but outside of this tasks scope.

ok @Fae, I think I understand and you mean T89252.

Fae added a comment.Via WebFeb 11 2015, 11:23 PM

ok @Fae, I think I understand and you mean T89252.

Yep, probably. It's not a "bad" OTRS ticket though, this is just finding a way to flag when a non-OTRS user adds what might be a perfectly good ticket. In fact sometimes we want batch uploaders to do things this way, so it's not something to stop, just something to track.

Dragons_flight added a subscriber: Dragons_flight.Via WebApr 6 2015, 7:25 PM

The underlying technical problem is that uploads proceed in two steps. First the file is uploaded to the server and then the text is processed and saved to the database. We have a hook into the upload process, but at that point in software the uploaded text is not yet available.

After poking around the code a bit, my instinct is that the best approach probably to add a new hook in UploadBase/performUpload prior to calling LocalFile->upload. Call it something like, UploadBeforeSaveText and send it the file info as well as the $comment, $page_text, and $user. And then move all AbuseFilter processing of uploads to that hook. The biggest potential pitfall is probably making sure the code aborts correctly and doesn't leave dangling stuff on the file system (the code is complicated enough that I'm not currently sure what cleaning might be necessary in response to a late abort).

The only other sensible alternative that I see would be to make page_text, etc. properties of UploadBase and associate it with the file much earlier in the upload process, before verifyUpload gets called. In some ways this might be preferable conceptually, as it would enforce a tighter integration of file description text and file content (and doesn't require a new hook), but doing that seems like it would require many more dependency changes since one would be modifying the parameters sent to performUpload among other things.

Does anyone else have suggestions about how to proceed here?

Se4598 added a comment.Via WebApr 6 2015, 7:34 PM

The underlying technical problem is that uploads proceed in two steps. First the file is uploaded to the server and then the text is processed and saved to the database. We have a hook into the upload process, but at that point in software the uploaded text is not yet available.

After poking around the code a bit, my instinct is that the best approach probably to add a new hook in UploadBase/performUpload prior to calling LocalFile->upload. Call it something like, UploadBeforeSaveText and send it the file info as well as the $comment, $page_text, and $user. And then move all AbuseFilter processing of uploads to that hook. The biggest potential pitfall is probably making sure the code aborts correctly and doesn't leave dangling stuff on the file system (the code is complicated enough that I'm not currently sure what cleaning might be necessary in response to a late abort).

The only other sensible alternative that I see would be to make page_text, etc. properties of UploadBase and associate it with the file much earlier in the upload process, before verifyUpload gets called. In some ways this might be preferable conceptually, as it would enforce a tighter integration of file description text and file content (and doesn't require a new hook), but doing that seems like it would require many more dependency changes since one would be modifying the parameters sent to performUpload among other things.

Does anyone else have suggestions about how to proceed here?

Please see the referenced tasks. See T89252 and especially T89302: Create check hook before really uploading file with infos about description page and file, which is basically what you suggest. Poke and get feedback/attention there (maybe from MediaWiki-Core-Team).

Add Comment