
AbuseFilter is near-impossible to test on uploads
Closed, Resolved · Public

Description

AbuseFilter has various tools (such as "Examine past edits" and "Batch testing") that are supposed to help with the writing and testing of filters. These tools do not work at all with uploads, though. A simple filter consisting of action=="upload" does not trigger on anything in these test modes, whether you try to find uploads by username, file name or recent changes time interval. (In actual operation, such a filter triggers on uploads just fine.) AbuseFilter in testing mode only seems to be aware of the associated edit event that creates the file page, not the upload event itself.

Steps to reproduce: visit https://commons.wikimedia.org/wiki/Special:AbuseFilter/test (you probably need admin-ish rights), put `action=="upload"` in the big textarea, check "Show changes that do not match the filter", click "Test".
Expected result: lots of hits (green checkmarks) - the test page shows the last 100 changes and this is Commons so there should be lots of uploads amongst them.
Actual result: all changes are misses. You see RC items that start with "(Upload log)" but when you click "examine" these turn out to be edit actions.

A workaround is to create a filter with just action=="upload" and no actions. That will record every upload in the abuse log, and you can then use the "examine" option on those log entries to test them against an arbitrary filter.

Event Timeline

I've known about this for a long time but never filed a bug. Shame on me.

Well, isn't it the case that AbuseFilter also cannot test delete, move, or block actions? These work fine in a live filter, but you cannot test them in the batch testing. In fact, I think no action other than edit can be tested in batch testing. If I am right, then we should expand this task and/or add subtasks for those actions too.

> Well, isn't it the case that AbuseFilter also cannot test delete, move, or block actions?

  • move - filterable and testable (query, vars) but when I click "examine" there's no action = 'move'
  • delete - not testable but filterable
  • block - AF has never checked blocks

For upload, it's necessary to:

  1. adjust the query
  2. create AbuseFilter::getUploadVarsFromRCRow method
  3. note: there's AbuseFilterHooks::filterUpload where upload vars are computed
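
The steps above could be sketched roughly as follows. This is a Python model rather than the extension's PHP, and it is only meant to show the shape of the change: the `rc_*` keys mirror column names in MediaWiki's recentchanges table, while the function names and the reduced set of variables are assumptions made for illustration, not AbuseFilter's actual code.

```python
# Hypothetical model: turn a recentchanges row into filter variables.
# Only the rc_* keys correspond to real column names; the rest is illustrative.

def edit_vars_from_rc_row(row):
    """Variables for an ordinary edit row."""
    return {
        "action": "edit",
        "article_namespace": row["rc_namespace"],
        "article_text": row["rc_title"],
    }


def upload_vars_from_rc_row(row):
    """Variables for an upload log row (stand-in for step 2).

    A real implementation would also need the file-related variables
    that are computed at upload time (step 3).
    """
    return {
        "action": "upload",
        "article_namespace": row["rc_namespace"],
        "article_text": row["rc_title"],
    }


def vars_from_rc_row(row):
    # Stand-in for step 1: upload log rows have to be selected and
    # recognised instead of being treated as plain edits.
    if row.get("rc_log_type") == "upload":
        return upload_vars_from_rc_row(row)
    return edit_vars_from_rc_row(row)


upload_row = {"rc_log_type": "upload", "rc_namespace": 6, "rc_title": "Example.jpg"}
edit_row = {"rc_log_type": None, "rc_namespace": 0, "rc_title": "Foo"}
print(vars_from_rc_row(upload_row)["action"])
print(vars_from_rc_row(edit_row)["action"])
```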

> If I am right, then we should expand this task and/or add subtasks for those actions too.

Yes, please.

Since you mentioned that uploads are "edits" rather than uploads, you can target file uploads using the following:

article_namespace == 6 & (action=="upload" | action=="edit")

This was tested under MediaWiki 1.28.2; whether it works for 1.29 or 1.30 is unknown.
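
To make the logic of that condition concrete, here is a small Python model of it. It is not AbuseFilter syntax, and the event dictionaries are made-up examples; namespace 6 is MediaWiki's File namespace.

```python
def matches_file_filter(event):
    """Python equivalent of:
    article_namespace == 6 & (action == "upload" | action == "edit")
    """
    return event["article_namespace"] == 6 and event["action"] in ("upload", "edit")


# Invented sample events, one per interesting combination.
events = [
    {"action": "upload", "article_namespace": 6},
    {"action": "edit", "article_namespace": 6},
    {"action": "edit", "article_namespace": 0},
    {"action": "move", "article_namespace": 6},
]

for event in events:
    print(event, matches_file_filter(event))
```

In this model the condition matches any edit to a File page as well as the upload itself, so it does not tell the two kinds of event apart.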

> Since you mentioned that uploads are "edits" rather than uploads, you can target file uploads using the following

This task is about testing in retrospect, not in prospect.

> Since you mentioned that uploads are "edits" rather than uploads, you can target file uploads using the following

> This task is about testing in retrospect, not in prospect.

Perhaps I should have clarified: the above works for both live and batch testing of regex filters. Uploads are currently recorded as edit actions (according to examine), which means the upload action is essentially useless.

No, there is a separate upload event (with file-related data) and edit event (with wikitext-related data) for every upload. This bug is about the upload event being impossible to examine.

MarcoAurelio subscribed.

The lack of support for this feature is causing trouble with regard to recent forms of abuse. Could we please prioritize this task a bit higher? Maybe Community-Tech or Anti-Harassment could be involved? Thanks.

I know the priority field does reflect the status of the task and not the importance of the issue, but this should be a high priority to fix IMHO. Thanks.

I looked at this. Upload is quite different from other actions, since it has a lot of file-specific variables to compute. When filtering an upload we have an UploadBase object from the hook to retrieve that information, but this is not the case for an RC entry. So my question is: do we have a method for retrieving file data from an RC entry? AFAICT, such data is stored in both the image and oldimage tables, but rows there don't seem to be uniquely reachable from RC other than by timestamp and title. A solution could be, given an RC upload row, to query both the image and oldimage tables for a match of img_name/oi_name against rc_title, and similarly for the timestamp, but that seems inefficient to me. Is there a better way?
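
For illustration, the lookup described in that comment might look something like the sketch below. It uses Python's `sqlite3` with in-memory stand-ins for the image and oldimage tables, reduced to a few columns; the sample rows and the `find_file_row` helper are invented, and matching on an exactly equal timestamp is an assumption that may not hold in practice. It is not the approach taken in the eventual patch, which this thread does not describe.

```python
import sqlite3

# In-memory stand-ins for MediaWiki's image and oldimage tables,
# reduced to name, timestamp and size. Sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE image (img_name TEXT, img_timestamp TEXT, img_size INTEGER);
    CREATE TABLE oldimage (oi_name TEXT, oi_timestamp TEXT, oi_size INTEGER);
    INSERT INTO image VALUES ('Example.jpg', '20180710120000', 2048);
    INSERT INTO oldimage VALUES ('Example.jpg', '20180701090000', 1024);
""")


def find_file_row(conn, rc_title, rc_timestamp):
    """Return (source_table, size) for the file version matching an RC row, or None."""
    return conn.execute(
        """
        SELECT 'image', img_size FROM image
         WHERE img_name = ? AND img_timestamp = ?
        UNION ALL
        SELECT 'oldimage', oi_size FROM oldimage
         WHERE oi_name = ? AND oi_timestamp = ?
        """,
        (rc_title, rc_timestamp, rc_title, rc_timestamp),
    ).fetchone()


print(find_file_row(conn, "Example.jpg", "20180710120000"))
print(find_file_row(conn, "Example.jpg", "20180701090000"))
```

Both tables have to be searched because the current version of a file lives in image while superseded versions live in oldimage, which is the cost the comment is worried about.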

Change 445325 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] [WIP] Make uploads testable

https://gerrit.wikimedia.org/r/445325

Change 445325 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Make uploads testable

https://gerrit.wikimedia.org/r/445325

Daimona removed a project: Patch-For-Review.

Hoping that this won't create more problems than it solves!