
Investigation into Abuse Filter errors thrown up during file import
Closed, Resolved · Public · 3 Estimated Story Points

Description

  • Look at the Grafana board for AbuseFilter errors occurring on import
  • Create overview of what these errors mean and how often they appear so we can discuss them

Event Timeline

Lena_WMDE set the point value for this task to 3.

Overview of all errors we log (defaults to the last 30 days):

https://grafana.wikimedia.org/d/000000553/mediawiki-fileimporter?panelId=25&fullscreen&orgId=1

List of errors over the last 365 days (with >23 K successful uploads):

file_missing_required_template           2.2 K
userPermissionsError                     1.9 K
duplicateFiles                           901
cantimportfilehidden                     356
file_contains_blocked_category_template  300
commonshelper_missing_config             172
abusefilter_warning_otrs                 109
operationCommit                          88
404                                      81
userBlocked                              72
cantimportfromsharedrepo                 60
filetype_mime_mismatch                   38
commonshelper_parsing_failed             37
abusefilter_warning_blanking2            34
api_badinfo                              33
abusefilter_warning_copyv2               30
cantimporturl                            23
spam_blacklisted_link                    22
userGloballyBlocked                      21
abusefilter_disallowed                   21
uploaded_href_unsafe_target_svg          18
1_2                                      18
filemissinginrevision                    12
0                                        12
uploaded_href_attribute_svg              11
abusefilter_warning_review               11
upload_scripted_dtd                      7
filenameerror_notallowed                 7
400                                      6
revisionMissingField                     5
filetoolarge                             5
api_toomanyrevisions                     5
abusefilter_warning_use_delete_gadget    5
tiff_bad_file                            3
cantparseurl                             3
cantimportmissingfile                    3
api_failedtogetinfo                      3
abusefilter_warning_mp3                  3
uploadinvalidxml                         2
api_nopagesreturned                      2
noSourceApiFound                         1
noNullRevisionCreated                    1
filenameerror_noplannedextension         1
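
To see how much of this list is AbuseFilter-related, the abusefilter_* entries can be summed. This is only a small Python sketch for the discussion. The key names and counts are as read from the list above, and the variable names are made up for illustration.

```python
# Key names and counts as read from the 365-day list above
# (only the abusefilter_* entries).
abusefilter_counts = {
    "abusefilter_warning_otrs": 109,
    "abusefilter_warning_blanking2": 34,
    "abusefilter_warning_copyv2": 30,
    "abusefilter_disallowed": 21,
    "abusefilter_warning_review": 11,
    "abusefilter_warning_use_delete_gadget": 5,
    "abusefilter_warning_mp3": 3,
}

total = sum(abusefilter_counts.values())
print(f"AbuseFilter errors in total: {total}")

# Share of each rule among the AbuseFilter errors, largest first.
for key, count in sorted(abusefilter_counts.items(), key=lambda kv: -kv[1]):
    print(f"{key:40s} {count:4d} {count / total:6.1%}")
```

That gives 213 AbuseFilter errors in total, with abusefilter_warning_otrs making up roughly half of them.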

Detailed look into the AbuseFilter rules.

abusefilter_warning_otrs (109): Filter 69 - Adding OTRS permission by non-OTRS member
  • Checks that OTRS templates are only added by users who are allowed to add them.
  • Should just give a warning and add a tag.
  • Makes no sense on older revisions, only on the current one. See also T213409.

abusefilter_warning_blanking2 (34): Filter 4 - Page blanking
  • Triggers if a page has, for some reason, been emptied.
  • Blocks the import completely.
  • Makes no sense on older revisions.

abusefilter_warning_copyv2 (30): Filter 154 - Possible copyvio
  • Possible copyright violations. Checks for specific trigger words in the wikitext (e.g. getty, shutterstock).
  • Should just give a warning.
  • Makes no sense on older revisions.

abusefilter_disallowed (21): Unspecified rule
  • One or more rules that we currently cannot, or do not, distinguish in the logging.

abusefilter_warning_review (11): Filter 70 - License review by non-Image-reviewers
  • I don't have permission to see that filter in detail, but I'm quite confident that it is the filter behind this error. I guess it is about the use of a template that not all users are allowed to use, so importing a file with that template triggers the rule.
  • I guess it would be safe to have this disabled for old revisions.

abusefilter_warning_use_delete_gadget (5): Filter 71 - Recommend the "Nominate for Deletion" gadget
  • Shows a hint to the user when the file description seems to contain text suggesting that the user wants to propose a deletion.
  • Should just give a warning and add a tag.
  • Makes no sense on older revisions.

abusefilter_warning_mp3 (3): Filter 192 - Restrict MP3 uploads
  • Uploading MP3 files is generally forbidden.
  • Blocks the import completely.
  • Makes sense on older revisions.

The rules that should just give a warning are currently not working as intended. It seems that the FileImporter blocks these imports completely. The expected behavior would be to show a warning, then allow the user to click the submit button again and continue with the import. If I remember correctly, this worked at some point. I think the workflow broke with the introduction of the separate error page, which does not allow submitting again.

In some of these cases a tag should be added in addition to the warning. The FileImporter does not do this either. I think this never worked.
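
The expected warn-then-resubmit behavior, including the tagging, could be sketched roughly like this. It is a Python illustration only and not FileImporter code: the classification of rules is taken from the filter overview above, while `ImportAttempt`, `submit` and the return values are hypothetical names. The sketch also assumes that anything not known to be warn-only blocks the import.

```python
from dataclasses import dataclass, field

# Rules that, per the overview above, should only warn. Everything else
# is treated as blocking here (an assumption made for this sketch).
WARN_ONLY = {
    "abusefilter_warning_otrs",
    "abusefilter_warning_copyv2",
    "abusefilter_warning_use_delete_gadget",
}
# Warn-only rules that should additionally add a tag.
ADDS_TAG = {
    "abusefilter_warning_otrs",
    "abusefilter_warning_use_delete_gadget",
}


@dataclass
class ImportAttempt:
    """Hypothetical state kept between two submits of the same import."""
    acknowledged: set = field(default_factory=set)
    tags: set = field(default_factory=set)


def submit(attempt, triggered):
    """Return 'blocked', 'warning' or 'imported' for one submit."""
    triggered = set(triggered)
    # Any rule that is not known to be warn-only blocks the import.
    if triggered - WARN_ONLY:
        return "blocked"
    unseen = triggered - attempt.acknowledged
    if unseen:
        # First submit: show the warning and remember that it was shown.
        attempt.acknowledged |= unseen
        return "warning"
    # Repeated submit: the user confirmed, so continue and record the tags.
    attempt.tags |= triggered & ADDS_TAG
    return "imported"


attempt = ImportAttempt()
first = submit(attempt, ["abusefilter_warning_otrs"])
second = submit(attempt, ["abusefilter_warning_otrs"])
print(first, second, attempt.tags)
```

The point of the sketch is that the first submit only records that the warning was shown, so a second submit with the same triggered rules goes through. The current error page does not offer that second step.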

abusefilter_warning_copyv2 […] Makes no sense on older revisions

Oh, take care. A copyright violation in an older revision is still a copyright violation. It's accessible for everybody. Wikimedia hosts it. Wikimedia can be sued. Or worse: The original uploader can be sued.

Normally, old versions with copyright violations are suppressed. In that case there should be no need to handle them differently from other files with suppressed versions.

Maybe one word on that. The rule in question is just a very rough attempt to detect possible violations early. It only gives the user a warning and still allows the import in a second step. Also consider that the source file must already carry a template suggesting a compatible license before the user even gets to the import that would trigger the warning.

So even if we ignored the warning on old revisions (which we will not do), several people would have to do things wrong before the FileImporter became one piece in getting anyone sued. :-)

Lena_WMDE moved this task from Demo to Done on the WMDE-QWERTY-Sprint-2020-05-27 board.