
Improve/Dispose AbuseFilter checks on old text revisions
Open, Needs Triage, Public

Description

Currently FileImporter runs AbuseFilter checks on the contents of every imported text revision. This was done to make sure that abusive content, even if it is not part of the current revision, is not imported to the target wiki.

From T206486 we got an example where an AbuseFilter rule on Commons was triggered because one of the revisions contained empty text, and the rule makes sure that empty text cannot be submitted. When "just" an old revision has no text, this does not really seem to be an issue that should stop the import.

The example shows that in some cases it might not make sense to run the AbuseFilter checks for every "old" text revision. Some ideas on how to deal with that:

  • drop the checks for old revisions.
  • partly drop the checks (e.g. only check the summary lines of old revisions but not the text)
  • find a way to tell AbuseFilter that we're in the middle of an import process and that we're dealing with "old" text revisions

Probably the last point is the best way to go here. Then the maintainers of the filter rules can decide for each rule what to do in our case. We're just not entirely sure yet whether this is doable and how best to do it.
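
To make the three ideas easier to compare, here is a minimal sketch in Python. It is not FileImporter or AbuseFilter code: `OldRevisionPolicy`, `Revision`, `FilterInput`, `build_filter_input`, `empty_text_rule` and the `imported_old_revision` flag are all hypothetical names invented for illustration. It only models which data a filter rule would get to see under each idea.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OldRevisionPolicy(Enum):
    """Hypothetical switch between the behaviours discussed above."""
    CHECK_ALL = "check_all"        # current behaviour: every revision is checked
    SKIP_OLD = "skip_old"          # idea 1: no checks for old revisions
    SUMMARY_ONLY = "summary_only"  # idea 2: only summaries of old revisions
    FLAG_CONTEXT = "flag_context"  # idea 3: check, but tell the rule about the import


@dataclass
class Revision:
    text: str
    summary: str
    is_latest: bool


@dataclass
class FilterInput:
    text: Optional[str]          # None means "the text is not evaluated"
    summary: str
    imported_old_revision: bool  # hypothetical variable a rule could read


def build_filter_input(rev: Revision, policy: OldRevisionPolicy) -> Optional[FilterInput]:
    """Return what a filter rule would see, or None if the check is skipped."""
    # The latest revision is always checked in full, whatever the policy.
    if rev.is_latest or policy is OldRevisionPolicy.CHECK_ALL:
        return FilterInput(rev.text, rev.summary, False)
    if policy is OldRevisionPolicy.SKIP_OLD:
        return None
    if policy is OldRevisionPolicy.SUMMARY_ONLY:
        return FilterInput(None, rev.summary, False)
    # FLAG_CONTEXT: pass everything and let each rule decide.
    return FilterInput(rev.text, rev.summary, True)


def empty_text_rule(inp: FilterInput) -> bool:
    """Toy 'no empty text' rule whose maintainer opted out for imported old revisions."""
    if inp.imported_old_revision:
        return False
    return inp.text is not None and inp.text.strip() == ""
```

With the third policy, the decision stays with the rule: the toy rule above ignores empty text in imported old revisions, while another rule could keep matching them.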

Event Timeline

Jeff_G added a subscriber: Jeff_G. Edited Feb 15 2019, 1:54 PM

Can this tool please be configured not to import file description page revisions under 10 bytes, or at least to pad the ones under 10 bytes with spaces or the 10-byte string "Padded4AF4"? Alternatively, can it be configured to back off from importing revisions which fail any AF and import the rest (unless the latest revision fails)?
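
A rough sketch of the two workarounds proposed in this comment, written in Python for illustration only. `pad_short_text`, `select_importable`, `ImportBlocked` and the `passes_filter` callback are hypothetical names, not part of FileImporter; the 10-byte threshold is the one mentioned in the comment.

```python
from typing import Callable, List

MIN_BYTES = 10  # threshold taken from the comment above


class ImportBlocked(Exception):
    """Raised when the latest revision itself fails the filter."""


def pad_short_text(text: str, min_bytes: int = MIN_BYTES) -> str:
    """Pad text with spaces until its UTF-8 encoding has at least min_bytes bytes."""
    missing = min_bytes - len(text.encode("utf-8"))
    return text + " " * missing if missing > 0 else text


def select_importable(
    revisions: List[str], passes_filter: Callable[[str], bool]
) -> List[str]:
    """Keep revisions that pass the filter; refuse the import if the latest fails.

    `revisions` is ordered oldest first, so the last element is the latest one.
    """
    if not revisions:
        return []
    if not passes_filter(revisions[-1]):
        raise ImportBlocked("latest revision fails the filter")
    return [text for text in revisions if passes_filter(text)]
```

Note that padding changes the imported revision text, whereas skipping failing revisions leaves gaps in the imported history; which trade-off is acceptable is a decision for the target wiki.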

Pikne added a subscriber: Pikne. Apr 26 2019, 2:43 PM

Remarks on my encounter with another abuse filter: filter 69 is set to first warn and then allow/tag the edit. In FileImporter, however, it doesn't allow the change. If I click "Import" repeatedly, I'm still stuck with the warning. Unlike the current situation, I'd expect that if the OTRS template is not changed during the import, the abuse filter is not triggered at all.

I was able to import this file by removing the OTRS template during import (had to re-add it after import). To reproduce this you can use some other file with an OTRS template, like this one.

I also observe that during import this filter tags my edit as adding the OTRS template, even though I removed it. It would make more sense if the older versions that actually include the OTRS template were tagged instead. Furthermore, it would make even more sense if the checks noted that the user who initially added the OTRS template actually is an OTRS member (in the global group), the way it works for new revisions outside FileImporter. So the situation seems quite messy due to the checks for old revisions.
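
The expectation described here, that a "template was added" check should fire on the revision that actually introduced the template, can be sketched by comparing each revision with its predecessor. This is an illustrative Python model only: it does not reproduce the real rule behind filter 69, and `adds_marker`, `revisions_adding_marker` and the marker string are hypothetical.

```python
from typing import List, Optional


def adds_marker(previous_text: Optional[str], new_text: str, marker: str) -> bool:
    """True if marker appears in new_text but was absent from the previous revision."""
    was_present = previous_text is not None and marker in previous_text
    return marker in new_text and not was_present


def revisions_adding_marker(texts: List[str], marker: str) -> List[int]:
    """Return indexes (oldest first) of the revisions that introduced the marker."""
    hits = []
    previous: Optional[str] = None
    for index, text in enumerate(texts):
        if adds_marker(previous, text, marker):
            hits.append(index)
        previous = text
    return hits
```

Under this model, a final revision that removes the template is never reported as adding it; only the older revision that introduced it would be.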

Perhaps "abusive content" in older revisions isn't much of a problem and, if this is an easy solution, the checks for older revisions can simply be dropped after all?