Investigate: Import the file, but suppress old revisions when they are blocked by the AbuseFilter
Closed, Resolved | Public | 5 Estimated Story Points

Description

Motivation
When old revisions contain content that triggers the AbuseFilter, we do not want to have that content on the target wiki. However, it would also be a pity if the file could not be moved at all, even though the current version complies with all rules. Ideally, we find a way to not show blocked old content, but keep the rest of the file intact.

Acceptance Criteria

  • Investigate if there is a reasonable way to hide all info in older revisions that is not allowed per AbuseFilter warning

Notes

  • We are running AbuseFilter checks for every single revision
  • Right now we don’t allow the import of files with suppressed revisions when the suppressed part is the file itself. All other suppressed things are just substituted

Open questions

  • Can we make a suppressed edit during import, so that it really never appears on the source wiki? Or would there be a race condition?
  • If the suppression doesn't work, could one overwrite the info to read sth like “Due to an AbuseFilter rule this content is suppressed” (and then suppress it?)
  • Would it be possible to only suppress the part where the AbuseFilter fires? (Very first investigation said no, we would only know this revision is blocked)

Event Timeline

To the first two questions:

Can we make a suppressed edit during import, so that it really never appears on the source wiki? Or would there be a race condition?

Currently it's not possible to import a revision that has suppressed parts right away. We would either:

  • have to deal with the race condition
  • or have to add that functionality to core

To the latter: We're currently using the ImportableOldRevisionImporter[1] and ImportableOldRevision[2] to handle our (text revision) imports. Neither currently holds any information about the suppression status of the revision to import. This could be changed by adding information about rev_deleted[3]. The open question is whether that is something that is wanted / allowed. The implementation itself does not seem too complicated.
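
As a rough illustration of what carrying the suppression status along with an importable revision could look like, here is a minimal Python sketch (not MediaWiki code). The bit values follow the rev_deleted documentation[3]; the class ImportableRevisionSketch, its fields, and suppress_text are hypothetical names used only for this example.

```python
from dataclasses import dataclass
from enum import IntFlag


class RevDeleted(IntFlag):
    """Bit values as documented for the rev_deleted field."""
    TEXT = 1        # revision text is hidden
    COMMENT = 2     # edit summary is hidden
    USER = 4        # user name / IP is hidden
    RESTRICTED = 8  # also hidden from administrators (suppression)


@dataclass
class ImportableRevisionSketch:
    """Hypothetical stand-in for an importable revision with visibility bits."""
    text: str
    comment: str
    user: str
    rev_deleted: RevDeleted = RevDeleted(0)

    def is_suppressed(self) -> bool:
        # A revision counts as suppressed once the RESTRICTED bit is set.
        return bool(self.rev_deleted & RevDeleted.RESTRICTED)


def suppress_text(rev: ImportableRevisionSketch) -> ImportableRevisionSketch:
    """Mark the text of a revision as suppressed before it is imported."""
    rev.rev_deleted |= RevDeleted.TEXT | RevDeleted.RESTRICTED
    return rev
```

With such a field in place, an importer could write the visibility bits together with the revision, so there would be no window in which the content is publicly visible. Whether core should accept this is the open question mentioned above.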

I have one more thought on the last question. In either case we would create suppressed revisions on the target wiki that are triggered by the AbuseFilter on the one hand, but on the other hand are "executed" by a user who does not necessarily have the rights to do something like that.

If the suppression doesn't work, could one overwrite the info to read sth like “Due to an AbuseFilter rule this content is suppressed” (and then suppress it?)

Short answer: Yes. In that case there would at least be no point in time where "abusive" content is present on the target wiki. We just have to make sure to identify and suppress the right revision when the import is finished.
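
A minimal Python sketch of that overwrite-then-suppress idea, under the assumption that revisions are plain text and that some check tells us whether a revision is blocked. The names prepare_revisions and is_blocked are hypothetical and do not exist in the import code.

```python
PLACEHOLDER = "Due to an AbuseFilter rule this content is suppressed"


def prepare_revisions(revisions, is_blocked):
    """Replace blocked revision text with a placeholder before the import.

    Returns the prepared texts and the indexes of the revisions that
    still have to be suppressed once the import has finished.
    """
    prepared, to_suppress = [], []
    for index, text in enumerate(revisions):
        if is_blocked(text):
            # The blocked content never reaches the target wiki.
            prepared.append(PLACEHOLDER)
            to_suppress.append(index)
        else:
            prepared.append(text)
    return prepared, to_suppress
```

Keeping the list of indexes is the part that addresses "identify and suppress the right revision": the import only ever sees the placeholder, and the follow-up suppression works from that list.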

[1] https://gerrit.wikimedia.org/g/mediawiki/core/+/bd1d2b8529501d00080c534e245479747a097604/includes/import/ImportableOldRevisionImporter.php
[2] https://gerrit.wikimedia.org/g/mediawiki/core/+/bd1d2b8529501d00080c534e245479747a097604/includes/import/ImportableOldRevision.php
[3] https://www.mediawiki.org/wiki/Manual:Revision_table#rev_deleted

Would it be possible to only suppress the part where the AbuseFilter fires?

From the AbuseFilter we only know that the revision is blocked, and we get the title and the id of the filter that was hit. We do not know which part of the revision triggered the filter.

Lea_WMDE moved this task from Demo to Done on the WMDE-QWERTY-Sprint-2019-06-12 board.

Due to

  • the code changes not being trivial and probably in MediaWiki core
  • there still being some uncertainty about how exactly this could be achieved
  • the question whether we really want to suppress things, where nobody could ever see what was suppressed, for which reasons, and by whom

the decision is to stick to the current behavior: If the AbuseFilter blocks the move for any of the revisions, we do not move the file. (However, the question of whether there is a way to run certain checks only on the newest revision is still under investigation in T225521: Investigate whether the AbuseFilter can specify which revisions to execute a rule on.)
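
For reference, the behavior we are sticking to can be sketched in Python as follows. This is only an illustration: FilterHit, can_move_file, and the check callback are hypothetical names, and the hit carries just the filter id and title, since that is all the AbuseFilter tells us.

```python
from typing import Callable, List, NamedTuple, Optional, Sequence, Tuple


class FilterHit(NamedTuple):
    """All we learn from a blocked revision: which filter was hit."""
    filter_id: int
    filter_title: str


def can_move_file(
    revisions: Sequence[str],
    check: Callable[[str], Optional[FilterHit]],
) -> Tuple[bool, List[FilterHit]]:
    """Run the check on every revision; a single hit blocks the whole move."""
    hits = [hit for hit in map(check, revisions) if hit is not None]
    return (not hits, hits)
```

A file whose newest revision is fine but whose history contains one blocked revision is therefore still rejected as a whole.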