Due to sustained Wikipedia Zero abuse (T129845), we should prevent WP0 users from receiving files until those files have been reviewed for copyright.
----
== Implementation plan
=== Core functionality:
* Make `File::getContentHeaders()` look up patrol status of the page (probably with some kind of hook so that FlaggedRevs wikis with patrolling disabled can plug in review status instead) and add an `X-MediaWiki-Patrol-Status` header accordingly.
** It doesn't seem like `File::getContentHeaders()` is actually used anywhere; `LocalFile::upload()` uses `MediaHandler::getContentHeaders()` instead (and we don't want to depend on that, as this should work for files which have no handler). That's probably a bug and should be fixed.
** On copy/move, it seems like `SwiftFileBackend::doXXXInternal()` takes care of persisting the headers.
** It seems like all header options go through `SwiftFileBackend::sanitizeHdrs()` which ends up removing almost everything. So that should be fixed to keep `X-MediaWiki-` headers.
* Make Varnish split the cache on zero status and return an error response with short-lived cache if the client is zero-rated and the backend response has `X-MediaWiki-Patrol-Status: unpatrolled`.
* In `RecentChange::doMarkPatrolled()` (eww, but seems the least bad location) remove the Swift header with something like `FileBackend::doQuickOperations( [ [ 'op' => 'describe', 'src' => $file->getPath(), 'headers' => [ 'X-MediaWiki-Patrol-Status' => '' ] ] ] )` (while making sure not to re-trigger T178849) and also call `WikiFilePage::doPurge()`.
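
To make the header-filtering change concrete, here is a minimal sketch of a sanitizer that keeps a fixed set of header names plus anything matching a configured pattern such as `/^x-mediawiki-/`. This is not the actual `SwiftFileBackend::sanitizeHdrs()` code: the function name `filterHeaders()` and its parameters are hypothetical, and the real method may normalize headers differently.

```php
<?php
/**
 * Hypothetical helper (NOT the real SwiftFileBackend::sanitizeHdrs()):
 * keep a header if its lowercased name is in $allowedNames or matches
 * one of the regexes in $allowedPatterns; drop everything else.
 */
function filterHeaders( array $headers, array $allowedNames, array $allowedPatterns ): array {
	$kept = [];
	foreach ( $headers as $name => $value ) {
		// Header names are case-insensitive, so compare in lowercase.
		$lower = strtolower( (string)$name );
		if ( in_array( $lower, $allowedNames, true ) ) {
			$kept[$lower] = $value;
			continue;
		}
		foreach ( $allowedPatterns as $pattern ) {
			if ( preg_match( $pattern, $lower ) === 1 ) {
				$kept[$lower] = $value;
				break;
			}
		}
	}
	return $kept;
}

// Example input; the header values are made up for illustration.
$headers = [
	'Content-Disposition' => 'inline',
	'X-MediaWiki-Patrol-Status' => 'unpatrolled',
	'X-Unrelated' => 'this one is dropped',
];
var_export( filterHeaders( $headers, [ 'content-disposition' ], [ '/^x-mediawiki-/' ] ) );
```

Matching on a pattern rather than on the exact header name means any later `X-MediaWiki-` header would pass through without another change to the sanitizer.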
=== Transcodes:
Most of the piracy is video-related, and using transcodes would be a trivial workaround, so TimedMediaHandler needs to copy the header to the transcodes (and hook into the removal of the header so that it is cleared from them as well).
=== FlaggedRevs wikis:
(Some) FlaggedRevs wikis use a separate review mechanism for images alongside or instead of patrolling. (At least huwiki has review for files enabled, patrolling disabled, and is an occasional WP0 piracy target; so while less important, this hole should be plugged too eventually.) To support them, a new hook should be added to `File::getContentHeaders()` so they can set the patrolled flag; and the purge mechanism should be made available so that FlaggedRevs wikis can call it when a file is reviewed.
=== Thumbnails:
This is less important as images are mostly used by the pirates as containers for hidden files, and it's unlikely those would survive thumbnailing. (I.e. we probably don't need to bother.)
* On thumbnailing, `File::generateAndSaveThumb()` calls `FileRepo::quickImport()` and does not pass any headers except `Content-Disposition`, and as far as I can see there is no other magic to copy them over from the original, so we'd need to re-inject the headers there. Probably the easiest way is to fetch the headers of the originals from Swift and copy them (avoids special-casing of foreign DB images etc).
* Also update Thumbor accordingly.
* Maybe also make `thumb.php` add the header? Not sure if that's actually useful - for sites using Swift, like Wikimedia, the thumbnail will always be streamed from Swift I think, so it's enough to have them there. For sites using file storage, exposed via the web server, there is no way to set custom headers. For the default MediaWiki config (everything goes through thumb.php), this would make the headers more consistent, but it's unlikely that anyone using such a low-performance config would need to filter on patrol status. Same goes for `img_auth.php`, it's unlikely that private wikis would ever need this feature.
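
The "fetch the original's headers and copy them" step could look roughly like the sketch below. It only covers the merge; fetching the original's headers from Swift and passing the result on to `FileRepo::quickImport()` are left out. `headersForDerivative()` is a hypothetical name, and copying everything with the `X-MediaWiki-` prefix (rather than just the patrol header) is an assumption.

```php
<?php
/**
 * Hypothetical helper: merge the X-MediaWiki-* headers of an original file
 * into the headers that would be stored with a derivative (thumbnail).
 * Headers already set on the derivative (e.g. Content-Disposition) are kept.
 */
function headersForDerivative( array $originalHeaders, array $derivativeHeaders ): array {
	foreach ( $originalHeaders as $name => $value ) {
		// Case-insensitive prefix check: stripos() returns 0 on a match at the start.
		if ( stripos( (string)$name, 'x-mediawiki-' ) === 0 ) {
			$derivativeHeaders[$name] = $value;
		}
	}
	return $derivativeHeaders;
}

// Example with made-up header values.
$original = [
	'Content-Type' => 'image/png',
	'X-MediaWiki-Patrol-Status' => 'unpatrolled',
];
$thumb = [ 'Content-Disposition' => 'inline;filename*=UTF-8\'\'Example.png' ];
var_export( headersForDerivative( $original, $thumb ) );
```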
----
Patches are under the [[https://gerrit.wikimedia.org/r/#/q/topic:wp0|wp0]] topic. Testing plan:
* enable `swift` and `varnish` roles in vagrant
* install all the patches; instead of the mediawiki-config patch use something like
```
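// Let the Swift backend keep X-MediaWiki-* headers instead of stripping them.
foreach ( $wgFileBackends as &$backend ) {
	if ( $backend['name'] === 'swift-backend' ) {
		$backend['headerWhitelist'] = [ '/^x-mediawiki-/' ];
	}
}
unset( $backend ); // break the reference left over from the loop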
```
* the testing patch for vagrant uses the browser language as a proxy for Zero status, so add Arabic to the list of accepted languages to simulate coming from a Zero domain.
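
When checking test results, it may help to have the intended decision written down. The sketch below models it in PHP rather than VCL. Both function names are hypothetical, and the way the vagrant patch actually inspects the browser language is an assumption here (treated as "any `ar` entry in `Accept-Language`").

```php
<?php
/**
 * Hypothetical stand-in for the test setup's Zero detection: true if the
 * Accept-Language value lists the given primary language subtag.
 */
function acceptsLanguage( string $acceptLanguage, string $code ): bool {
	foreach ( explode( ',', $acceptLanguage ) as $range ) {
		// Drop any ";q=..." weight, then keep only the primary subtag.
		$tag = strtolower( trim( explode( ';', $range )[0] ) );
		if ( explode( '-', $tag )[0] === strtolower( $code ) ) {
			return true;
		}
	}
	return false;
}

/**
 * Hypothetical model of the Varnish rule: block only when the client is
 * zero-rated AND the backend response is marked unpatrolled.
 */
function shouldBlockForZero( bool $isZero, array $responseHeaders ): bool {
	if ( !$isZero ) {
		return false;
	}
	foreach ( $responseHeaders as $name => $value ) {
		if ( strtolower( (string)$name ) === 'x-mediawiki-patrol-status' ) {
			return strtolower( trim( (string)$value ) ) === 'unpatrolled';
		}
	}
	// No patrol header (e.g. removed after patrolling): serve the file.
	return false;
}

$isZero = acceptsLanguage( 'en-US,en;q=0.9,ar;q=0.8', 'ar' );
var_dump( shouldBlockForZero( $isZero, [ 'X-MediaWiki-Patrol-Status' => 'unpatrolled' ] ) );
```

So during testing, an unpatrolled file should be refused only with Arabic among the accepted languages, and should be served again once the file is patrolled and the header has been removed.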