Commons and testwiki used as video hoster by Wikipedia Zero
Open, Normal, Public

Description

Wikipedia Zero users seem to think that http://test.wikipedia.org/ and Commons are some sort of YouTube. They have uploaded thousands of out-of-scope or copyright-violating files to these wikis.
This is now blocked on testwiki by an abuse filter: https://test.wikipedia.org/wiki/Special:AbuseFilter/160.
An example on Commons can be found here

I also talked with Alex Z on IRC; part of the conversation is quoted here:

14:39:32 <AlexZ> Steinsplitter: I've been getting individual emails from the hundreds of users I've blocked there. They all seem to be under the impression that's that's what test wiki is for... to be a "free YouTube" for Wikipedia Zero.
(...)
14:47:33 <AlexZ> I even saw someone try to upload a how-to video on uploading movies to test...
14:50:38 <AlexZ> https://test.wikipedia.org/wiki/Special:AbuseLog/32063

For commons see also:
https://commons.wikimedia.org/wiki/User:Teles/Angola_Facebook_Case
https://commons.wikimedia.org/wiki/User:NahidSultan/Bangladesh_Facebook_Case

See this as well:
https://pt.wikipedia.org/wiki/Wikip%C3%A9dia:Esplanada/geral/Operadoras_angolanas_disponibilizam_acesso_gratuito_%C3%A0_Wikipedia_(16mar2016)
https://meta.wikimedia.org/w/index.php?title=Wikimedia_Forum&oldid=12835750#Wikipedia_Zero_being_used_to_violate_copyright

Denniss added a subscriber: Denniss.Apr 2 2016, 5:07 PM

Please delete this from the cache. Thanks.

I tried overwriting it but for some weird reason (worth investigating) my file was urlencoded. Awful song btw.

Krenair added a subscriber: Krenair.Apr 2 2016, 6:49 PM

I don't think that this is relevant to everyone interested in mobile or the entirety of the Editing team. I'm narrowing down the tags to people who are likely to actually be involved with this. :)

Some comments from the merged task:

I think a first step could be creating a series of abusefilter variables:

  • some flag for edits made through zero
  • upload dimension (afair there isn't)
  • number of pages for documents uploads

Focusing specifically on WP0 uploaders doesn't seem to be the most effective approach here - there'd be nothing stopping a small number of non-WP0 users seeding this content onto Commons for anyone else to retrieve. (Likewise, there's no particular reason the downloaders have to be on WP0).

One way of identifying such files is to convert the file to some other format and back, and compare the compression ratios. With the common methods of embedding files, files with embedded content will shrink considerably. @Dispenser used to have a bot that did this, I believe. (This approach won't work against people who are super sneaky, though, for example by encoding data in the low-order bits of the image data, etc.)

Not a bot, but a tool. There are some details in https://commons.wikimedia.org/wiki/User:Dispenser/Absurd_overhead - It uses some non-free software though, so cannot be run on Tool Labs (but let's not tangent into that issue, here!)

Are there any other ways to determine that the file extension doesn't match the file's contents? Headers or something?
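As a rough illustration of the header idea asked about above, here is a minimal Python sketch that compares a file's leading bytes ("magic number") with what its extension promises. The function names and the extension mapping are hypothetical and not taken from MediaWiki; only the byte signatures themselves are well-known values.

```python
from pathlib import PurePath
from typing import Optional

# Well-known leading byte signatures ("magic numbers") of some file types.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"%PDF-", "pdf"),
    (b"OggS", "ogg"),
    (b"\x1a\x45\xdf\xa3", "matroska"),  # EBML header, used by WebM and MKV
    (b"PK\x03\x04", "zip"),
    (b"Rar!\x1a\x07", "rar"),
]

# Assumption for this sketch: a hand-written mapping from extension to kind.
EXTENSION_KINDS = {
    ".png": "png", ".jpg": "jpeg", ".jpeg": "jpeg", ".gif": "gif",
    ".pdf": "pdf", ".ogg": "ogg", ".oga": "ogg", ".ogv": "ogg",
    ".webm": "matroska",
}


def sniff_kind(head: bytes) -> Optional[str]:
    """Return the kind suggested by the first bytes of a file, or None."""
    for magic, kind in SIGNATURES:
        if head.startswith(magic):
            return kind
    return None


def extension_matches_content(filename: str, head: bytes) -> bool:
    """True if the extension is known and agrees with the sniffed kind."""
    expected = EXTENSION_KINDS.get(PurePath(filename).suffix.lower())
    return expected is not None and sniff_kind(head) == expected
```

Note that a check like this only catches renamed files (say, an archive uploaded as .jpg). A valid image with extra data appended after it still starts with the right signature and would pass.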

I think a first step could be creating a series of abusefilter variables:

  • some flag for edits made through zero

T131211 (already in progress).

  • upload dimension (afair there isn't)
  • number of pages for documents uploads

Yeah, this could be useful to implement a basic "absurd overhead" check, filed T131643.

Although I'm not sure how helpful this will be – it looks like the big problem is copyrighted videos/movies, and not things being sneaked within image files.
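For reference, the basic "absurd overhead" check mentioned above can be sketched in a few lines of Python once the upload dimensions are available: divide the file size by the pixel count and flag anything far above what raw pixel data would need. The function names and the threshold are assumptions made for this sketch, not the logic of Dispenser's tool or of any existing filter.

```python
# Assumption for this sketch: uncompressed 16-bit RGBA needs 8 bytes per
# pixel, so a compressed image well above that is carrying something else.
DEFAULT_MAX_BYTES_PER_PIXEL = 8.0


def bytes_per_pixel(file_size: int, width: int, height: int) -> float:
    """Average number of file bytes spent per image pixel."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return file_size / (width * height)


def has_absurd_overhead(
    file_size: int,
    width: int,
    height: int,
    max_bytes_per_pixel: float = DEFAULT_MAX_BYTES_PER_PIXEL,
) -> bool:
    """True if the file is much larger than its pixel data could explain."""
    return bytes_per_pixel(file_size, width, height) > max_bytes_per_pixel
```

Very small images with large metadata blocks would trip this heuristic too, so a real filter would probably combine it with a minimum file size. The same ratio could be computed per page for document uploads.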

Gunnex added a comment.EditedApr 4 2016, 8:56 AM

Here is a post (01.04.2016) from the Bangladeshi public FB group "Wikimedia Free Download BD", giving instructions and help on how to upload videos to Commons, how to minimize the size of the uploads, etc.: https://www.facebook.com/groups/1683585148563391/permalink/1695000264088546/. It may be useful for further analysis, also considering the feedback from FB users who are posting screenshots of block screens etc... Btw, the uploads are continuing, see Bangladesh Facebook Case (part II)

Once https://gerrit.wikimedia.org/r/#/c/280468/ is live on the servers, I will set up a filter for Commons.

The user_wpzero variable is now available for use on all wikis. It's a boolean indicating whether the user is connecting over Wikipedia Zero; it doesn't distinguish between carriers or anything else.

Quiddity removed a subscriber: Quiddity.Apr 7 2016, 1:38 AM

Reposting here my comment from today in another, related task, which may be relevant also for this task.

In T131934#2198320, @Gunnex wrote on 12.04.2016:

As far as I can analyze the situation (I am not a technician, just giving some feedback from the user front about what I am currently monitoring...), only a few uploads of copyrighted films/videos and music were caught by abuse filter 149 (Wikipedia Zero uploads). I assume that most of the uploaders are using paid mobile plans (flat rates), probably also with better bandwidth to handle uploads of 500 MB to 1 GB (complete films: Superman/Star Wars etc.). I remember the "Angola Facebook Case", where some admins of related Facebook groups in particular spent mobile credits for "their" Facebook audience, providing music and video files via Commons. Btw, the "Angola Facebook Case" was focused more on music (mostly ogg files). The "Bangladesh Facebook Case" is focused more on films and videos (webm, ogv).

If you go into details in the video2commons filter you will see that:

  1. +/- 98-99 % of the red (deleted) links were uploaded via the "Bangladesh Facebook Case"...
  2. ...which - btw - is increasing in user numbers significantly: two weeks ago around 70 accounts were identified, on 09.04.2016 we had 148 accounts. Today we have over 190 accounts.
  3. most of them are using "video2commons" in combination with "googlevideo.com" as source (typical url: "https://r8---sn-4g57kn7e.googlevideo.com/videoplayback?i(...)"). Info: they are NOT exclusively using "video2commons" as upload tool; they also upload files using standard tools, and those uploads are often caught by the mobile edit filter as well.

    So, 1-3 leads to the question: would blocking "googlevideo.com" as a source for video uploads help? Well, googlevideo.com redirects to the Google Video Search... Or: is preventing/blocking mobile uploads of video and music files an option?

    In other words: we urgently need a solution for this.

    It is nice to have some tools for monitoring, but if there is no one who tags the files quickly, the tools become useless. We are talking about several Bangladesh Facebook groups (which share these files instantly) with 10,000+ members. And there are millions of members out there in related FB groups just looking for free file hosts like... Commons.

    Btw, this is not only about this specific case. From what I have seen so far (this is my personal impression, but I am involved almost daily in cleaning up these files), the whole Wikipedia Zero traffic on Commons (which we have been able to track since 06.04.2016), especially the uploads, is mostly useless: grabbed from the Internet, out of scope, etc., tying up additional resources of an already understaffed team. Is Wikipedia Zero even justifiable with a bad-upload ratio of (let's say) +/- 90-95 %?
Pokefan95 triaged this task as "High" priority.Apr 13 2016, 3:12 AM

Just saw a Facebook post. Now they are advising people to use www.hideme.be and create another account if they get blocked by admins.

Bodhisattwa edited the task description. (Show Details)Apr 14 2016, 3:18 PM

Most of them are already globally blocked as open proxies. One block was softened because of the Polish toolserver, though: here is an example of how useful T42439 would be.

FYI...

In T129845#2200041, @Gunnex wrote on 12.04.2016:

(...)

  1. ...which - btw - is increasing in user numbers significantly: two weeks ago around 70 accounts were identified, on 09.04.2016 we had 148 accounts. Today we have over 190 accounts. (...)

241 accounts now...

Getting close to 300 accounts now (currently 291)...

Well, from the user front again:

We now have a Quarry query working (thx @Dispenser for attending to my request) which shows the whole mess: Newbie Deleted Audio/Video (16.04.2016). As far as I can see, the "Bangladesh Facebook Case" started on +/- 01.03.2016. On 03.03.2016 we have, with "File:Because Of The Night (Download Group BD).ogg", the first Facebook group mentioned. The mess on testwiki started, per the upload log, more or less in the same period (btw, testwiki has been hard blocked again since yesterday, because they continued their uploads without triggering filter 160).

Thx to @NahidSultan and some others, we are still (more or less) able to detect the uploads quickly, which is important: not only for preventing the shares on Facebook but also for not attracting free riders to do the same --> trying to maintain the impression that it is not worth uploading files to Commons: "they will delete the file within a short time".

Nevertheless, all this appears to be reaching (again) a permanent state, and speaking for myself --> I am getting tired and growing impatient: first the "Angola Facebook Case" (where I was massively involved + special thx to @Teles) and now this one, both cases taking me away from other maintenance tasks and projects. Furthermore, I'll be going on several (short) vacation trips in April and May and will not be able to help monitor this case.

Currently, we are only reacting to the uploads. We have filters etc., but it is all passive, and all based on luck/coincidence that someone has sufficient time to deal with it. But Commons is permanently understaffed....

We should maybe consider an active strategy: contacting the admins and uploaders of the related Facebook groups directly, trying a preventive and educational dialogue with them, and pointing out that, as a last consequence if this continues, Wikipedia Zero in Bangladesh (and other countries) will be shut down. Btw, see also the press coverage of the "Angola Facebook Case" here. It's only a matter of days or weeks and we will certainly have the same for "Bangladesh"...

The potential danger is the free riders. We may (as suggested above) contact identified FB group admins and uploaders and open a direct dialogue, but the free riders (who may also be members of the related FB groups) are hard to identify. Btw, Angola = 5,951,453 internet users --> Bangladesh = 21,439,070 internet users + better (mobile) IT infrastructure and education level. And for Commons it was easier to block Angolan uploads because most of the traffic went through local Wikipedia Zero mobile carriers and focused mostly on small .ogg (music only) files. For Bangladesh, that's different: they are uploading complete films ("Batman v Superman: Dawn of Justice", "Star Wars...")/videos/clips, often involving hundreds of MB, mostly via multiple paid (mobile) carriers (Commons WP0 filter 149 only caught a few).

So.... well... just some news from the user front...

I cannot help much on Commons because of policies, but I can start testing some filters on testwiki.

Tbayer added a subscriber: Tbayer.Apr 16 2016, 9:18 PM

Wikimedia Bangladesh is aware of this situation and we're trying our best to prevent any further mess. We're trying to create awareness through social media by raising this issue and by contacting individual group/page admins, requesting them not to promote copyrighted materials on Commons. Two of the groups have already announced (1, 2) that they will not continue uploading copyrighted material, in response to this Facebook post that was posted a few hours ago on WMBD's official Facebook page. Let's just hope that they will stick to their word.

For the record, there is T133010: Please upload large file to Wikimedia Commons, where a user (after getting blocked) registered an LDAP account and then requested an upload on Phabricator.

@NahidSultan Good news: I put your text into a translator and it seems to be properly targeted! They can actually share media via Commons; we can in a way be their YouTube, but only for PD material.

That's great news Nahid! good work.

@Yurik Assuming you're the one who created data.wmflabs.org, can you please disable uploads on that wiki before Wikipedia Zero users find out that we have a wiki that is vulnerable to copyright violations? Thanks.

Yurik added a comment.Apr 20 2016, 8:30 PM

@Pokefan95, done, but I suspect that there are tons of various wmflabs vagrant instances that allow file uploads. Also, I am a bit confused why wmflabs is in the same ip range as production - I think we should separate the two.

jayvdb added a comment.EditedApr 20 2016, 11:42 PM

According to T131934, tool labs is a totally different ip range, and only production is zero rated.

NahidSultan added a comment.EditedApr 24 2016, 9:32 AM

Update: It seems that Wikimedia Bangladesh's awareness is working (though it's a bit early to say). After discussing with different Facebook pages and group admins individually, most of them have agreed to stop uploading copyrighted videos. We're still receiving those uploads but mostly from individual users, not from a group. To be precise, recent uploads are coming from only one/two Facebook user(s) through various usernames (sometimes in other languages) based on the evidence from Facebook groups.

DFoy added a comment.Apr 24 2016, 9:53 AM

@NahidSultan - great to hear that your approach is having positive results! I will be in touch with the mobile operator there to verify that they are also seeing the abuse level off.

Denniss removed a subscriber: Denniss.Apr 24 2016, 10:55 AM

That would be a good idea, to learn their perspective on this as well.

Yurik merged a task: Restricted Task.Apr 27 2016, 6:14 PM
Yurik added subscribers: csteipp, BBlack, akosiaris, MaxSem.
Yurik added a comment.Apr 28 2016, 2:32 PM

Social aspect: as discussed in Facebook WP weekly group, there is an article on the topic.

The article... speechless...

Hello,

I'm a radio producer with the BBC and am hoping to speak to a member of this group about the Wikipedia Zero piracy issue.

Would anyone be happy to do a pre-recorded telephone interview to be broadcast on the BBC World Service?

Many thanks,

Sam

Probably not a good place for this here, though you can check https://wikimediafoundation.org/wiki/Press_room for press contacts.

Thanks for getting back to me Matthew - I've put a request in to the Wikimedia foundation as well.

But I'd still like to hear from someone involved in the day-to-day work of preventing piracy on the platform.

If anyone is happy to speak over the phone, I'd be very grateful.

Pokefan95 moved this task from Incoming to Backlog on the Commons board.Jun 8 2016, 9:20 AM
Gunnex added a comment.Jun 8 2016, 6:35 PM

! In T129845#2207586, @Gunnex wrote on 16.04.2016:
Getting close to 300 accounts now (currently 291)...

Update: Per User:NahidSultan/Bangladesh Facebook Case/Accounts = 599 accounts...

See also GitHub: access restrictions =

The idea is to restrict access to video2commons to a certain user group, based on status (autoconfirmed) or on (live) user edits (> 20, 50, 100, X), as all uploads are coming from freshly registered users (or, rarely, from 0-edit sleepers).

676 accounts...

MarkTraceur lowered the priority of this task from "High" to "Normal".Dec 5 2016, 9:29 PM
MarkTraceur added a subscriber: MarkTraceur.

Lowering priority, because it seems like no progress has been made for a while, but also, I'm not sure exactly what solution is being sought here. Can someone elaborate on what technical steps should be taken to prevent this, apart from the existing AbuseFilter solution (which is already implemented)?

MarkTraceur moved this task from Untriaged to Tracking on the Multimedia board.Dec 5 2016, 9:29 PM

The AF only marked uploads for ease of review, and may have indirectly discouraged the copyvios.

Regarding the Bangladesh Facebook Case: AFAIK, after WMBD's awareness post and its outreach to several Facebook groups, some of them quit. Also, v2c became closed to them (edit count + user age requirement added, iirc), forcing them to upload the files the old way. @Gunnex and @NahidSultan should know more about this case.

Unfortunately, a similar case has just begun. Myanmar Facebook groups have been observed abusing the issue described in T48921 (Refuse uploading JPEG files with extra junk at the end) to share files, also via zero-rating. Perhaps another task should be filed about this.

This page has been updated this month: https://commons.wikimedia.org/w/index.php?title=User%3ATeles%2FAngola_Facebook_Case&type=revision&diff=236118306&oldid=194233576

It seems that those uploaders simply find other projects where they can publish copyrighted files once we block them on the projects they tried first. De.wiki, ca.wiki, br.wikimedia and the wiki for Wikimania 2016 are some of the projects currently used.

Can I recommend disallowing uploads other than images for unconfirmed users on projects other than Commons? I hope this is not stepping on their toes, since it seems to be the best solution.

IMO it would have the least impact, since we are handling the issue with more precision on Commons (i.e., new non-Wikipedia Zero users can still upload these media there).

Local upload already requires autoconfirmed to encourage uploading to Commons. We should avoid making user permissions overly complex; we can and should disable local uploads altogether on wikis where they're not monitored (hopefully de.wiki and ca.wiki are not such).

FYI we're seeing yet another influx of uploads from another Wikipedia Zero project. I now count Wikipedia Zero projects in 4 different countries as having coordinated file sharing campaigns on Commons.

The latest is using embedded data inside PNG, PDF, and OGG files.

Are there bug reports already against MediaWiki-Uploading to detect those?
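For the PNG case, one simple form of embedding (a payload appended after the end of the image) could be detected with a sketch like the following Python snippet. It walks the PNG chunk list up to the IEND chunk and reports how many bytes follow it. The function name is hypothetical and this is not what MediaWiki currently does; it does not verify CRCs and it will not notice data hidden inside ancillary chunks or in the pixel data itself.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_trailing_bytes(data: bytes) -> int:
    """Return the number of bytes that follow the IEND chunk of a PNG.

    Each PNG chunk is laid out as: 4-byte big-endian data length,
    4-byte chunk type, the data itself, and a 4-byte CRC.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        pos += 8 + length + 4  # skip chunk header, chunk data and CRC
        if pos > len(data):
            raise ValueError("truncated chunk")
        if chunk_type == b"IEND":
            return len(data) - pos
    raise ValueError("no IEND chunk found")
```

A non-zero result means something was appended after the image. A comparable walk over JPEG segments up to the end-of-image marker would cover the case described in T48921.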