
Surface Wikipedia Zero traffic in AbuseFilter
Closed, Resolved · Public

Description

Due to abuse from some subscribers of a few Wikipedia Zero partners, our editors need additional tool support to identify possible abuse and to assist new editors who do not understand Wikipedia editing.

This feature assists by flagging any edits that come from a Wikipedia Zero partner network.

You should be able to filter actions in AbuseFilter on whether the traffic comes from a Wikipedia Zero partner, using an understandable variable name such as "zero".

This should initially be a true/false value covering all zero partner traffic in order to help protect privacy.

Details

Related Gerrit Patches:
mediawiki/extensions/WikimediaEvents : wmf/1.27.0-wmf.19 : Add "user_wpzero" AbuseFilter variable
mediawiki/extensions/WikimediaEvents : master : Add "user_wpzero" AbuseFilter variable

Event Timeline

DFoy created this task. · Mar 29 2016, 11:22 PM
Restricted Application added a subscriber: Aklapper. · Mar 29 2016, 11:22 PM
DFoy added a comment. · Mar 29 2016, 11:32 PM

Also, this is a first step to address the problem. If this first step turns out to be insufficient, future improvement requests would likely include reducing the partners flagged from all to a smaller set, and/or flagging zero rated traffic from the FB/Freebasics proxy. This is just FYI in case these plans can assist with the coding approach for this task.

@BBlack, apparently there is a header sent to the backend app servers when the request is from a zero range. Do you know the exact header name, or can you point to where that is added?

BBlack added a comment. · Edited · Mar 30 2016, 11:55 AM

@csteipp, all the cache clusters send X-Carrier. That info is derived from the carriers.json netmapper data, which is zero-specific (we only list networks of zero partner carriers there). So, if X-Carrier is set at all, it's a request from a Zero partner network, and the data is the MCC-MNC code of the actual mobile carrier in question. So it looks like X-Carrier: 123-45.

Also, we send X-Carrier-Meta (if set - it's optional and only ever present if the main X-C is also present), which is any other metadata in the database per-carrier. Currently only a few carriers have metadata flags, which are currently wap, residential, or test, but the scheme allows for other flags to be set (modulo that not breaking assumptions of other Zero-related software), which might be useful to differentiate flagging of specific zero carriers only as @DFoy mentions above. If we end up supporting multiple flags per carrier, they'd be separated with | inside the header, e.g. X-Carrier-Meta: wap|residential|flagedits.
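
The header scheme described above can be illustrated with a minimal sketch. This is not the code from the WikimediaEvents patch; the names `ZeroInfo` and `parse_zero_headers` are hypothetical, and the only behaviour assumed is what the comment states: X-Carrier present means a Zero partner network, and X-Carrier-Meta is an optional, |-separated list of flags that only counts when X-Carrier is also present.

```python
from dataclasses import dataclass
from typing import FrozenSet, Mapping, Optional


@dataclass(frozen=True)
class ZeroInfo:
    """Hypothetical container for what the two headers tell us."""
    is_zero: bool              # True if X-Carrier was set at all
    carrier: Optional[str]     # MCC-MNC code, e.g. "123-45"
    flags: FrozenSet[str]      # per-carrier metadata flags, e.g. {"wap"}


def parse_zero_headers(headers: Mapping[str, str]) -> ZeroInfo:
    """Derive Zero status from request headers (illustrative only)."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}

    carrier = lowered.get("x-carrier") or None
    if carrier is None:
        # No X-Carrier: not a Zero partner network; any meta header is ignored.
        return ZeroInfo(is_zero=False, carrier=None, flags=frozenset())

    # X-Carrier-Meta is optional; multiple flags would be separated by "|".
    meta = lowered.get("x-carrier-meta", "")
    flags = frozenset(part.strip() for part in meta.split("|") if part.strip())
    return ZeroInfo(is_zero=True, carrier=carrier, flags=flags)
```

For the true/false value requested in the task description, only `is_zero` would be needed; the carrier code and flags are parsed here just to show the header format.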

@BBlack, thanks! @Legoktm, anything else you need to get started?

Change 280468 had a related patch set uploaded (by Legoktm):
Add "user_wpzero" AbuseFilter variable

https://gerrit.wikimedia.org/r/280468

Legoktm triaged this task as High priority. · Mar 30 2016, 6:32 PM
Gunnex added a subscriber: Gunnex. · Apr 2 2016, 6:35 PM

Change 280468 merged by jenkins-bot:
Add "user_wpzero" AbuseFilter variable

https://gerrit.wikimedia.org/r/280468

Vituzzu added a subscriber: Vituzzu.
matmarex added a parent task: Restricted Task. · Apr 4 2016, 2:05 PM

Confirmed the patch works (https://test.wikipedia.org/w/index.php?title=Special:AbuseLog&wpSearchFilter=161, my IP was temporarily added to a test carrier). I'll backport it in SWAT tomorrow morning.

Change 281867 had a related patch set uploaded (by Legoktm):
Add "user_wpzero" AbuseFilter variable

https://gerrit.wikimedia.org/r/281867

Change 281867 merged by jenkins-bot:
Add "user_wpzero" AbuseFilter variable

https://gerrit.wikimedia.org/r/281867

Legoktm closed this task as Resolved. · Apr 6 2016, 5:17 PM

The user_wpzero variable is now available for use on all wikis. It's a boolean indicating whether the user is connecting over Wikipedia Zero; it doesn't distinguish which carrier is used or anything else.

Hi! Thanks for the filter.
How could I track Wikipedia Zero uploads (on Commons) via Quarry? I checked roughly 24 hours of Commons's abuse filter log, and roughly 98% of the uploads are bad: nonsense, copyvios, out of project scope, etc. As a result, I would like to have something similar to "Cross-wiki upload from pt.wikipedia.org (07.03.2016)" (by the way, with a similarly bad ratio via the local Visual Editor) on a daily basis, and to compare it on the same daily basis with deletions, like "Cross-wiki upload from pt.wikipedia.org (07.03.2016) (deleted)" (in this example: 40 uploads and 36 deletions, so bad ratio = 90%). The goal is to monitor the number of Wikipedia Zero uploads at Commons, compare it to the related deletions, and compile statistics on "bad ratios".
The main issue (uploads of complete copyrighted films, videos, music, etc. via the "Bangladesh Facebook Case") may not be covered by this filter: even today I (and other users) noticed several related uploads (shared instantly by Bangladesh Facebook groups) that this abuse filter did not catch. Most likely they were uploaded via paid flat-rate mobile contracts (with better bandwidth etc., considering that uploads of complete films involve hundreds of megabytes up to more than 1 GB of data).
Nevertheless, the filter is valuable for analyzing the behaviour of Wikipedia Zero users. Currently, I would say: Commons is used by Wikipedia Zero users as a free image host for Facebook-like profiles, ego-spam on local wikis, nonsense/attack images, or illegal, spontaneous grabs from the Internet / social media. In other words: the "educative" motive steps into the background. Or shorter: they don't care [about copyrights etc.]. It's something for free. It's Wikifacebook.
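
The "bad ratio" arithmetic used in the comment above (40 uploads, 36 deletions) can be written out as a small sketch. The function name `bad_ratio` is hypothetical, and the daily counts are assumed to come from some external query (for example a Quarry result), which is not shown here.

```python
def bad_ratio(uploads: int, deletions: int) -> float:
    """Percentage of a day's uploads that were later deleted."""
    if uploads <= 0:
        raise ValueError("uploads must be a positive count")
    if deletions < 0 or deletions > uploads:
        raise ValueError("deletions must be between 0 and uploads")
    return 100.0 * deletions / uploads
```

With the figures from the example, `bad_ratio(40, 36)` gives 90.0, matching the 90% quoted in the comment.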

He7d3r added a subscriber: He7d3r. · Apr 11 2016, 11:24 AM
Az1568 added a subscriber: Az1568. · Jul 28 2017, 8:18 PM