
Cleanup phabricator.wikimedia.org uploaded files, WP zero abuse
Closed, Resolved, Public

Description

Discussion: https://phabricator.wikimedia.org/conpherence/567/.
Intentionally not pinging these accounts.

Please delete the APK files and draft tasks from these accounts and block them: Moncifsinawi270, Badr.ou, Eddaouyhacker, Hichambravo991 ...
Keep an eye on (and block, if you want) Younes19956.

Batch deletion query: https://phabricator.wikimedia.org/file/query/NccGO.NS4ZH3/#R

There are probably other files that I can't see or read due to access policies.

Event Timeline

There are a very large number of changes, so older changes are hidden.

Closing and adding Wikimedia-Incident as this was follow-up/action for the spam incident. In progress incident report at https://wikitech.wikimedia.org/wiki/Incident_documentation/20170617-Phabricator-spam

(assigning to @Volans as I believe he's who did most of the deletion)

@Aklapper @mmodell

Today @MaxSem got another report of an abuse file in Phabricator. After checking it, I found that it was uploaded last week during the incident. I couldn't find it via the file search in the Phabricator UI, but I could confirm it was there in the DB.

After some more digging I found 49 other files related to this incident and went ahead and deleted those too.

However, I'm afraid we have a lot of dangling files belonging to disabled users, see below:

root@MISC m3[(none)]> select count(*) from phabricator_file.file f left join phabricator_user.user u on f.authorPHID = u.PHID where u.isDisabled = 1;
+----------+
| count(*) |
+----------+
|     2904 |
+----------+

I also wondered whether those might be users' avatars saved as files, but only a few users have a single file (which might be the avatar; I didn't check whether Phabricator stores avatars in a different table). Most of the files belong to a few users with many files each.

For example, run this query to see them:

select u.userName, count(*) from phabricator_file.file f left join phabricator_user.user u on f.authorPHID = u.PHID where u.isDisabled = 1 group by u.userName;
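To try these queries without production access, here is a minimal sketch that runs them against an in-memory SQLite mock. The table layouts and sample rows are hypothetical and far simpler than the real `phabricator_file.file` and `phabricator_user.user` schemas. One thing the mock makes visible: because `WHERE u.isDisabled = 1` discards rows where the join found no user, the `LEFT JOIN` effectively behaves like an inner join, so files whose author is not a user account are not counted.

```python
import sqlite3

# Shared FROM/WHERE clause, copied from the queries in this task.
BASE = (
    "FROM phabricator_file.file f "
    "LEFT JOIN phabricator_user.user u ON f.authorPHID = u.PHID "
    "WHERE u.isDisabled = 1"
)


def build_mock_db() -> sqlite3.Connection:
    """Build a hypothetical, heavily simplified mock of the two tables."""
    conn = sqlite3.connect(":memory:")
    # Attach two more in-memory databases so the schema-qualified names
    # (phabricator_file.file, phabricator_user.user) can be used verbatim.
    conn.execute("ATTACH DATABASE ':memory:' AS phabricator_file")
    conn.execute("ATTACH DATABASE ':memory:' AS phabricator_user")
    conn.executescript("""
        CREATE TABLE phabricator_user.user (
            PHID TEXT PRIMARY KEY, userName TEXT, isDisabled INTEGER);
        CREATE TABLE phabricator_file.file (
            phid TEXT PRIMARY KEY, authorPHID TEXT, isExplicitUpload INTEGER);
        INSERT INTO phabricator_user.user VALUES
            ('PHID-USER-1', 'spammer1', 1),
            ('PHID-USER-2', 'spammer2', 1),
            ('PHID-USER-3', 'legit', 0);
        INSERT INTO phabricator_file.file VALUES
            ('PHID-FILE-1', 'PHID-USER-1', 1),
            ('PHID-FILE-2', 'PHID-USER-1', 1),
            ('PHID-FILE-3', 'PHID-USER-1', 0),
            ('PHID-FILE-4', 'PHID-USER-2', 1),
            ('PHID-FILE-5', 'PHID-USER-3', 1),
            ('PHID-FILE-6', 'PHID-APPS-x', 0);  -- author is not a user row
    """)
    return conn


def disabled_author_stats(conn: sqlite3.Connection):
    """Return (total, explicit uploads only, per-user counts)."""
    total = conn.execute("SELECT count(*) " + BASE).fetchone()[0]
    explicit = conn.execute(
        "SELECT count(*) " + BASE + " AND f.isExplicitUpload = 1"
    ).fetchone()[0]
    per_user = conn.execute(
        "SELECT u.userName, count(*) " + BASE
        + " GROUP BY u.userName ORDER BY u.userName"
    ).fetchall()
    return total, explicit, per_user


print(disabled_author_stats(build_mock_db()))
# (4, 3, [('spammer1', 3), ('spammer2', 1)])
```

The `ORDER BY` is an addition to the original query so the per-user rows come back in a stable order.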

Those might also be what Phabricator calls automatically uploaded files, but either way they are still available for download AFAIK.
It is probably worth splitting this out and opening a separate task for those other dangling files, but I wanted to report them here now that the topic has some attention.

Thanks! (Note that adding AND f.isExplicitUpload = 1 to that query might make the numbers look a bit less concerning.)

@Aklapper not really, same order of magnitude, see below. Also, what exactly is the difference? I've opened some of the isExplicitUpload = 0 files and they are usually images that could be spam as well.

root@MISC m3[(none)]> select count(*) from phabricator_file.file f left join phabricator_user.user u on f.authorPHID = u.PHID where u.isDisabled = 1 AND f.isExplicitUpload = 1;
+----------+
| count(*) |
+----------+
|     1945 |
+----------+

I'm wondering whether, when disabling a user, Phabricator should offer the option to completely delete all of their contributions.

@Volans: Ah, sorry. I had thought that it resembles the "Upload Source" setting when you go to https://phabricator.wikimedia.org/file/query/all/ and click "Edit Query". :(

@Aklapper, yes, AFAIK that field in the DB corresponds to the Upload Source filter on the file query page. But from what I'm seeing, it only determines where the file was uploaded from (I guess whether from the file upload page or indirectly via drag and drop into a task comment, a Paste, etc.).

@Aklapper @mmodell Please have a look at T169502. I can't edit the content of this task; it tells me "You do not have access to any forms which are enabled and marked as edit forms.", so I've changed it to a security issue until you can look at it.

While we're at it, please also delete the files uploaded by https://phabricator.wikimedia.org/p/Bobitoptop/, who registered through Wikitech.

@Framawiki @Mainframe98 thanks for letting us know. I've disabled the two users and removed their files. I didn't touch task T169502, as it's harmless at this point; I'll leave it to the Phabricator admins.

@Aklapper @mmodell the cleaning effort is clearly not working! After my last full cleaning of recent files, there are already 134 files uploaded during the last few days by users that are disabled now. And the ~3k previously reported ones are still there.

root@MISC m3[(none)]> select count(*) from phabricator_file.file f left join phabricator_user.user u on f.authorPHID = u.PHID where u.isDisabled = 1 and f.dateCreated > 1497010243;
+----------+
| count(*) |
+----------+
|      134 |
+----------+
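The `f.dateCreated > 1497010243` filter compares against a Unix timestamp in seconds. A short sketch for turning that cutoff into a readable UTC date, and for computing a fresh cutoff such as "the last N days", assuming `dateCreated` holds epoch seconds as the query implies:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def epoch_to_utc(ts: int) -> str:
    """Render epoch seconds as an ISO 8601 UTC timestamp."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


def cutoff_days_ago(days: int, now: Optional[datetime] = None) -> int:
    """Epoch-seconds cutoff for 'created within the last `days` days'."""
    now = now or datetime.now(timezone.utc)
    return int((now - timedelta(days=days)).timestamp())


print(epoch_to_utc(1497010243))  # 2017-06-09T12:10:43+00:00
```

The value from `cutoff_days_ago(...)` can then be substituted for the literal in `f.dateCreated > ...`.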

@Volans: Some possibilities which have been discussed for further countermeasures:

Moderately disruptive possibilities:

  • Require a certain amount of activity before allowing file uploads.
    • This will require some amount of custom code in Phabricator.
    • We would need to determine what activities would be sufficient to qualify for upload privileges.
    • It might be difficult to define a set of activities which is sufficient to deter abuse without imposing too much on legitimate users.
  • Further restrict the file size limit
    • Uploads are already limited to ~8 megabytes, we could go lower

More extreme, maybe too disruptive:

  • Disable uploads completely
  • Start blocking IP ranges.
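The first option would need custom code inside Phabricator, which is written in PHP; the Python below is only a language-neutral sketch of the decision logic. The `Account` fields, the thresholds, and counting "activity" as comments plus created tasks are all hypothetical choices, and they are exactly the part the bullets above say would be hard to define.

```python
from dataclasses import dataclass

# Hypothetical thresholds; choosing real values is the hard part.
MIN_ACCOUNT_AGE_DAYS = 7
MIN_ACTIVITY = 5
# Roughly the current ~8 MB limit (treated here as 8 MiB).
MAX_UPLOAD_BYTES = 8 * 1024 * 1024


@dataclass
class Account:
    """Hypothetical summary of an account, not Phabricator's user model."""
    age_days: int
    comments: int
    tasks_created: int


def may_upload(account: Account, size_bytes: int) -> bool:
    """Allow an upload only for established, active accounts."""
    if size_bytes > MAX_UPLOAD_BYTES:
        return False
    activity = account.comments + account.tasks_created
    return account.age_days >= MIN_ACCOUNT_AGE_DAYS and activity >= MIN_ACTIVITY


print(may_upload(Account(age_days=30, comments=10, tasks_created=2), 500_000))  # True
print(may_upload(Account(age_days=1, comments=0, tasks_created=0), 500_000))    # False
```

An age threshold alone would not stop accounts that are registered and then left dormant, which is one of the concerns raised later in this task.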

@Volans: We could set up a cron job to delete files that match your query?

@mmodell perhaps we should create a new task to discuss this subject? Or create an RfC on mw.org?
But perhaps we should also take immediate, temporary action; @Volans says that the problem is still present right now.

@Volans, @Framawiki: The files in question have been removed.

I am embarrassed to admit how I did it, and I will be the first to stress that we need a much safer/cleaner way to do it.

With all of that out of the way, here is what I did:

  • I ran the query that @Volans mentioned above but selecting f.phid instead of count(*)
  • I saved the matching PHIDs to a text file, then piped it through xargs to Phabricator's remove destroy command:

cat phids.txt | xargs -n 1 sudo /srv/phab/phabricator/bin/remove destroy --force
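The same select-then-destroy loop can be wrapped so that it prints the commands first and only runs them when explicitly asked. This is a hypothetical helper, not an existing tool: it reuses only the `bin/remove destroy --force` invocation shown above, assumes one PHID per line in `phids.txt`, and refuses anything that does not look like a file PHID, so a wrong column in the SELECT cannot destroy users or projects.

```python
import shlex
import subprocess
from pathlib import Path

# The invocation used above, one PHID appended per call (like `xargs -n 1`).
REMOVE = ["sudo", "/srv/phab/phabricator/bin/remove", "destroy", "--force"]


def load_phids(path: str) -> list:
    """Read one PHID per line, dropping blank lines and duplicates."""
    phids = []
    for line in Path(path).read_text().splitlines():
        phid = line.strip()
        if phid and phid not in phids:
            phids.append(phid)
    return phids


def destroy_commands(phids) -> list:
    """Build every command first, refusing anything that is not a file PHID."""
    commands = []
    for phid in phids:
        if not phid.startswith("PHID-FILE-"):
            raise ValueError(f"not a file PHID: {phid!r}")
        commands.append(REMOVE + [phid])
    return commands


def run(phids, execute: bool = False) -> None:
    """Print every command; only run them when execute=True."""
    for cmd in destroy_commands(phids):
        print(shlex.join(cmd))
        if execute:
            subprocess.run(cmd, check=True)
```

Usage: `run(load_phids("phids.txt"))` is a dry run; `run(load_phids("phids.txt"), execute=True)` actually destroys. Because the whole list is validated before the loop starts, a single bad PHID aborts the run before anything is deleted.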

@mmodell thanks for cleaning those too, but I don't think that this method can be applied in general. There are ex-legit users that might be suspended (we have some of those) and their uploads would be deleted too although legit.

On a side note, I'm still waiting for an answer about using the destroy script. From my previous email on the same thread a few days ago:

Actually, I was advised a few months ago not to use it because it might leave inconsistencies in the DB, and that is also the impression one gets from the message the script shows (after an ASCII-art skull):

IMPORTANT  DELETING OBJECTS OFTEN BREAKS THINGS

Destroying objects may cause related objects to stop working, and may leave
scattered references to objects which no longer exist. In most cases, it is
much better to disable or archive objects instead of destroying them. This
risk is greatest when deleting complex or highly connected objects like
repositories, projects and users.

Also take into account that files in Phabricator are a bit special: they cannot be archived and the owner has a delete button, while the vast majority of objects in Phabricator cannot be deleted but can be archived/hidden/closed.

So, if the situation has changed and it is now safe to use the remove script on files, behaving exactly like the web UI, please let's make it part of the documentation. It's obviously much quicker than my hacky approach, which dates back a few months to another abuse situation; in that case there was only one file to delete, but it was during a weekend too ;)

In a test run of the remove script with --trace, it seems that it actually does a series of queries to find and delete related references, but it would be nice to have this doubt cleared up for everyone with an authoritative answer, maybe from upstream.

The destroy command does a fairly thorough job of cleaning up references. The warning about breaking things is generic and it applies because you can delete all kinds of objects with that command.

  • deleting projects is demonstrably dangerous (I accidentally wiped out a project and had to spend an hour restoring all of the data)
  • deleting users leaves dangling references all over the place which mention their username.
  • I'm sure there are other things which are only partially cleaned up by destroy

Files, however, seem like the ideal use for the tool and not particularly problematic. The worst case is some remarkup which mentions the file like {F123456} which results in either a link to a 404 or a plain text reference without a link, just like you see here in this comment.
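To gauge that worst case before destroying files, one could scan text for remarkup embeds of the file IDs being removed. A minimal sketch, assuming embeds look like `{F123456}`, optionally with options after a comma (e.g. `{F123456, size=full}`); fetching the task descriptions and comments to scan is left out.

```python
import re

# Remarkup file embeds: {F123456}, optionally with options such as
# {F123456, size=full}.
FILE_EMBED = re.compile(r"\{F(\d+)(?:,[^}]*)?\}")


def embedded_file_ids(text: str) -> list:
    """Return the numeric file IDs embedded in a piece of remarkup."""
    return [int(m) for m in FILE_EMBED.findall(text)]


def dangling_refs(text: str, deleted_ids: set) -> list:
    """Embedded file IDs that would point at deleted files."""
    return [i for i in embedded_file_ids(text) if i in deleted_ids]


print(dangling_refs("See {F123456} and {F42, size=full}.", {123456}))  # [123456]
```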

@mmodell thanks for cleaning those too, but I don't think that this method can be applied in general. There are ex-legit users that might be suspended (we have some of those) and their uploads would be deleted too although legit.

I completely agree, we would need some way to differentiate spam accounts from legit accounts. Maybe we could use the account's age? Recently created accounts that are disabled are almost certainly spam.
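A sketch of the account-age idea: restrict the disabled-author query to accounts created after a cutoff, so long-standing accounts that were later suspended keep their uploads. It assumes `phabricator_user.user` has a `dateCreated` column in epoch seconds (check the real schema); the mock tables, sample rows and cutoff are hypothetical.

```python
import sqlite3

# Files by disabled authors whose account was created after `account_cutoff`
# (epoch seconds). u.dateCreated is an assumed column.
QUERY = """
SELECT f.phid
FROM phabricator_file.file f
LEFT JOIN phabricator_user.user u ON f.authorPHID = u.PHID
WHERE u.isDisabled = 1
  AND u.dateCreated > ?
ORDER BY f.phid
"""


def build_mock_db() -> sqlite3.Connection:
    """Hypothetical mock tables with just the columns this query needs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS phabricator_file")
    conn.execute("ATTACH DATABASE ':memory:' AS phabricator_user")
    conn.executescript("""
        CREATE TABLE phabricator_user.user (
            PHID TEXT, isDisabled INTEGER, dateCreated INTEGER);
        CREATE TABLE phabricator_file.file (phid TEXT, authorPHID TEXT);
        INSERT INTO phabricator_user.user VALUES
            ('PHID-USER-old', 1, 1400000000),  -- old account, later suspended
            ('PHID-USER-new', 1, 1497500000),  -- new account, disabled
            ('PHID-USER-ok',  0, 1497500000);  -- new account, active
        INSERT INTO phabricator_file.file VALUES
            ('PHID-FILE-a', 'PHID-USER-old'),
            ('PHID-FILE-b', 'PHID-USER-new'),
            ('PHID-FILE-c', 'PHID-USER-ok');
    """)
    return conn


def recent_disabled_files(conn: sqlite3.Connection, account_cutoff: int) -> list:
    """PHIDs of files whose disabled author was created after the cutoff."""
    return [row[0] for row in conn.execute(QUERY, (account_cutoff,))]


conn = build_mock_db()
print(recent_disabled_files(conn, 1497000000))  # ['PHID-FILE-b']
```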

On the general situation, not on deleting content of disabled accounts:

Continuing to enable auth.approval on weekends (I've done this for the last few weekends) and having me/us check the IPs of every new account is not a long-term strategy. Plus, some spammers have started to use proxy IPs in the USA or EU to register their accounts, and after approval they use WP0 IPs to upload copyrighted content (twice today already).
Plus spammers might start to register accounts, leave them dormant for a while, and then use them.
For the record, some spammers also recently started to bind their Phab account to LDAP instead of SUL/OAuth.
We cannot watch https://phabricator.wikimedia.org/people/logs/ for IPs and https://phabricator.wikimedia.org/people/ for new accounts all the time as our human resources are limited.
Discussion in the thread "Phab WP0 file upload spam" on the ops@ mailing list in June 2017 has not led to other options I consider feasible. For example, excluding certain MIME types would also be whack a mole, and excluding phab.wmfusercontent.org from WP0 is "not really [doable] in a comprehensive fashion".
I've been thinking of checking uploaded files against a list of sha256 sums (AFAIK some folks on Commons do this to quickly identify some warez uploaded via WP0), but that would also require us to write and maintain a gross hack, and it would not prevent any other vandalism (like the mass-creation of useless tasks that we've experienced).
Further file size limitations will just result in smaller chunks in ZIP files (we've already seen that with 2MB files) and will hurt valid uploads (e.g. videos how to reproduce a bug).
So I currently do favor IP blocking of Moroccan and Algerian mobile IPs in Phab, even if that will be broader than we'd like. And we can already do this easily.
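The checksum idea amounts to hashing each upload and looking the digest up in a set of known-bad hashes. A minimal sketch using Python's `hashlib`; the blocklist contents and where such a hook would run inside Phabricator are assumptions.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_of(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a file-like object in 1 MiB chunks to keep memory use flat."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def is_blocklisted(stream: BinaryIO, blocklist: set) -> bool:
    """True if the upload's SHA-256 hex digest is in the blocklist."""
    return sha256_of(stream) in blocklist


# Hypothetical blocklist containing the digest of one known-bad payload.
known_bad = {hashlib.sha256(b"known bad payload").hexdigest()}
print(is_blocklisted(io.BytesIO(b"known bad payload"), known_bad))   # True
print(is_blocklisted(io.BytesIO(b"known bad payload!"), known_bad))  # False
```

The second call shows the weakness: changing a single byte (for example re-zipping or re-chunking an archive) yields a different digest, so the list has to be maintained continuously.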

e.g. videos how to reproduce a bug

Why can't they upload to commons, especially considering J18?

Videos and other bigger files probably should. But small screenshots and similar files can stay in Phabricator, as there is no chance they would be used elsewhere.

Why can't they upload to commons, especially considering J18?

In some cases certainly fine but not in case of embedded mockups / design iterations etc...

Is it possible to block an IP range from downloading files on Phab? A hook in this part of the program?

Plus spammers might start to register accounts, leave them dormant for a while, and then use them.

This is what I worry will limit the effectiveness of anything short of the all-out block

Is it possible to block an IP range from downloading files on Phab? A hook in this part of the program?

Even if it's not possible in Phabricator, it can easily be done via web server config. I agree that would be a more appropriate response.
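Whether it is done in Phabricator or in the web server/cache layer, the check boils down to matching the client address against a list of CIDR ranges. The block eventually applied in this task was done on the Varnish side; the Python below is only a sketch of the matching logic, and its ranges are reserved documentation prefixes standing in for the real ones.

```python
import ipaddress

# Placeholder ranges: RFC 5737 documentation prefixes, NOT the real list.
BLOCKED_RANGES = [
    ipaddress.ip_network(cidr) for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def is_blocked(client_ip: str) -> bool:
    """True if client_ip falls inside any blocked range (IPv4 or IPv6)."""
    addr = ipaddress.ip_address(client_ip)
    # Membership between mismatched IP versions is simply False.
    return any(addr in net for net in BLOCKED_RANGES)


print(is_blocked("192.0.2.17"))   # True
print(is_blocked("203.0.113.5"))  # False
```

Keeping the list narrow matters: later comments in this task report broad ranges catching users who were not on zero-rated connections.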

Plus spammers might start to register accounts, leave them dormant for a while, and then use them.

This is what I worry will limit the effectiveness of anything short of the all-out block

Blocking pirate accounts is a never-ending whack-a-mole game, they can mimic all relevant characteristics of a real user (non-Zero IP via proxy, small uploads, arbitrary file types...). We need to block downloads, they have to use Zero IPs for those so they are easy to identify.

Change 363264 had a related patch set uploaded (by MaxSem; owner: MaxSem):
[operations/puppet@production] Block WP Zero users from accessing Phabricator uploads

https://gerrit.wikimedia.org/r/363264

What do people think of my patch above?

Change 363264 merged by Ema:
[operations/puppet@production] Block WP Zero users from accessing Phabricator uploads

https://gerrit.wikimedia.org/r/363264

Note: user https://phabricator.wikimedia.org/p/D3r1ck01/ is asking why they cannot access Phab at all, and it looks like they are caught up in the ban that would be lifted here https://gerrit.wikimedia.org/r/#/c/363356/. @D3r1ck01 reported their IP as 154.72.169.184. They noted a few other people in their social circle seem to be having the same issue. Since this task is closed I am making the assumption https://gerrit.wikimedia.org/r/#/c/363356/ is fine to merge but I'm going to wait for confirmation from @20after4 and/or @Aklapper that they are around to be sure.

I've disabled (if not already) and removed files for the following users:

Soufianehamouda
Houssamista
Marama12
Oussama177

Change 367422 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] varnish: reject phabricator uploads from WP0 users

https://gerrit.wikimedia.org/r/367422

Change 367422 merged by Ema:
[operations/puppet@production] varnish: reject phabricator uploads from WP0 users

https://gerrit.wikimedia.org/r/367422

Got yet another spammer, with a familiar name too, registered via wikitech this time: https://phabricator.wikimedia.org/p/Younes2001/
Please also check the users they assigned to their tasks and files.

Yet another: https://phabricator.wikimedia.org/p/Darcula/
At this point I really want some kind of central report point or something. This is becoming ridiculous.

I think it can be reported here. By the way, what about blocking the use of Phab accounts from ALL WP Zero IPs? We can have exemptions if needed.

The block is happening on the varnish side, so I don't imagine adding exemptions being that easy.

Those users are probably not from Wikipedia Zero. Anyone outside of Wikipedia Zero can upload files.

Is there a task or project I should use for those cases instead?

There's currently no task, but you could open one. There's not much anyone can do now, though, as those users should be blocked on mw instead, which should then stop them from logging into Phabricator (the quickest way possible, as there are more users on mw who can block than on Phab).

Is there a task or project I should use for those cases instead?

But what to discuss there what's not already discussed here? We do not plan to disable uploads in general. :)

I think it can be reported here. By the way, what about blocking the use of Phab accounts from ALL WP Zero IPs? We can have exemptions if needed.

I'd prefer to go per mobile provider(s) instead. Also see
https://gerrit.wikimedia.org/r/#/c/363264/
https://gerrit.wikimedia.org/r/#/c/367422/
https://gerrit.wikimedia.org/r/#/c/363001/
https://gerrit.wikimedia.org/r/#/c/363356/

ACdreamer looks unrelated to WP0 abuse. The texts inside the SQLs are quite valid Chinese, and there is no Wikipedia Zero zero-rating in China, AFAIK.

Is there a task or project I should use for those cases instead?

But what to discuss there what's not already discussed here? We do not plan to disable uploads in general. :)

I meant that more in the sense of a reporting point for spam/vandalism on Phabricator (that isn't covered by this task), especially related to uploads. Of course we shouldn't disable general access to uploads; that would be counterproductive to the usage of Phabricator.

more in the sense of a reporting point for spam/vandalism on Phabricator (that isn't covered by this task), especially related to uploads.

Which problem does that solve? Anyone can check the latest file uploads anyway.
If there is spam, someone will handle it. If something is urgent, there is #wikimedia-devtools on Freenode IRC.

What about moving phab to labs (if labs is non-zero-rated)?

To quote Keegan, note that the uploads of IP protected media are not necessarily coming from WP0, but it is large groups of WP0 users that are organized to download said material.

What about moving phab to labs (if labs is non-zero-rated)?

We (RelEng) won't host Phabricator in labs (Cloud Services). This is a service where the users expect production level uptimes (WMCS doesn't really have the same guarantees). :)

Change 389888 had a related patch set uploaded (by Greg Grossmeier; owner: 20after4):
[operations/puppet@production] Narrow the range for this ban as it is affecting users who are not on a zero rated host.

https://gerrit.wikimedia.org/r/389888

A new account was created tonight to send files. Please delete them.
This calls into question how this Phabricator instance works, and its protection model. How do I report vandalism to administrators? Is there any way to have file deletion or other rights that can help me when I find a case?

A new account was created tonight to send files. Please delete them.

Done, thanks.

This calls into question how this Phabricator instance works, and its protection model. How do I report vandalism to administrators?

#wikimedia-devtools on IRC, or create a task, I'd say.

Is there any way to have file deletion or other rights that can help me when I find a case?

That would require admin rights, I'm afraid.

I banned them but had not had a chance to purge the files. Thanks.

I ran into this block yesterday while working from Morocco. I'm not sure whether the block is a little too broad, but all 6 internet connections I have used appear to be in the blocked range.

So I currently do favor IP blocking of Moroccan and Algerian mobile IPs in Phab, even if that will be broader than we'd like. And we can already do this easily.

Was the assumption that all of these IPs were mobile IPs? If so that doesn't appear to be the case.

We will be able to remove the Wikipedia zero block this year as Wikipedia zero is being discontinued this year.