
Poll stewards and admins on spam-fighting efforts
Closed, Resolved (Public)

Description

Greetings Stewards et al-

In an effort to better catalog certain data around manual, on-wiki spam-fighting efforts, the Security-Team had a few questions:

  1. How much time do you spend combatting spam-related issues on-wiki? This can be an extremely rough estimate; we're just looking for some measure of time per period, e.g. 2 hrs per day, 10 hrs per month, 90 days per year, or even "half my time performing steward-related activities".
  2. Which tool(s) do you currently use the most when dealing with these issues? This can include things like MediaWiki's built-in blocking/suppression capabilities, various deployed anti-spam extensions, MediaWiki configurations, and even third-party or manual processes outside of what Wikimedia projects currently offer.
  3. Outside of vastly improved Captchas or similar anti-spam/anti-automation tooling on the projects, what, in your opinion, would be the most valuable feature or tool for dealing with spam and similar incidents on-wiki?

Notes: I understand that many of you have likely answered these questions at least a few times before, within the context of various wish-list requests, etc. It can be very frustrating when these questions have been answered and little progress is seemingly made. We're hoping to change that. I think we're also willing to at least temporarily make this task private if any of the responses might become a bit sensitive.

Event Timeline

sbassett triaged this task as Medium priority. Jun 11 2020, 4:04 PM

I'm not a steward, but I have spent a lot of time over the years fighting both link spam and article spam.

  1. 40% of my admin time is spent dealing with spam. I used to combat link spam, but that fight was lost long ago. Nowadays I delete UPE (undisclosed paid editing).
  2. This is a flawed question. We use everything we can get our hands on. If you want the most frequently used tool, that would be either the flamethrower (delete) or the banhammer (local block, global lock). The most effective tool is the spam blacklist. We also use the LinkWatcher feed and database to find spam.
  3. All of them suck. Personally, I would overhaul the external links table to separate out the domain, subdomain and protocol. The idea is to make it performant enough to run, on each edit, queries that count how many times an inserted domain is already in use. Then block non-autoconfirmed editors from inserting domains that are not used anywhere else on that wiki. The reformed table should also be used to fix Special:Linksearch, expose the counts to AbuseFilter, etc. (A rough sketch of the idea follows below.)
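For illustration, here is a minimal Python/sqlite3 sketch of what the split-out table and the per-edit domain count might look like. All table and column names are invented for the example; the real externallinks schema differs, and a real implementation would live in MediaWiki/MySQL.

```python
# Hypothetical sketch of the proposed externallinks overhaul: protocol,
# subdomain and domain live in separate, indexed columns so per-domain
# counts are cheap enough to run on every edit. Names are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE externallinks_split (
    el_from      INTEGER NOT NULL,  -- id of the page containing the link
    el_protocol  TEXT    NOT NULL,  -- e.g. 'https'
    el_subdomain TEXT    NOT NULL,  -- e.g. 'www'
    el_domain    TEXT    NOT NULL,  -- e.g. 'example.com'
    el_path      TEXT    NOT NULL   -- rest of the URL
);
CREATE INDEX el_domain_idx ON externallinks_split (el_domain);
""")

def domain_use_count(domain: str) -> int:
    """Count existing uses of a domain on this wiki. An edit hook could
    reject link insertions by non-autoconfirmed users when this is 0,
    i.e. the domain is not used anywhere else on the wiki."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM externallinks_split WHERE el_domain = ?",
        (domain,),
    ).fetchone()
    return count
```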

@MER-C Thanks for the feedback! While we're mainly focusing on current global stewards, feedback from any sysop (e.g. @Billinghurst) or privileged user routinely aiding in these efforts is appreciated, even if it's highly critical.

sbassett renamed this task from Poll stewards on spam-fighting efforts to Poll stewards and admins on spam-fighting efforts. Jun 11 2020, 9:28 PM
sbassett added a project: user-sbassett.
sbassett moved this task from Backlog to In Progress on the user-sbassett board.

(not a steward)

  • On days when I'm bored and go looking for spam, there is always spam to be found and tagged for speedy deletion, and spambots to block (where I can) or report to SRG
  • On days when I don't go looking for it, probably around 10% of my time is related to anti-spam
  • Tools used: AbuseFilter, SpamBlacklist
  • Tools that would be helpful:
    • AbuseFilter renders all interwiki links as bluelinks, meaning that to tell whether a page created by a spambot still exists, I need to visit each page. If redlinks could be shown for links to pages on other wikis that don't exist, it would save time. For example, when looking at https://meta.wikimedia.org/wiki/Special:AbuseLog/919982 (which wasn't disallowed by that filter hit), the link to the page that may have been created is blue. The edit was in fact disallowed by a different filter, but I can't tell that from the log, so I go and check manually. If the link were styled as a redlink, it would save time when reviewing the full abuse log or individual hits. (A sketch of the underlying existence check follows this list.)
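A minimal sketch of the existence check the redlink idea implies, using the standard MediaWiki query API. The helper and workflow are illustrative only; this is not an existing AbuseFilter feature.

```python
# Check whether a title exists on a remote wiki, so an interwiki link
# in the AbuseFilter log could be styled as a redlink when it doesn't.
import requests

def page_exists(api_url: str, title: str) -> bool:
    """True if `title` exists on the wiki whose api.php is `api_url`."""
    resp = requests.get(api_url, params={
        "action": "query",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }, timeout=10)
    page = resp.json()["query"]["pages"][0]
    return not page.get("missing", False)

# Example: page_exists("https://en.wikipedia.org/w/api.php", "Main Page")
```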

(steward)

Thanks for opening this poll regarding this subject:

Question #1: I used to be more involved in anti-spam work than I am nowadays; like MER-C above, I had to choose where to spend my time. On the days I'm on it, 100% of my time goes to locking spambots and deleting spam. In fact, a quick look at our global account logs shows that most locks we perform per day are related to spam and spambots. The attempts to spam our sites are continuous as well. Please take a look at the global abuse filter log and the spam blacklist log (that one isn't global; you'll need to fetch it individually for each wiki). This, IMHO, shows that our current tools are ineffective at preventing spam.

Question #2: I rely on SpamBlacklist, TitleBlacklist and AbuseFilter to detect spam (either attempted or performed), CheckUser to identify further registered spambots or toxic networks, and GlobalBlocking to block them and prevent further spamming. I guess some spam is also stopped by the ConfirmEdit (CAPTCHA) extension.

Question #3: I am not a software engineer or anything like that, so it's difficult for me to answer this question. I have some ideas, though.

I may expand this list in the future.

(steward)

  1. I'm personally not a steward focused on spambots, but the majority of reports we get at SRG and on IRC, and of the work we do ourselves, is related to spam. At a bare minimum, 40% of my steward time goes to spambots.
  2. CheckUser is the most important tool for combatting spam farms, together with related databases (WHOIS and http://proxycheck.io/ to check for spambot farms). As for noticing spam in the first place, I rely on spotting it while patrolling for cross-wiki vandalism (https://tools.wmflabs.org/swviewer/ is a great tool for that) and on fetching related activity from COIBot's database (#wikimedia-spam-t), which lets me find users inserting the same links.
  3. I can imagine a tool comparing the frequency of link additions: if someone (especially a new user) inserts a link into many articles, that would at least be a sign it's illegitimate (a toy sketch follows this list). Tools that automatically exclude blacklisted IPs and proxies from editing are also helpful. Autoblocks would be beneficial, but I think expanding global blocks to accounts would be even more so.
    1. Not related to the spam fight directly, but to its consequences: if an ISP is abused frequently enough, we can just block it entirely. The majority of queries sent to our email address are about the resulting collateral damage. Tools that automatically assess the data we look for (e.g. making sure a user is not unknowingly on a VPN, etc.) could save another chunk of time.
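A toy sketch of the frequency comparison suggested in point 3: flag a (user, domain) pair once the same editor has added the domain to several distinct pages. The event feed and the threshold are assumptions; in practice the data might come from a LinkWatcher/COIBot-style feed.

```python
# Flag editors who insert the same domain across many pages. The
# stream of (user, domain, page) link-addition events is assumed to
# come from elsewhere (e.g. a LinkWatcher-style feed).
from collections import defaultdict

THRESHOLD = 5  # distinct pages before a pair is flagged for review

additions = defaultdict(set)  # (user, domain) -> set of page titles

def record_link_addition(user: str, domain: str, page: str) -> bool:
    """Record one link addition; return True once this user has added
    this domain to THRESHOLD or more distinct pages."""
    additions[(user, domain)].add(page)
    return len(additions[(user, domain)]) >= THRESHOLD
```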
  1. I used to be more active in anti-spam, but I have largely given up due to the lack of tools to mitigate spambots effectively. How much time you spend depends on how in-depth you go: you can easily just lock the lists of 20+ spambot accounts posted on SRG without doing any checks or blocking the underlying IPs/ranges, but it takes much longer if you're investigating those IPs/ranges and checking for potential collateral damage from blocks.
  2. The abuse filter log is very important, but that's only the tip of the iceberg. CheckUser is a must because you can look at the ranges and the networks; the bad networks become familiar quite quickly. @SQL had made IP Check, which was/is quite effective at providing additional information about a specific IP/network. I'm not familiar with the situation, but something led to this message indicating that it will no longer be maintained. With the current state of things, IP masking would completely cripple any anti-spam effort: a large number of spambots that initially went undetected are found because they share a range/IP with other spambots.
  3. There are many things that could be done to improve existing tools or develop new ones. A while ago I filed a task with WordPress about how unmoderated comments can be used to spread spam links. It would be useful to have a relatively simple, user-friendly way to search the database for any links matching a regex such as (\?|\&)unapproved=[0-9]+&moderation-hash=[0-9a-z]+, but there is currently no way for a person with average technical skill to do so (a sketch of the kind of query involved follows this list).
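For illustration, here is a sketch of roughly what that search takes today against the Wikimedia Cloud (Toolforge) database replicas; the point is precisely that it requires SQL and replica access rather than a user-friendly interface. The replica host name and the el_to column (the schema has since been refactored) are assumptions; the regex is the one quoted above.

```python
# Search one wiki's externallinks table for URLs matching the
# WordPress moderation-queue pattern quoted above. Assumes a Toolforge
# account with replica credentials in ~/replica.my.cnf.
import os
import pymysql

PATTERN = r"(\?|&)unapproved=[0-9]+&moderation-hash=[0-9a-z]+"

conn = pymysql.connect(
    host="enwiki.analytics.db.svc.wikimedia.cloud",  # replica host (assumed)
    database="enwiki_p",
    read_default_file=os.path.expanduser("~/replica.my.cnf"),
)
with conn.cursor() as cur:
    # REGEXP forces a scan, so this is an occasional audit query,
    # not something an average editor could run casually.
    cur.execute(
        "SELECT el_from, el_to FROM externallinks "
        "WHERE el_to REGEXP %s LIMIT 100",
        (PATTERN,),
    )
    for page_id, url in cur.fetchall():
        print(page_id, url)
```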

It would also be useful to be able to do batch checkusers, because when you have a list of 50+ spambots it's incredibly time-consuming to check each one individually. The format I have in mind: you would input a list of usernames (like Special:MultiLock) and a reason, and it would generate a list of IPs along with any other accounts on those IPs. On that second page, you could then have the option to check those IPs, and so on, building a sort of "tree" of the network these spambots are using. (A rough sketch of the batch lookup follows.)
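A rough sketch of what that first step could look like against the CheckUser extension's API. The module and parameter names (list=checkuser, curequest, cutarget, cutoken) are from memory and should be verified against the extension's API help; a logged-in session with checkuser rights and a CSRF token is required.

```python
# Batch the "get IPs for each username" step of a spambot check.
# CheckUser API parameter names here are assumptions to be verified.
import requests

API = "https://meta.wikimedia.org/w/api.php"

def user_ips(session: requests.Session, token: str, username: str) -> list:
    resp = session.post(API, data={
        "action": "query",
        "list": "checkuser",
        "curequest": "userips",       # assumed parameter name
        "cutarget": username,
        "cureason": "Spambot batch check",
        "cutoken": token,
        "format": "json",
    }, timeout=10)
    entries = resp.json()["query"]["checkuser"]["userips"]
    return [e["address"] for e in entries]

# First page of the imagined tool: one row of IPs per listed account.
# ips = {name: user_ips(session, token, name) for name in spambot_list}
```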

Another incredibly time-consuming task is investigating added URLs to see whether they should be blacklisted. There are tools that attempt to address this, and this is not a lack of appreciation for them - but it would be useful if there were a way to search for any additions of "https://hereismyspamsite.com" that included both deleted and live contributions from all projects. COIBot tries to do this, but it sometimes misses live contributions and doesn't cover deleted ones, AFAIK.

There's also no built-in tool to apply more than one global/local block at once. We have to use JavaScript tools that are broken, will break, or aren't actively maintained. @Urbanecm was kind enough to help me set up a Python script to block globally and on Meta using the API (a minimal sketch of the approach follows), but this is not a solution suited to the typical end-user.
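A minimal sketch along those lines: a local block via the core action=block API plus a global block via the GlobalBlocking extension's action=globalblock. A logged-in session, a CSRF token and the relevant rights are assumed, and the globalblock parameters should be double-checked against the extension's API help.

```python
# Block one target both locally on Meta and globally, via the API.
import requests

API = "https://meta.wikimedia.org/w/api.php"

def block_everywhere(session, token, target, reason, expiry="1 month"):
    # Local block (core API module).
    session.post(API, data={
        "action": "block", "user": target, "expiry": expiry,
        "reason": reason, "token": token, "format": "json",
    }, timeout=10)
    # Global block (GlobalBlocking extension; parameters assumed).
    session.post(API, data={
        "action": "globalblock", "target": target, "expiry": expiry,
        "reason": reason, "token": token, "format": "json",
    }, timeout=10)

# for ip in spambot_ips:
#     block_everywhere(session, csrf_token, ip, "Spambot")
```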

Most of these issues wouldn't be issues if we had a reasonable spam-blocking mechanism like MediaWiki-extensions-StopForumSpam in production, as @MarcoAurelio suggested.

Also echoing some of @MarcoAurelio's suggestions: the enwiki-based tools ProcseeBot, SQL's formerly generated list of non-blocked compute hosts, and @ST47's ST47ProxyBot would be assets in tackling VPNs, colocation hosts, botnets and open proxies, which are frequently abused by spambots and LTAs alike.

I also agree that T19929: CentralAuth account locks should trigger global autoblocks needs to be implemented.

I appreciate +sub, but I don't think that this is an appropriate place to discuss what's going on with my tools. Please feel free to reach out to me on IRC if you'd like to talk.

Thanks everyone for the feedback thus far. The Security-Team plans to leave this task open for a couple more weeks in case anyone would like to provide more feedback. @SQL - if you'd prefer, you can reach out to us via security-help@wikimedia.org or within #wikimedia-security on IRC.

I've never been a steward, but for somewhat over 10 years dealing with link placement has taken up much of my time on Foundation projects, and it still does. There are days when I get little else done. These days much of what I find arrives on Commons, usually as user pages, though there is also quite a lot of spam that arrives in or around images. I've even seen it come in via EXIF data in images.

When I have the time, I take a look at the domains via COIBot on Meta. That and cross-wiki link search tools are probably my main resources.

I have been an admin on Meta and Commons for well over 10 years (and from time to time have had rights on other projects too).

For user accounts spamming on multiple wikis, CentralAuth can be used to track down which wikis they have edited (a sketch of the API lookup follows), but there is no fast equivalent for IPs; GUC works, but it's slow.
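For accounts, that lookup can be scripted against CentralAuth's globaluserinfo API module; the helper around it is just a sketch.

```python
# List the wikis where a CentralAuth-attached account has edits.
# Works for accounts only; nothing comparable exists for IPs.
import requests

def wikis_edited(username: str) -> list:
    resp = requests.get("https://meta.wikimedia.org/w/api.php", params={
        "action": "query",
        "meta": "globaluserinfo",
        "guiuser": username,
        "guiprop": "merged",
        "format": "json",
    }, timeout=10)
    merged = resp.json()["query"]["globaluserinfo"]["merged"]
    return [(acct["wiki"], acct["editcount"]) for acct in merged
            if acct.get("editcount", 0) > 0]
```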

Thanks all for the responses! Going to resolve this for now.