Copyvios tool: investigate/block suspicious web traffic
Open, Needs Triage, Public

Description

For the past two-plus months, my copyvios tool has been receiving unusual web traffic. I currently block it with a uwsgi rule so the requests return 403 immediately, but they still fill my logs with many thousands of junk entries a day, and whoever is sending them could easily work around the rule by tweaking the parameters a bit. I would like to see if we can block this at a different point in the stack, or at least think a bit about what's going on.

Each request is to a URL like https://copyvios.toolforge.org/?lang=en&project=wikipedia&oldid=887576204&action=compare&url=google.ee. The revision ID in the "oldid" field is constant (this is what I am blocking based on) but the "url" field varies. "http://hasty.ai" often appears in either the HTTP referrer or somewhere else in the request headers, but not always.

For example:

[Tue May  4 05:44:01 2021] GET /?lang=en&project=wikipedia&oldid=887576204&action=compare&url=google.ad => generated 0 bytes in 0 msecs (- http://hasty.ai HTTP/1.1 403) 2 headers in 89 bytes
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36
Referer: http://webservices.icodes.co.uk/transfer2.php?location=https%3A%2F%2Fcopyvios.toolforge.org%2F%3Flang%3Den%26project%3Dwikipedia%26oldid%3D887576204%26action%3Dcompare%26url%3Dgoogle.ad - http%3A%2F%2Fhasty.ai

In P15679 I've pastebinned a sample of how these requests look on my end. The logs are from May 4 but the traffic is the same today.
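To make the matching described above concrete, here is a minimal Python sketch of the same check (constant "oldid", any "url") written as WSGI middleware. This is not the actual uwsgi rule in use; the names `is_suspicious` and `block_suspicious` are hypothetical, and it only illustrates why the match is easy to evade: changing the revision ID is enough.

```python
from urllib.parse import parse_qs

# The constant revision ID seen in the junk requests (from the URLs above).
BLOCKED_OLDID = "887576204"


def is_suspicious(query_string: str) -> bool:
    """Return True when the query string carries the blocked oldid.

    parse_qs ignores parameter order, so reordering the parameters does
    not evade the check, but changing the oldid value does.
    """
    params = parse_qs(query_string)
    return BLOCKED_OLDID in params.get("oldid", [])


def block_suspicious(app):
    """Wrap a WSGI app so that matching requests get an immediate 403."""
    def middleware(environ, start_response):
        if is_suspicious(environ.get("QUERY_STRING", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware
```

A check like this still runs inside the tool's own process, so it does not keep the requests out of the logs; it only shows the matching logic.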

In particular, it's not clear to me what hasty.ai has to do with this at all (or if some service they provide is compromised and being used to DOS me?). I can't see IPs so I have no clue where this is coming from. I don't understand why hasty.ai sometimes shows up in the uwsgi log as part of the protocol (%(proto) is normally HTTP/1.1 but with this traffic is sometimes - http://hasty.ai HTTP/1.1). This almost sounds like it's exposing a bug in uwsgi's request parsing or log formatting but I'm again unable to really trace this down as I don't have an obvious way to see the raw HTTP traffic and my attempts to simulate a garbled HTTP request with telnet did not produce any logs like this.

I get a few of these requests per second, whereas normal tool usage might be a few requests per minute on average. If unblocked, they flood the tool.

Longer term I will require OAuth to use the tool, which will help to block this sort of thing more securely, but it's not ready yet and it won't stop the requests from coming to my uwsgi process either.

Event Timeline

> In particular, it's not clear to me what hasty.ai has to do with this at all (or if some service they provide is compromised and being used to DOS me?). I can't see IPs so I have no clue where this is coming from.

The pattern is indeed quite strange. I took a look this morning (around 06:00 UTC), and at that time today's nginx logs (which we rotate at midnight UTC) had 89 unique IP addresses when grepping for "hasty.ai". Most of those look like cloud/hosting providers of some sort, but some look like residential addresses, and they are spread all across the address space. There are no clear user-agent patterns either (just a bunch of different browser-like UAs). Most of the requests have a referrer header with the target toolforge URL in it as a GET parameter; I tested a few of those and they didn't immediately look like open redirects.
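For anyone wanting to repeat that count, a rough Python equivalent of "grep for hasty.ai, then count unique IPs" could look like the sketch below. It assumes the client IP is the first whitespace-separated field of each log line (as in nginx's default "combined" format), which may not match the actual proxy configuration. The function name and the sample lines are made up for illustration, and the sample IPs come from the reserved documentation ranges.

```python
def unique_client_ips(log_lines, needle="hasty.ai"):
    """Collect the distinct first fields of all lines containing `needle`.

    Assumption: the first whitespace-separated field is the client IP.
    """
    ips = set()
    for line in log_lines:
        if needle in line:
            fields = line.split(maxsplit=1)
            if fields:
                ips.add(fields[0])
    return ips


# Made-up sample lines, not real log data.
sample = [
    '192.0.2.10 - - "GET /?oldid=887576204&url=google.ad - http://hasty.ai HTTP/1.1" 403',
    '198.51.100.7 - - "GET /?oldid=887576204&url=google.ee HTTP/1.1" 403 "http://hasty.ai"',
    '192.0.2.10 - - "GET /?oldid=887576204&url=google.cl - http://hasty.ai HTTP/1.1" 403',
    '203.0.113.5 - - "GET /?lang=en&title=Example HTTP/1.1" 200',
]
print(len(unique_client_ips(sample)))
```

On a real log file, passing the open file object as `log_lines` avoids loading the whole file into memory.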

> I don't understand why hasty.ai sometimes shows up in the uwsgi log as part of the protocol (%(proto) is normally HTTP/1.1 but with this traffic is sometimes - http://hasty.ai HTTP/1.1). This almost sounds like it's exposing a bug in uwsgi's request parsing or log formatting but I'm again unable to really trace this down as I don't have an obvious way to see the raw HTTP traffic and my attempts to simulate a garbled HTTP request with telnet did not produce any logs like this.

This is what it looks like on our front proxy: GET /?lang=en&project=wikipedia&oldid=887576204&action=compare&url=google.cl - http://hasty.ai HTTP/1.1
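One possible reading of that line, which is an assumption and not something verified against uwsgi's source: the request target contains literal, unencoded spaces, so a parser that splits the request line on only the first two spaces would put everything after the path into the protocol field. The Python sketch below (with a hypothetical `split_request_line` helper) shows that such a split reproduces the odd protocol value from the uwsgi log.

```python
def split_request_line(line: str):
    """Split an HTTP request line on the first two spaces only.

    This mimics a lenient parser; it is not uwsgi's actual implementation.
    Anything after the second space, including further spaces, ends up in
    the protocol field.
    """
    method, _, rest = line.partition(" ")
    target, _, proto = rest.partition(" ")
    return method, target, proto


request_line = (
    "GET /?lang=en&project=wikipedia&oldid=887576204"
    "&action=compare&url=google.cl - http://hasty.ai HTTP/1.1"
)
method, target, proto = split_request_line(request_line)
print(method)
print(target)
print(proto)
```

If this reading is right, the sender is appending " - http://hasty.ai" to the URL without encoding it, and the log format is just reporting what the parser produced.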