iabot flagged as a malicious agent by external sites
Closed, Resolved (Public)

Description

In recent weeks I've gotten multiple notices from external sites about a botnet infection in our cloud. In particular, they're reporting an agent that hits web page after web page in quick succession (which their tooling reports as a probe for vulnerabilities).

In one case the 'attacker' was clearly identified as iabot; in the other it was less explicit but still looks to me like iabot activity.

I've responded to the reporters and explained that iabot is a useful crawler and not an attack vector. Nevertheless, it would be nice to not get these reports... perhaps iabot could throttle its loads, or interleave multiple websites so that it doesn't hammer on any one webserver and get flagged.

Event Timeline

IABot's IP address is 208.80.155.236 though.

Ah, sorry, that was a different one:

Sir / Ma'am,

The NCCIC is requesting assistance in verifying possible malicious activity being hosted on a system registered to you that may be affecting visitors and we would greatly appreciate your assistance in investigating such activity. The following information was provided by a trusted third-party to help resolve this issue. Please note that this is the extent of the information and USCERT does not have any additional information to provide:

"With an internationally coordinated operation, law enforcement agencies took down the 'Avalanche' server infrastructure used for hosting various botnets. Additional information is available at: https://www.us-cert.gov/ncas/alerts/TA16-336A

In the course of this operation, domain names used by malware related to those botnets for contacting command-and-control servers have been redirected to sinkholes.

Please find below a list of affected hosts in your country. Each record includes the IP address, a timestamp and the name of the corresponding malware family. If available, the records also include the source port, target IP, target port and target hostname for the connection.

A value of 'generic' for the malware family means:
a) The affected system connected to a domain name related to the Avalanche botnet infrastructure which could not be mapped to a particular malware family yet.

or

b) The HTTP request sent by the affected system did not include a domain name. Thus, on the sinkhole it could not be decided which domain name the affected system resolved to connect to the respective IP address.

Most of the malware families reported here include functions for identity theft (harvesting of usernames and passwords) and/or online-banking fraud."

Please see the attached file for a list of associated IP addresses - Time Zone reflects UTC+1

The owner/operator of this IP may or may not be aware this host is performing this activity or that it has been possibly compromised. If your investigation confirms this activity, the NCCIC would greatly appreciate your assistance in suspending this host until corrective measures are taken.

The NCCIC incident number above has been assigned for future reference. Please refer to this number in the subject line of any email correspondences to ensure proper tracking. We greatly appreciate your assistance in resolving this matter and look forward to your continued cooperation.

If you need assistance in this matter or have any questions please contact the NCCIC Service Desk at soc@us-cert.gov. You are neither required nor expected to provide further updates in regards to situational awareness. Contact information associated with the malicious IP was retrieved via ARIN. If you would like to have your contact information updated then please contact ARIN: https://www.arin.net/contact_us.html

To submit samples of malicious code for analysis, visit http://malware.us-cert.gov. Our information sharing portal for trusted partners is available at https://portal.us-cert.gov.

Respectfully,

National Cybersecurity & Communications Integration Center (NCCIC)
Department of Homeland Security
SOC@us-cert.gov
www.us-cert.gov
Twitter: @USCERT_gov

This message and attachments may contain confidential information. If it appears that this message was sent to you by mistake, any retention, dissemination, distribution or copying of this message and attachments is strictly prohibited. Please notify the sender immediately and permanently delete the message and any attachments.

CSV content follows:

cc,ip,Abuse Contact,SubNet,timestamp,malware,src_port,dst_ip,dst_port,dst_host
US,208.80.155.236,abuse@wikimedia.org,208.080.152.000/22,1/13/2018 19:43,nymaim,34492,184.105.192.2,80,www.tvbsp.com

Hmm... IABot does emulate a Chrome UA to avoid bot blocks, but maybe a level of transparency to explain the bot may be better. With that being said, IABot doesn't ping a specific URL within one week of the last ping, or 3 days if the site appears to be dead. So there is always at least a week of waiting before IABot pings the URL again.
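The recheck rule described above can be sketched as a small predicate. This is only an illustration, not IABot's actual code: the names `LIVE_RECHECK`, `DEAD_RECHECK` and `is_due` are hypothetical, and treating the boundary as inclusive is an assumption.

```python
from datetime import datetime, timedelta

# Intervals taken from the comment above; the constant names are hypothetical.
LIVE_RECHECK = timedelta(days=7)  # URL looked alive at the last check
DEAD_RECHECK = timedelta(days=3)  # URL appeared dead at the last check


def is_due(last_checked: datetime, appeared_dead: bool, now: datetime) -> bool:
    """Return True if enough time has passed to check the URL again."""
    wait = DEAD_RECHECK if appeared_dead else LIVE_RECHECK
    # Assumption: a URL becomes eligible exactly when the interval has elapsed.
    return now - last_checked >= wait
```

Note that a rule like this only limits repeat visits to the same URL; it does nothing to space out requests to different URLs on the same host.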

> IABot doesn't ping a specific URL within one week of the last ping

I believe the issue is that it's hitting consecutive URLs that are on the same server. For example:

www.example.org/pageone
www.example.org/pagetwo
www.example.org/pagethree
www.example.org/pagefour
www.example.org/pagefive

So if we interleaved requests based on FQDN we'd be less likely to trip this kind of alert.
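One way to picture interleaving by FQDN is a round-robin over per-host queues. The sketch below is a suggestion only and is not taken from IABot; `interleave_by_host` is a hypothetical name, and it assumes the URLs carry a scheme so that `urlsplit` can extract the hostname.

```python
from collections import OrderedDict, deque
from urllib.parse import urlsplit


def interleave_by_host(urls):
    """Reorder URLs so consecutive requests rotate across hostnames."""
    queues = OrderedDict()
    for url in urls:
        # URLs without a scheme yield no hostname and share the "" bucket.
        host = urlsplit(url).hostname or ""
        queues.setdefault(host, deque()).append(url)

    ordered = []
    while queues:
        # Take one URL from each host per round, dropping exhausted hosts.
        for host in list(queues):
            ordered.append(queues[host].popleft())
            if not queues[host]:
                del queues[host]
    return ordered
```

If one host dominates the list, its remaining URLs still end up back to back at the end, so interleaving alone does not replace a per-host delay.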

The problem is that the bot only actually scans these links when it's on an article. So if an article has 50 links going to example.org, then all of them are scanned, and whatever is seen as dead afterwards then gets appropriately updated on the page. The scanning and page crawling are not two separate processes.

> Hmm... IABot does emulate a Chrome UA to avoid bot blocks, but maybe a level of transparency to explain the bot may be better.

This is the part of this thread that concerns me the most at this point. I wouldn't blame them for blocking any IP that seems to do this. Please don't impersonate a UA. There is nothing actionable written at this time but at some point I think we should have a policy that indicates this is worthy of a ban.

We have had a few reports from bitninja over time and usually they are poor effort, low quality notices that they never follow up on. T136829: 'German Wikipedia Broken Weblinks Bot' is ill-behaved and in danger of getting all of Labs blacklisted comes to mind. Bitninja is not the only external actor at play here, though, so that doesn't tell the entire story: https://phabricator.wikimedia.org/T185383#3914849 is possibly a problem. That brings to mind https://phabricator.wikimedia.org/T156074#2989810. Not mentioned there is to never impersonate a user or falsify a UA.

> We have had a few reports from bitninja over time and usually they are poor effort, low quality notices that they never follow up on.

Note that just today I finally got a response from bitninja acknowledging my pleading and promising to whitelist our IP range. So although iabot's behavior remains in need of fixing, bitninja should stop obsessing about it.

They aren't the only people flagging such behavior, of course.

So the easiest and quickest solution is to simply make the UA more transparent, which is an easy fix.

> So the easiest and quickest solution is to simply make the UA more transparent, which is an easy fix.

Yeah, I'd recommend including the full name (InternetArchiveBot), the relation to Wikipedia (so that they won't think it's on behalf of archive.org itself), and a URL for more information and contact, e.g. to https://meta.wikimedia.org/wiki/InternetArchiveBot.
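A User-Agent string along these lines could be built as sketched below. The exact format, the wording "Wikipedia link checker", the version placeholder and the helper names `build_user_agent` and `build_request` are all assumptions for illustration; only the bot name and the info URL come from the comment above.

```python
import urllib.request

INFO_URL = "https://meta.wikimedia.org/wiki/InternetArchiveBot"


def build_user_agent(version: str, info_url: str = INFO_URL) -> str:
    """Build a descriptive UA: full bot name, Wikipedia relation, info URL."""
    # Hypothetical format; the "+URL" convention is common among crawlers.
    return f"InternetArchiveBot/{version} (Wikipedia link checker; +{info_url})"


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Create a request object that carries the transparent UA."""
    # Constructing the Request does not send anything over the network.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

A site operator who sees this string in their logs can tell at a glance what the client is and where to read more or make contact.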

The first part of addressing this is now complete. A pull request is open to allow custom UAs to be passed to the algorithm. It also has a queuing mechanism that queues up links going to the same domain so they are tested in sequence with 1 second delays.
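The queuing idea can be sketched roughly as follows. This is not the code from the pull request: `check_with_domain_delay` is a hypothetical name, `check` stands in for whatever function tests a single link, and the injectable `sleep` parameter exists only so the sketch can be exercised without real waiting.

```python
import time
from collections import defaultdict
from urllib.parse import urlsplit


def check_with_domain_delay(urls, check, delay=1.0, sleep=time.sleep):
    """Test links domain by domain, pausing between same-domain requests."""
    by_domain = defaultdict(list)
    for url in urls:
        by_domain[urlsplit(url).hostname or ""].append(url)

    results = {}
    for queue in by_domain.values():
        for i, url in enumerate(queue):
            if i > 0:
                # Wait before every request after the first to this domain.
                sleep(delay)
            results[url] = check(url)
    return results
```

With 50 links to one domain, this spreads the requests over roughly 49 seconds instead of sending them in a burst.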

Part 2 will be implemented in v1.6.3 of IABot.

The UA will point to https://meta.wikimedia.org/wiki/InternetArchiveBot/FAQ_for_sysadmins

That said, I think we should have a better method of contacting me than a talk page, since most sysadmins will likely not know how to use one. I'm thinking maybe a dedicated OTRS queue.