Toolforge & Cloud VPS egress IP has been blocked by RIPE whois database for excessive use
Closed, Resolved | Public

Description

From a toolforge webservice shell console:

>>> from ipwhois import net
>>> net.Net("145.64.254.243").get_whois('ripencc')
'% This is the RIPE Database query service.
% The objects are in RPSL format.
%
% The RIPE Database is subject to Terms and Conditions.
% See http://www.ripe.net/db/support/db-terms-conditions.pdf

%ERROR:201: access denied for 185.15.56.1
%
% Sorry, access from your host has been permanently
% denied because of a repeated excessive querying.
% For more information, see
% http://www.ripe.net/data-tools/db/faq/faq-db/why-did-you-receive-the-error-201-access-denied

% This query was served by the RIPE Database Query Service version 1.98 (HEREFORD)'

They have some pages which may be relevant:
https://www.ripe.net/manage-ips-and-asns/db/support/documentation/ripe-database-acceptable-use-policy/why-did-i-receive-an-error-201-access-denied
https://www.ripe.net/manage-ips-and-asns/db/support/documentation/ripe-database-acceptable-use-policy

I do not know specifically what the reason is for this block. If it relates to "personal data sets", the site says that using the -r flag will prevent the contact information from being returned. We do not use the command line interface (it is not even available), so it would be necessary to patch the Python "ipwhois" library in order to pass that flag.

As a result of this block, administrators and checkusers receive degraded or no information when they attempt to look up information related to an IP address in the RIPE region. This impacts our ability to understand how a specific IP address is being used (commercial vs residential vs hosting provider) as well as our ability to calculate appropriate IP ranges for checkuser and for blocking. Specifically, they will only see the IP range allocated to RIPE, and not the range allocated by RIPE to the customer, which is likely to result in blocks and CU being performed at the /16 rather than at a more appropriate size.
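To illustrate the range calculation that suffers here, the following is a minimal sketch (not the code any of the affected tools actually use) that picks the narrowest network containing an address from an ipwhois-style 'nets' list. The helper name most_specific_cidr is made up for this example, and the sample data is taken from the lookup output shown later in this task.

```python
import ipaddress


def most_specific_cidr(ip, nets):
    """Return the narrowest CIDR in `nets` that contains `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    candidates = []
    for net in nets:
        # Assumption: a 'cidr' value may list several ranges separated by commas.
        for cidr in (net.get("cidr") or "").split(","):
            cidr = cidr.strip()
            if not cidr:
                continue
            network = ipaddress.ip_network(cidr, strict=False)
            if network.version == addr.version and addr in network:
                candidates.append(network)
    if not candidates:
        return None
    # The longest prefix is the smallest (most specific) range.
    return str(max(candidates, key=lambda n: n.prefixlen))


nets = [{"cidr": "145.64.0.0/16"}, {"cidr": "145.64.254.0/24"}]
print(most_specific_cidr("145.64.254.243", nets))  # 145.64.254.0/24
```

When the RIPE query is denied, the narrower entries are simply missing from the list, so a helper like this can only fall back to whatever broad range is still known.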

Originally reported at https://github.com/whym/whois-gateway/issues/18

Event Timeline

  • I'm wondering if the whois tool at toolforge should have some mechanism of throttling, if it was the culprit.
  • Was there a large amount of automated access to RIPE via the whois tool? I cannot verify because as a tool maintainer I don't seem to be able to see recent access logs.

https://toolviews.toolforge.org/api/v1/tool/whois/daily/2020-07-01/2020-10-18 does not show much activity at all for the whois tool, although those numbers only count HTTP 2xx results, so there could have been a large number of queries that returned a failing status to the caller.

185.15.56.1 is nat.openstack.eqiad1.wikimediacloud.org. That address is the shared egress IP for all traffic leaving Cloud VPS for the external internet.

At https://www.ripe.net/manage-ips-and-asns/db/support/documentation/ripe-database-acceptable-use-policy/why-did-i-receive-an-error-201-access-denied there is a pointer to the RIPE acceptable use policy (pdf) which has some info on the limits:

  • Number of personal data sets returned in queries from an IP address – 1,000 per 24 hours
  • Number of personal data sets returned in queries from a proxy IP address – 20,000 per 24 hours

The FAQ suggests that ripe-dbm(at)ripe.net can be contacted by someone on the cloud-services-team to potentially get our shared proxy IP treated as a proxy address by RIPE.
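One way a tool could stay under the 1,000-per-24-hours figure on its own side is a sliding-window budget. This is only a sketch under assumptions: the class name DailyBudget and its parameters are invented for illustration, and RIPE counts personal data sets returned rather than queries, so the cost argument is a rough stand-in for however many contact objects a response is expected to contain.

```python
import time
from collections import deque


class DailyBudget:
    """Sliding-window counter: allow at most `limit` events per `window` seconds."""

    def __init__(self, limit=1000, window=24 * 60 * 60, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._events = deque()  # timestamps of events still inside the window

    def try_spend(self, cost=1):
        """Record `cost` events and return True, or return False if over budget."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._events and now - self._events[0] >= self.window:
            self._events.popleft()
        if len(self._events) + cost > self.limit:
            return False
        self._events.extend([now] * cost)
        return True


budget = DailyBudget()
if budget.try_spend():
    pass  # safe to send one recursive query to RIPE
else:
    pass  # over budget: fall back to a non-recursive query or a cached answer
```

Because all of Cloud VPS shares one egress IP, a per-tool counter like this cannot guarantee the shared limit is respected; it only bounds one tool's contribution.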

bd808 renamed this task from Toolforge IP has been blocked by RIPE whois database to Toolforge & Cloud VPS egress IP has been blocked by RIPE whois database for excessive use. Oct 19 2020, 3:58 PM
bd808 edited projects, added Cloud-VPS, Toolforge; removed Cloud-Services.

I received the following from RIPE technical support:

The IP that you mentioned has been automatically marked for
permanent query denial due to excessive querying activity
originating from it, even after it has been repeatedly
temporarily denied.

There is a limit of objects that can be retrieved for a given
time period. This limit is ONLY in effect for person or role
data. This is because of the privacy restrictions on said data -
it contains information such as e-mail and phone contact that may
be sensitive.

In order to query the database without viewing such data, please
use the "nonrecursive" option to the query. This will return
only the actual records, rather than also returning the contact
information.

This is specified with the "-r" flag.
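For readers unfamiliar with how the flag travels, whois is a plain-text protocol on TCP port 43, and the flag is just a prefix on the query line. The sketch below is independent of ipwhois and is not how that library is implemented; the function names are invented for this example, and whois.ripe.net is assumed as the server.

```python
import socket


def build_ripe_query(ip, recursive=False):
    """Return the query line for RIPE whois; '-r' turns off recursive contact lookups."""
    flags = "" if recursive else "-r "
    return f"{flags}{ip}\r\n"


def query_ripe(ip, recursive=False, host="whois.ripe.net", port=43, timeout=10):
    """Send one whois query and return the raw text response."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_ripe_query(ip, recursive).encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Example (performs a real network request, so it is left commented out):
# print(query_ripe("145.64.254.243"))
```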

So, it seems like tools querying RIPE must specify the "-r" flag. I wrote a patch to the python ipwhois library, which you can review on GitHub. With this patch, you can specify get_recursive=False:

>>> IPWhois("145.64.254.243").lookup_whois(inc_raw=True, get_recursive=False)

This removes the "recursive" lookups from the query response. It means we no longer get the "organization", "role", or "person" details. Here is an example of the raw response:

% This is the RIPE Database query service.
% The objects are in RPSL format.
%
% The RIPE Database is subject to Terms and Conditions.
% See http://www.ripe.net/db/support/db-terms-conditions.pdf

% Note: this output has been filtered.
%       To receive output for a database update, use the "-B" flag.

% Information related to '145.64.0.0 - 145.64.255.255'

% Abuse contact for '145.64.0.0 - 145.64.255.255' is 'wherler@epo.org'

inetnum:        145.64.0.0 - 145.64.255.255
netname:        EPONET
org:            ORG-EPO5-RIPE
descr:          European Patent Office
descr:          P.O. Box 5818
descr:          2280 HV  Rijswijk
country:        NL
admin-c:        NdR9-RIPE
admin-c:        ENA7-RIPE
tech-c:         ENS6-RIPE
status:         LEGACY
mnt-by:         EPO-MNT
mnt-by:         RIPE-NCC-LEGACY-MNT
created:        1970-01-01T00:00:00Z
last-modified:  2020-08-28T09:51:50Z
source:         RIPE

% Information related to '145.64.254.0/24AS28756'

route:          145.64.254.0/24
descr:          European Patent Office
origin:         AS28756
mnt-by:         EPO-MNT
created:        2019-06-26T13:23:13Z
last-modified:  2019-06-26T13:23:13Z
source:         RIPE

% This query was served by the RIPE Database Query Service version 1.98 (BLAARKOP)

However, it doesn't significantly affect the data that we actually use. The main impact seems to be that ['nets'][0]['handle'], ['nets'][0]['address'] and ['nets'][0]['emails'] come back as None, which frankly isn't a big deal:

>>> from ipwhois import IPWhois
>>> IPWhois("145.64.254.243").lookup_whois()
{'nir': None, 'asn_registry': 'ripencc', 'asn': '28756', 'asn_cidr': '145.64.252.0/22', 'asn_country_code': 'DE', 'asn_date': '1993-09-01', 'asn_description': 'EPO-AS, NL', 'query': '145.64.254.243', 'nets': [{'cidr': '145.64.0.0/16', 'name': 'EPONET', 'handle': 'ENA7-RIPE', 'range': '145.64.0.0 - 145.64.255.255', 'description': 'European Patent Office\nP.O. Box 5818\n2280 HV  Rijswijk', 'country': 'NL', 'state': None, 'city': None, 'address': 'Patentlaan 2\n2288EE\nRijswijk\nNETHERLANDS', 'postal_code': None, 'emails': ['admin_network@epo.org'], 'created': '1970-01-01T00:00:00Z', 'updated': '2020-08-28T09:51:50Z'}, {'cidr': '145.64.254.0/24', 'name': None, 'handle': None, 'range': '145.64.254.0 - 145.64.254.255', 'description': 'European Patent Office', 'country': None, 'state': None, 'city': None, 'address': None, 'postal_code': None, 'emails': None, 'created': '2019-06-26T13:23:13Z', 'updated': '2019-06-26T13:23:13Z'}], 'referral': None, 'raw_referral': None}
>>> IPWhois("145.64.254.243").lookup_whois(get_recursive=False)
{'nir': None, 'asn_registry': 'ripencc', 'asn': '28756', 'asn_cidr': '145.64.252.0/22', 'asn_country_code': 'DE', 'asn_date': '1993-09-01', 'asn_description': 'EPO-AS, NL', 'query': '145.64.254.243', 'nets': [{'cidr': '145.64.0.0/16', 'name': 'EPONET', 'handle': None, 'range': '145.64.0.0 - 145.64.255.255', 'description': 'European Patent Office\nP.O. Box 5818\n2280 HV  Rijswijk', 'country': 'NL', 'state': None, 'city': None, 'address': None, 'postal_code': None, 'emails': None, 'created': '1970-01-01T00:00:00Z', 'updated': '2020-08-28T09:51:50Z'}, {'cidr': '145.64.254.0/24', 'name': None, 'handle': None, 'range': '145.64.254.0 - 145.64.254.255', 'description': 'European Patent Office', 'country': None, 'state': None, 'city': None, 'address': None, 'postal_code': None, 'emails': None, 'created': '2019-06-26T13:23:13Z', 'updated': '2019-06-26T13:23:13Z'}], 'referral': None, 'raw_referral': None}
>>>
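Since the two dictionaries are hard to compare by eye, here is a small sketch that lists which 'nets' fields differ between a recursive and a non-recursive result. The helper changed_net_fields is made up for this comment, and the sample data is a trimmed copy of the two results above with only a few keys kept.

```python
def changed_net_fields(recursive_result, nonrecursive_result):
    """List (net index, key, old, new) for every 'nets' field that differs."""
    changes = []
    pairs = zip(recursive_result["nets"], nonrecursive_result["nets"])
    for index, (old, new) in enumerate(pairs):
        for key in sorted(old):
            if old[key] != new.get(key):
                changes.append((index, key, old[key], new.get(key)))
    return changes


# Trimmed copies of the first 'nets' entry from each result.
recursive = {"nets": [{"cidr": "145.64.0.0/16", "handle": "ENA7-RIPE",
                       "address": "Patentlaan 2\n2288EE\nRijswijk\nNETHERLANDS",
                       "emails": ["admin_network@epo.org"]}]}
nonrecursive = {"nets": [{"cidr": "145.64.0.0/16", "handle": None,
                          "address": None, "emails": None}]}

for index, key, old, new in changed_net_fields(recursive, nonrecursive):
    print(f"nets[{index}][{key!r}]: {old!r} -> {new!r}")
```

On this sample the differing keys are 'address', 'emails' and 'handle'; the 'cidr' and range data used for blocking are unchanged.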

You can install my fork with:

pip install -e git+https://github.com/wiki-ST47/ipwhois.git@fae007310c5c73ac48e827de555bfce9d9c418ea#egg=ipwhois

Then add the get_recursive=False argument to your lookup_whois() calls, for now. I'll submit a pull request to ipwhois for a more permanent solution. In the meantime, some other projects that might use whois info include my own as-info, and SQL's isprangefinder. I'll have a look.

I have updated as-info and whois-referral. isprangefinder uses the whois tool's JSON output; it doesn't make whois queries directly.

As far as I know, that leaves only the whois tool itself.

@bd808, is that data on toolviews.toolforge.org delayed or downsampled at all? My usage alone of whois-referral.toolforge.org exceeds what I'm seeing at https://toolviews.toolforge.org/api/v1/tool/whois-referral/daily/2020-07-01/2020-10-19

The database behind the toolviews tool should be updated each time the nginx front proxy access logs are rotated. That happens somewhere between 1 and 4 times a day depending on the total request volume sent to *.toolforge.org. But, I will admit that nobody is really watching the script that parses the nginx log after each rotation, so it may be miscounting things. Just eyeballing the result from https://toolviews.toolforge.org/api/v1/day/2020-10-01 I think total numbers look reasonable, but the fourohfour tool may be getting more than its rightful share of hits. The parsing script adds to the count for the fourohfour tool when it thinks the request was for an unknown tool, so this could indicate something going wrong with the data collection. I will start a new task to look into that.

nskaggs triaged this task as High priority. Oct 20 2020, 4:20 PM

I heard back again from RIPE and the IP has been unblocked. Asking them to treat it as a proxy IP would still be desirable, in order to reduce the chance of this happening again.

Additionally, @whym's whois.toolforge.org tool should be updated to use the nonrecursive -r flag, at least when querying RIPE. isprangefinder.toolforge.org queries whois.toolforge.org, and that can lead to a ton of queries for a single web request.

Wzj88123 claimed this task.
ST47 removed ST47 as the assignee of this task.

There are two ongoing actions here:

  • nskaggs seems to be planning to contact RIPE on behalf of the cloud services team
  • the whois tool needs to be updated

Reopened

Thank you for contacting RIPE, ST47. I've also reached out to RIPE and asked them to consider our use as a proxy and raise the limit to 20k per 24 hours per their terms. I will update the ticket once I have a response.

I tried using ST47's version of ipwhois on a test instance, while not changing the original tool yet. However, it looks like currently the block is lifted? I see no difference in the results - both work.

https://whois-dev.toolforge.org/w/145.64.254.243/lookup
https://whois.toolforge.org/w/145.64.254.243/lookup

(The whois-dev instance incorporates some unrelated changes, too.)

EDIT: Sorry, I forgot to add get_recursive=False, and with that option enabled, the email field gets hidden, confirming ST47's explanation above. Still, the block does not seem enforced for now.

The block was lifted, as mentioned a few comments above. However, the rate limit is still in place, so we may be blocked again if our queries to RIPE return more than 1,000 personal data sets in 24 hours.

Additionally, @whym's whois.toolforge.org tool should be updated to use the nonrecursive -r flag, at least when querying RIPE. isprangefinder.toolforge.org queries whois.toolforge.org, and that can lead to a ton of queries for a single web request.

I'm trying to see how the change will affect information retrieved from other databases like APNIC. Contact information may not be needed most of the time, but there is a legitimate need for it (mostly email addresses) when a sysop wants to report an abuser. I don't know how much of the contact information is recursively retrieved, though.

The best I have come up with so far is:

  • Provide more visible links for additional information available at the database's website. We'd want to identify and highlight which part of the information that is, somehow.
  • Make a non-recursive lookup first, and if it wasn't RIPE, make a recursive lookup next. That would mean the tool makes 2 lookups to other databases every time, which is not great.
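The second option could be sketched roughly as follows. This is an illustration only: lookup is a hypothetical callable taking (ip, recursive) and returning an ipwhois-style dict, standing in for whatever client the tool uses, and the check relies on the 'asn_registry' value 'ripencc' seen in the results earlier in this task.

```python
def lookup_with_fallback(ip, lookup):
    """Non-recursive first; repeat recursively only when the registry is not RIPE.

    `lookup` is a hypothetical callable (ip, recursive) -> dict.
    """
    result = lookup(ip, False)
    if result.get("asn_registry") == "ripencc":
        return result        # keep RIPE queries free of contact data
    return lookup(ip, True)  # second query: the extra cost noted in the list above


def fake_lookup(ip, recursive):
    """Stand-in client for demonstration; it makes no network requests."""
    registry = "ripencc" if ip.startswith("145.") else "apnic"
    return {"asn_registry": registry, "recursive": recursive}


print(lookup_with_fallback("145.64.254.243", fake_lookup))
print(lookup_with_fallback("1.1.1.1", fake_lookup))
```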

However, I guess I'll have to make the quickest change (simply disabling all recursive lookups) before trying fine-tuning like that. As you said, 1,000 per day is not a high threshold with the tool as it is.

I don't believe the -r flag has any effect on other RIRs. My patch to ipwhois only uses the -r flag when querying RIPE's servers.

Ah, that's right. Then there is no reason to withhold it, at least as a short term fix. I'll think about things like highlighting missing information later. Thank you for your help.

The block was lifted, as mentioned a few comments above. However, the rate limit is still in place, so we may be blocked again if our queries to RIPE return more than 1,000 personal data sets in 24 hours.

That's correct. I also reached out to RIPE and unfortunately they can't increase the limits for the shared IP in this instance. So we have to abide by this limit.

whois.toolforge.org is now using ST47's patch. (Thanks again.) @ST47, are you going to send a pull request and get it merged to the upstream repository? That would be better for long-term maintainability.

I tried to obtain more insight into how the tool is used from access logs. From 5-6 days' worth of logs, ~80k lines contain 'lookup', ~70k contain 'lookup' and 'json' (roughly indicating automated access), and ~19k contain 'lookup' and 'bot' (roughly indicating web crawlers). I'm not sure what to make of it, but at least it confirms that there are lots of automated accesses that I can identify as such, and I could probably do better at rejecting web crawlers.
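For anyone wanting to repeat this kind of count, a rough sketch of the substring tally is below. It is not the exact procedure used for the figures above: the matching here is case-insensitive, which is an assumption, and the sample log lines (including the JSON URL form and user agents) are made up for illustration.

```python
from collections import Counter


def classify_lookup_lines(lines):
    """Count lines containing 'lookup', and of those, 'json' and 'bot'."""
    counts = Counter()
    for line in lines:
        lowered = line.lower()  # assumption: case-insensitive matching
        if "lookup" not in lowered:
            continue
        counts["lookup"] += 1
        if "json" in lowered:
            counts["lookup+json"] += 1  # roughly: automated API access
        if "bot" in lowered:
            counts["lookup+bot"] += 1   # roughly: web crawlers
    return counts


# Hypothetical access-log lines, not taken from the real logs.
sample = [
    'GET /w/145.64.254.243/lookup HTTP/1.1 "Mozilla/5.0"',
    'GET /w/145.64.254.243/lookup/json HTTP/1.1 "python-requests"',
    'GET /w/145.64.254.243/lookup HTTP/1.1 "ExampleBot/1.0"',
    'GET /static/style.css HTTP/1.1 "Mozilla/5.0"',
]
print(classify_lookup_lines(sample))
```

A line can fall into both the 'json' and 'bot' buckets, so the two sub-counts are not mutually exclusive.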

(It turned out the reason I couldn't see the logs last week was that I had disabled logging after I finished debugging other issues previously. I have disabled it again.)