
Create a tool that makes it easier to map IPs to ASN CIDRs
Open, Needs Triage, Public

Description

We need a tool that can take a list of IP addresses as an input (one per line), and return a condensed list of CIDRs for the ASNs that those IPs belong to.
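The intended input and output could look roughly like the sketch below. It assumes a hypothetical in-memory table, `ASN_PREFIXES`, that maps each ASN to its announced prefixes; in practice that data would come from whichever source this task settles on. The ASNs and prefixes shown are made-up values from documentation ranges, and the function name `cidrs_for_ips` is only illustrative. Only Python's standard `ipaddress` module is used.

```python
import ipaddress

# Hypothetical ASN -> announced prefixes table (documentation ranges only).
# Real data would come from the source chosen in this task.
ASN_PREFIXES = {
    64500: ["192.0.2.0/25", "192.0.2.128/25", "198.51.100.0/24"],
    64501: ["203.0.113.0/24"],
}


def cidrs_for_ips(text, asn_prefixes=ASN_PREFIXES):
    """Map a newline-separated list of IPs to condensed CIDRs per ASN."""
    networks = {
        asn: [ipaddress.ip_network(prefix) for prefix in prefixes]
        for asn, prefixes in asn_prefixes.items()
    }

    # Find every ASN that contains at least one of the input IPs.
    matched = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        ip = ipaddress.ip_address(line)
        for asn, nets in networks.items():
            if any(ip in net for net in nets):
                matched.add(asn)

    # Condense each matched ASN's prefixes; IPv4 and IPv6 are collapsed
    # separately because collapse_addresses() rejects mixed versions.
    result = {}
    for asn in sorted(matched):
        condensed = []
        for version in (4, 6):
            same_version = [n for n in networks[asn] if n.version == version]
            condensed.extend(ipaddress.collapse_addresses(same_version))
        result[asn] = [str(net) for net in condensed]
    return result
```

For example, `cidrs_for_ips("192.0.2.10\n192.0.2.200\n")` matches both IPs to the hypothetical AS64500 and merges its two adjacent /25 prefixes into a single /24.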

This is something I have to compile manually whenever I run a CU on a user whose ISP cycles through IP addresses quickly, or when the user is using proxies (either because they have IPBE, or because we have yet to block those proxies). Doing it manually is inefficient. I once wrote a JS tool to make the process easier, but @Ladsgroup pointed out that sending many IP WHOIS requests from my machine may not be a great idea. Also, most IP-to-ASN mappings change little over time and could be cached or, even better, retrieved from sources like MaxMind instead of running actual WHOIS queries.

The purpose of this task:

  • Identify what data source to use
    • Does MaxMind free provide all that is needed?
    • Does MaxMind free license allow this kind of use case?
    • If the answer to either of those is no, can we use a licensed version of MaxMind through WMF?
    • If no, are there any reasons we could not run WHOIS commands from Clouds machines and cache them in a database?
  • Identify if setting up a service like this is allowed on WMF Clouds
  • Determine how access management should be done for such a tool
    • Is it possible to use OAuth to verify the user is a CU on some wiki?
    • If not, should we authenticate using OAuth but perform access management using a manual list of allowed users?
  • Other technical considerations
    • Should the tool have some form of throttling?
    • Should it have its own audit log? If yes, what should it entail?
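If throttling turns out to be wanted, one common option is a per-user token bucket. The sketch below is only an illustration of that technique, not a decision for this task; the class name `TokenBucket` and the `rate` and `capacity` parameters are assumptions, and real limits would need to be chosen separately.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens added per second (assumed value)
        self.capacity = capacity  # maximum burst size (assumed value)
        self.clock = clock        # injectable clock, handy for testing
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self, cost=1):
        """Return True and spend `cost` tokens if the request may proceed."""
        now = self.clock()
        # Refill according to elapsed time, never exceeding capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

One bucket would be kept per authenticated user, and a lookup would be rejected or delayed whenever `allow()` returns `False`.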

Event Timeline

Huji updated the task description.

If no, are there any reasons we could not run WHOIS commands from Clouds machines and cache them in a database?

Such a cache service, if generalized, might help https://whois.toolforge.org/, which essentially sends IP Whois requests on behalf of users. Not too long ago, that tool caused issues by sending too many requests to Whois databases: T265784

I read through the terms of use of the various Regional Internet Registries. Links are provided below for reference.

ARIN's terms of use do not seem to impose any limitations on downloading and keeping the data. There are clauses saying you cannot reuse the data for purposes other than those for which its original use was permitted (so, e.g., you cannot download it and then use it for advertising). But I don't think that would apply to the use case proposed above.

APNIC says their data may not be reproduced or stored in a retrieval system "except for Internet operational purposes"; while I am no lawyer, I think our use case above meets this criterion, so we could cache their data upon retrieval.

RIPE's terms of use state that "A User may not re-package, download, compile, re-distribute or re-use any or all of the RIPE Database or the data contained therein unless he does so only with an insubstantial part of the RIPE Database or the data contained therein or when permission to do so is granted by the RIPE NCC". So I guess if you only cache the few IPs that your tool fetched and purge the cache after a few days, you will never keep more than an insubstantial part of RIPE data and would not be in violation of this term (but I am no lawyer).
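The "cache only what was fetched and purge after a few days" idea could be sketched as below. This only illustrates the mechanism and says nothing about whether it satisfies any registry's terms. The class name `ExpiringCache` and the three-day default are assumptions, and a real tool would presumably use a database rather than an in-memory dict.

```python
import time


class ExpiringCache:
    """Hold only entries that were actually looked up, for a limited time."""

    def __init__(self, max_age=3 * 24 * 3600, clock=time.time):
        self.max_age = max_age  # seconds; three days is an arbitrary assumption
        self.clock = clock      # injectable clock, handy for testing
        self._entries = {}      # key -> (stored_at, value)

    def put(self, key, value):
        self._entries[key] = (self.clock(), value)

    def get(self, key):
        """Return the cached value, or None if it is missing or too old."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at > self.max_age:
            del self._entries[key]
            return None
        return value

    def purge(self):
        """Delete every expired entry and return how many were removed."""
        now = self.clock()
        expired = [
            key for key, (stored_at, _) in self._entries.items()
            if now - stored_at > self.max_age
        ]
        for key in expired:
            del self._entries[key]
        return len(expired)
```

Calling `purge()` on a schedule keeps the store from accumulating old records even for IPs that are never looked up again.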

AFRINIC uses more restrictive language and says "... the data contained herein ... are and shall remain the property of AFRINIC ... [and] may not be used, reproduced or made available to third parties without the prior written authorisation from AFRINIC". This would imply that companies like MaxMind have obtained prior authorization from the likes of AFRINIC and RIPE. Maybe WMF should get this authorization too?

And for LACNIC, I could not find anything in their terms of use that would be relevant to the copyright/caching question.

So at this point, I would not feel comfortable caching IP WHOIS data. I think the right answer is to get access to MaxMind or a similar third-party IP WHOIS database, through WMF.


@dbarratt based on your comment on T264838#6534393 I think you may be able to offer advice here too.