Investigate: Adding range and CIDR data to IPInfo
Open, Needs Triage · Public · Spike

Description

Please look into how IPInfo could get and show IP range and CIDR data.

An example was given from the feedback received and may be helpful in investigating how to integrate such functionality:
https://whois-referral.toolforge.org/gateway.py?lookup=true&ip=185.225.28.154

Look into whether we can find this data using MaxMind and, if not, what other options there are for getting it. (@STran might have more context here.)

Event Timeline

Restricted Application changed the subtype of this task from "Task" to "Spike". · Aug 30 2022, 4:15 PM

MaxMind provides a conversion tool. Its limitation, however, is that it works only with CSV files: both its input and its output are CSV.
From testing this tool, I was able to get a range given a network, e.g. given 1.0.8.0/21 the output was the range 1.0.8.0-1.0.15.255.
We already have this data: the trait example shows how to get the network (at the end of the example).
We could create our own utility function that calculates the range for an IP address given in CIDR notation.
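
A minimal sketch of such a utility, written in Python with the standard `ipaddress` module rather than in IPInfo's PHP. The function name `cidr_to_range` is hypothetical and only illustrates the calculation:

```python
import ipaddress


def cidr_to_range(cidr):
    """Return the first and last address covered by a CIDR block."""
    # strict=False tolerates host bits being set, e.g. "185.225.28.154/24".
    network = ipaddress.ip_network(cidr, strict=False)
    return str(network.network_address), str(network.broadcast_address)


print(cidr_to_range("1.0.8.0/21"))  # ('1.0.8.0', '1.0.15.255')
```

The same call also accepts an address with a prefix length attached, so `cidr_to_range("185.225.28.154/24")` gives the enclosing /24 range.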

@Prtksxna @STran @TThoabala am I right in thinking that we want to do more than just convert an address in CIDR notation to the range of IP addresses that it represents?

Do we want to look up the assigned block to which an IP address belongs?

If the difference isn't clear, here's my understanding:

  • 185.225.28.154 (example from the task description) is an IP address
  • 2 network routing blocks were found for this IP address:
    • 185.225.28.144/28 (in CIDR notation) which is the range 185.225.28.144 - 185.225.28.159
    • 185.225.28.0/24 (in CIDR notation) which is the range 185.225.28.0 - 185.225.28.255
    • (The first network routing block looks more interesting, since there are more details about it, but for some reason the second one is the one referenced by the key asn_cidr... Not sure why...)

We could create our own utility function that calculates the range given ip that is represented in cidr format.

MediaWiki has this utility:

>>> IPUtils::formatHex(IPUtils::parseRange('185.225.28.154/24')[0]);
=> "185.225.28.0"

>>> IPUtils::formatHex(IPUtils::parseRange('185.225.28.154/24')[1]);
=> "185.225.28.255"

Do we want to look up the assigned block to which an IP address belongs?

I think yes.

However, it doesn't seem like MaxMind can help us with this.
I ran

use MaxMind\Db\Reader;
$reader = new Reader('/var/www/html/w/enterprise/GeoIP2-Enterprise.mmdb');
// getWithPrefixLen() returns the record and the prefix length; print the prefix length
print_r($reader->getWithPrefixLen('185.225.28.154')[1]); // 24

This returns 24, which gives the range 185.225.28.0 - 185.225.28.255 as mentioned above.

@Prtksxna is this what the user was looking for or something like what @Tchanders posted?

Thanks for sharing example output @TThoabala @Tchanders!

am I right in thinking that we want to do more than just convert an address in CIDR notation to the range of IP addresses that it represents?
Do we want to look up the assigned block to which an IP address belongs?

In my understanding, yes. Based on the example from the WhoIs tool, I think the data we're looking for is asn_cidr: 185.225.28.0/24, cidr: 185.225.28.0/24 or range: 185.225.28.0 - 185.225.28.255.

2 network routing blocks were found for this IP address

(The first network routing block looks more interesting, since there are more details about it, but for some reason the second one is the one referenced by the key asn_cidr... Not sure why...)

I see two of these ranges in the WhoIs tool too. Not sure what the difference is. Is one for the VPN the IP is on, and the other for the ISP? Are either of these available to us on MaxMind?


Noting some of the user feedback from Meta that asked for this information.

Networks name, description, cidr, and range (whois example)

Others have mentioned it but I really want to see asn_cidr information from whois as well. Pretty useful when assessing range blocks

CIDR of related ASN. This could be useful when determining an IP range block.

Providing the CIDR range (based on whois or other data sources) to which the queried IP address belongs can help in checking for vandals trying to escape tracking. A vandal who is discovered editing destructively from one IP address may reconnect to the Internet to obtain a different IP address, but within a short period of time the address an ISP assigns to a customer generally comes from the same address pool (in other words, the same specific CIDR address segment). Even if a vandal clears the browser's cookies to erase the identification of IP users, the association between IP addresses is not easy to remove.


I ran

use MaxMind\Db\Reader;
$reader = new Reader('/var/www/html/w/enterprise/GeoIP2-Enterprise.mmdb');
print_r($reader->getWithPrefixLen('185.225.28.154')[1]); // prefix length: 24

This returns 24, which gives the range 185.225.28.0 - 185.225.28.255 as mentioned above.

@Prtksxna is this what the user was looking for or something like what @Tchanders posted?

[...] Based on the example of WhoIs tool, I think the data we're looking for is asn_cidr: 185.225.28.0/24, cidr: 185.225.28.0/24 or range: 185.225.28.0 - 185.225.28.255.

From these two comments, it sounds like we're able to get the required data from MaxMind. @TThoabala @Prtksxna - am I understanding that correctly?

@Tchanders I think so. Is it perhaps worth putting more examples here? I will add a few, but from @Prtksxna's comment it seems like we have the data we need from MaxMind.

@STran helped me understand this a bit better, they also pointed me to this table https://www.mediawiki.org/wiki/Help:Range_blocks#Table_of_sample_ranges.

From what I know now, the /24 isn't interesting or useful information; it's the full range of the last octet, going from 0 to 255. What we need is the /28, the range for the Mullvad VPN in this case, which we aren't getting from MaxMind.
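
To make the difference concrete, the two blocks from the WhoIs output can be compared with Python's standard `ipaddress` module. This is a sketch for illustration only, not part of IPInfo:

```python
import ipaddress

ip = ipaddress.ip_address("185.225.28.154")
wide = ipaddress.ip_network("185.225.28.0/24")
narrow = ipaddress.ip_network("185.225.28.144/28")

# Both blocks contain the address, but the /28 is nested inside the /24.
print(ip in wide, ip in narrow)  # True True
print(narrow.subnet_of(wide))    # True

# A /24 spans 256 addresses, a /28 only 16.
print(wide.num_addresses, narrow.num_addresses)  # 256 16
```

A range block based on the /28 would therefore affect far fewer addresses than one based on the /24.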

Moving this back to in progress to see if we can find other sources that can help us

Thanks @Prtksxna.

what you are looking into and why
We are trying to see if we could get and show IP range and CIDR data in IPInfo.
This comment summarises what is required and mostly why. https://phabricator.wikimedia.org/T316193#8299917

CIDR (Classless Inter-Domain Routing) is a method of assigning Internet Protocol (IP) addresses that improves the efficiency of address distribution and replaces the previous system based on Class A, Class B and Class C networks.

We first checked whether MaxMind could provide us with CIDR data.
We ran

use MaxMind\Db\Reader;
$reader = new Reader('/var/www/html/w/enterprise/GeoIP2-Enterprise.mmdb');
// getWithPrefixLen() returns the record and the prefix length; print the prefix length
print_r($reader->getWithPrefixLen('185.225.28.154')[1]); // 24

which returns 24, as seen above. Converting this to a range gives us 185.225.28.0 - 185.225.28.255.
We tried this with different IP addresses and found that MaxMind only returns the entire IP address block, which is not what we are looking for.

One of the websites we tried was https://www.bigdatacloud.com/network-lookup/. We looked into it because it has an API and some documentation, but after trying it out it turned out to return the same data as MaxMind (the entire block range for the given IP address).

We then tried the tool linked in the task description, https://whois-referral.toolforge.org/, with the following example:

https://whois-referral.toolforge.org/gateway.py?ip=185.225.28.154&lookup=true&format=json
It returned:
{
  "nir": null,
  "asn_registry": "ripencc",
  "asn": "205119",
  "asn_cidr": "185.225.28.0/24",
  "asn_country_code": "MK",
  "asn_date": "2017-10-12",
  "asn_description": "TELEKS-, MK",
  "query": "185.225.28.154",
  "nets": [
    {
      "cidr": "185.225.28.144/28",
      "name": "Mullvad_VPN_AB",
      "handle": null,
      "range": "185.225.28.144 - 185.225.28.159",
      "description": null,
      "country": "MK",
      "state": null,
      "city": null,
      "address": null,
      "postal_code": null,
      "emails": null,
      "created": "2021-08-13T06:33:57Z",
      "updated": "2021-08-13T06:34:49Z"
    },
    {
      "cidr": "185.225.28.0/24",
      "name": null,
      "handle": null,
      "range": "185.225.28.0 - 185.225.28.255",
      "description": null,
      "country": null,
      "state": null,
      "city": null,
      "address": null,
      "postal_code": null,
      "emails": null,
      "created": "2017-11-20T13:36:59Z",
      "updated": "2017-11-20T13:36:59Z"
    }
  ],
  "referral": null,
  "geolite2": "North Macedonia",
  "geo_ipinfo": "Skopje, Grad Skopje, MK"
}
(These are the same results as above, just in a different format.)
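
Given a response of this shape, the block we care about is the narrowest entry in `nets` that contains the queried address. A rough Python sketch of that selection follows. The helper name `most_specific_net` is hypothetical, the embedded JSON is a trimmed copy of the response above, and the handling of comma-separated `cidr` values is an assumption about what other responses might contain:

```python
import ipaddress
import json

# Trimmed copy of the response above; only the fields used here are kept.
response = json.loads("""
{
  "query": "185.225.28.154",
  "asn_cidr": "185.225.28.0/24",
  "nets": [
    {"cidr": "185.225.28.144/28", "name": "Mullvad_VPN_AB"},
    {"cidr": "185.225.28.0/24", "name": null}
  ]
}
""")


def most_specific_net(data):
    """Return the "nets" entry with the longest prefix containing the query IP."""
    ip = ipaddress.ip_address(data["query"])
    candidates = []
    for net in data.get("nets") or []:
        # Assumption: "cidr" may list several comma-separated blocks.
        for cidr in net["cidr"].split(","):
            network = ipaddress.ip_network(cidr.strip(), strict=False)
            if ip in network:
                candidates.append((network.prefixlen, net))
    if not candidates:
        return None
    # The longest prefix length is the most specific block.
    return max(candidates, key=lambda pair: pair[0])[1]


best = most_specific_net(response)
print(best["cidr"], best["name"])  # 185.225.28.144/28 Mullvad_VPN_AB
```

For this lookup it picks the Mullvad /28 rather than the /24 that `asn_cidr` points to.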

We tested it with a couple of other IP addresses. It doesn't always return the data we are looking for, but it seems better than what we get from MaxMind (or the other tools we tested).
We went further and looked into what the tool was doing, as its source code is available here,
which also showed where the tool is getting its data from.

Since the tool is in beta, I am not sure if it can be used at this point.
One approach would be to use the same sources as listed here and try to implement this in IPInfo.
However, we need to test this with enough IP addresses and decide if those data sources are sufficient.
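
One possible way to frame that test, sketched in Python under stated assumptions: `sample` is a hypothetical list of already-fetched responses shaped like the JSON above, and a response counts as "useful" when at least one entry in `nets` is narrower than `asn_cidr`. The function names and that definition are illustrative only, not an agreed metric:

```python
import ipaddress


def has_narrower_net(result):
    """True if any block in "nets" is strictly more specific than "asn_cidr"."""
    asn_prefix = ipaddress.ip_network(result["asn_cidr"], strict=False).prefixlen
    for net in result.get("nets") or []:
        # Assumption: "cidr" may list several comma-separated blocks.
        for cidr in net["cidr"].split(","):
            network = ipaddress.ip_network(cidr.strip(), strict=False)
            if network.prefixlen > asn_prefix:
                return True
    return False


def narrower_net_rate(results):
    """Fraction of responses that include a block narrower than the ASN block."""
    if not results:
        return 0.0
    return sum(has_narrower_net(r) for r in results) / len(results)


# Hypothetical sample: the first entry mirrors the lookup above; the second
# stands in for an address where only the ASN-wide block comes back.
sample = [
    {"asn_cidr": "185.225.28.0/24",
     "nets": [{"cidr": "185.225.28.144/28"}, {"cidr": "185.225.28.0/24"}]},
    {"asn_cidr": "192.0.2.0/24", "nets": [{"cidr": "192.0.2.0/24"}]},
]
print(narrower_net_rate(sample))  # 0.5
```

Running this over a larger set of real lookups would give a first number to discuss, though it says nothing about whether the narrower blocks are accurate.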

Thanks for looking into this! I think if we could answer a few more questions, we could come to some kind of next step:

  • Even if the tool is in beta, how would you propose using it? Would it be better to implement querying the providers ourselves? As a follow-up, we usually have some concerns about querying third-party providers and privacy. Can we bulk download a list somewhere? I skimmed https://whois.arin.net/ and it has a usage policy. Presumably all of these services have one. As part of due diligence, we might want to make sure we conform to those and are okay with their terms before integrating anything.
  • How many IP addresses would you want to test to see if whatever data source you picked would be sufficient? What metrics would you use to determine if a data source is sufficient? fwiw these look like authoritative providers but I don't remember if CIDR is something that's explicitly assigned by a provider.
  • Related to above, how low of a fidelity can we tolerate before CIDR becomes noise? ie. at what point do you think the user would be unable to consistently trust the CIDR data we're showing? Is any data better than no data at all even if it's inconsistent?
  • Are there other sources we need to look at? Could look at? Is this something that's definitely assigned and therefore fixed or are there services that may have varying qualities of dbs?

The sources used by https://whois-referral.toolforge.org/gateway.py are regional internet registries, which hold the source of truth for IPs.

I have gone through them and could not find a list that can be bulk downloaded, and for some it's not clear whether our usage is covered by their terms of use.
@Niharika could you help with this please?

It seems like the way to go with this is to use regional internet registries. We will need some ticket(s) to come up with a strategy to test for data sufficiency with different IP addresses.

@Niharika Should we move this into stalled, in light of our prioritisation of other IPInfo work?

Aklapper added a subscriber: TThoabala.

Removing inactive task assignee (please do so as part of offboarding steps).