
Identify the source of WHOIS data, the retrieval method, and update frequency
Closed, Declined; Public

Description

Once we have the code and DB schema for storing IP WHOIS information (see subtask T174553), WMF needs to decide where to obtain this data, when to obtain it (on demand or in batch), and how often to update it.

Sources mentioned so far are MaxMind (http://maxmind.github.io/GeoIP2-php/) and IPInfo (http://ipinfo.io/about). It has been mentioned that the MaxMind data may be inaccurate in that it assumes every IPv4 address belongs to a /24 subnet (which is often wrong); on the other hand, IPInfo seems not to allow "batch" downloads (it only allows on-demand queries, IIRC).

Another issue to discuss is pricing and licensing. If a paid data source is eventually selected as the most appropriate solution, WMF has to agree to pay for it.

Event Timeline

Hi,

Is this meant to be a database schema change? https://wikitech.wikimedia.org/wiki/Schema_changes
If it is, can you please follow this template for the task, so DBAs can better understand it and get it resolved faster and without any issues? https://wikitech.wikimedia.org/wiki/Schema_changes#Workflow_of_a_schema_change

For IP subnet information, we could parse the data directly from RIRs. The data is available for free (in both senses of the word) at the following URLs:

However, I do not know how accurate the data is and how often it is updated. For instance, looking for "|8." in the ARIN data shows this row only:

arin|US|ipv4|8.0.0.0|16777216|19921201|allocated|abc43e5fcdb68e284085cf6f1b34834e

This indicates that a very broad range, from 8.0.0.0 to 8.255.255.255, belongs to one subnet registered on 1992-12-01. We know from sources like http://ipinfo.io/8.8.8.0 and http://ipinfo.io/8.8.9.0 that it is "incorrect", but maybe the sub-division of this broad range has been done by lower-level organizations, and ARIN is unaware of it?
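For reference, here is a minimal sketch of how the row above could be decoded. It assumes the pipe-delimited layout used in RIR delegation statistics files (registry, country code, type, start address, address count, date as YYYYMMDD, status, then an opaque ID); the function name `parse_delegation` and the returned dictionary keys are made up for illustration.

```python
import ipaddress
from datetime import date, datetime


def parse_delegation(line):
    """Decode one IPv4 row of a delegation statistics file (assumed layout)."""
    fields = line.strip().split("|")
    registry, country, rtype, start, value, datestr, status = fields[:7]
    if rtype != "ipv4":
        raise ValueError("this sketch only handles ipv4 rows")

    first = ipaddress.IPv4Address(start)
    # For ipv4 rows the fifth field is a count of addresses, not a prefix length.
    last = first + (int(value) - 1)
    # A count that is not a power of two would yield more than one CIDR block.
    networks = list(ipaddress.summarize_address_range(first, last))

    return {
        "registry": registry,
        "country": country,
        "first": first,
        "last": last,
        "networks": networks,
        "registered": datetime.strptime(datestr, "%Y%m%d").date(),
        "status": status,
    }


row = "arin|US|ipv4|8.0.0.0|16777216|19921201|allocated|abc43e5fcdb68e284085cf6f1b34834e"
rec = parse_delegation(row)
print(rec["first"], rec["last"], rec["networks"], rec["registered"])
```

The count 16777216 is 2^24, so the row describes a single /8, and the date field reads as 1 December 1992.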

@Marostegui I am removing the Schema Change tag.

Thanks :-)

I think this discussion is a bit premature. Let's wait until we have a team confirmed working on this extension before figuring out the implementation details.

This indicates that a very broad range, from 8.0.0.0 to 8.255.255.255, belongs to one subnet registered on 1992-12-01. We know from sources like http://ipinfo.io/8.8.8.0 and http://ipinfo.io/8.8.9.0 that it is "incorrect", but maybe the sub-division of this broad range has been done by lower-level organizations, and ARIN is unaware of it?

That's because 8.0.0.0/8 is allocated to Level3. You can check that in your favorite looking glass (8.0.0.0/8 http://lg.ring.nlnog.net/prefix_detail/lg01/ipv4?q=8.0.0.0/8 ). I'll comment on T152114 for a different approach.
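As a quick sanity check (a sketch using only Python's standard `ipaddress` module; the variable names are illustrative), both addresses from the ipinfo.io links quoted above fall inside 8.0.0.0/8. So the RIR row and the more specific per-address results describe different levels of the same allocation rather than contradicting each other.

```python
import ipaddress

# The single block reported in the RIR data.
allocation = ipaddress.ip_network("8.0.0.0/8")

# The two addresses looked up on ipinfo.io in the quoted comment.
covered = {
    addr: ipaddress.ip_address(addr) in allocation
    for addr in ("8.8.8.0", "8.8.9.0")
}

for addr, inside in covered.items():
    print(addr, "is inside", allocation, "->", inside)
```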

Huji changed the task status from Open to Stalled. Sep 6 2017, 4:32 PM

I think this discussion is a bit premature. Let's wait until we have a team confirmed working on this extension before figuring out the implementation details.

Agreed; this is "for later". Changing task status to Stalled, hoping that it means "for later" :)

We're going to decline this for now. Please reopen once you have something more concrete for us to review.