Add "Number of users on this IP" to the API for pulling into the IP Information box
Open, Medium | Public | 2 Estimated Story Points

Description

Goal

This task is to add the "Number of users on this IP" (userCount) field to the IP Information box.

image.png (648×1 px, 129 KB)

Event Timeline

Niharika created this task.
Niharika renamed this task from Add "Number of users on this IP" to the IP Information box to Add "Number of users on this IP" to the API for pulling into the IP Information box. Dec 9 2020, 4:07 PM
STran subscribed.

Not sure if this is a question, a concern, or a point of interest, but MaxMind describes user count as the number of users on that IP in the last 24 hours. Since we're using their databases rather than their web service, and we update the databases once a week, our data can be roughly a week out of date. Do we need to add an info tooltip to denote that? Does this (hypothetically) change the usefulness of this data point?

Change 649493 had a related patch set uploaded (by STran; owner: STran):
[mediawiki/extensions/IPInfo@master] Add connection type to the IPinfo API call

https://gerrit.wikimedia.org/r/649493

This patch set is technically ready for review because it does what I want it to do but I have a list of caveats about what I wanted it to do:

In the event that we get an Insight.mmdb-esque file to use for the reader, the method to read the file probably changes to insights($ip) but the rest should still be valid.

Based on what we talked about today, I'm moving this ticket back to ready for development until we know more.

copy pasta-ing from gerrit for ease of access:
Pulling this patch off the line for now - we talked more about how we thought this data was going to be available to us and

  1. it might not be
  2. it looks to only be available via web services (related: see alternative concern if this is not the case T269764#6691086)

Given the number of unknowns around this (especially since a web services implementation would be different than the reader implementation), there doesn't seem to be an advantage to pushing this out.

Change 649493 abandoned by STran:
[mediawiki/extensions/IPInfo@master] Add connection type to the IPinfo API call

Reason:

https://gerrit.wikimedia.org/r/649493

phuedx subscribed.

I'm being bold and unassigning this per T269764#6693062 and T269764#6693105.

Following on from yesterday's AHT: Estimation & Planning meeting:

The "(in the last 24 hours)" qualifier is likely going to be incorrect. Like the other information in the mock, we get the number of users of an IP from a local copy of the MaxMind database and not from their Precision Insights API. Currently, those local copies are updated weekly. Once we migrate MediaWiki to Kubernetes, they could be updated as often as daily or as rarely as once every 10 days (per T288844#7296775).

As a best effort, we could determine when the database was last updated (by stat-ing the file?) and include that in the "This information could lack accuracy" disclaimer message that we show.
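A minimal sketch of the "stat the file" idea, written in Python purely for illustration (the extension itself is PHP). The function names, the wording appended to the disclaimer, and the assumption that the file's modification time reflects the last database refresh are all hypothetical and depend on how the files are deployed.

```python
import os
from datetime import datetime, timezone
from typing import Optional


def database_last_updated(path: str) -> datetime:
    """Return the file's modification time as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(os.stat(path).st_mtime, tz=timezone.utc)


def accuracy_disclaimer(path: str, now: Optional[datetime] = None) -> str:
    """Build a disclaimer that mentions how old the local database copy is.

    The message wording is an assumption, not the extension's actual text.
    """
    now = now or datetime.now(timezone.utc)
    updated = database_last_updated(path)
    age_days = (now - updated).days  # whole days only
    return (
        "This information could lack accuracy. "
        f"Data last updated {updated:%Y-%m-%d} ({age_days} days ago)."
    )
```

The MaxMind DB format also records a build timestamp (`build_epoch`) in the file's metadata, which may be a better indicator than the filesystem timestamp if files are copied or redeployed; that variant is not shown here.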

From the engineering meeting today:

It's not clear if we even have access to this at all. I checked against the mmdb files for a random IP and received this (values redacted) information back:

$ ./Downloads/mmdbinspect_0.1.1_darwin_amd64/mmdbinspect --db ./Downloads/GeoIP2-Enterprise_20210921/GeoIP2-Enterprise.mmdb  <ip>
[
    {
        "Database": "./GeoIP2-Enterprise.mmdb",
        "Records": [
            {
                "Network": "<ip>",
                "Record": {
                    "city": {},
                    "continent":  {},
                    "country":  {},
                    "location":  {},
                    "postal":  {},
                    "registered_country":  {},
                    "subdivisions": [],
                    "traits": {
                        "autonomous_system_number": #,
                        "autonomous_system_organization": "",
                        "connection_type": "Cable/DSL",
                        "domain": "",
                        "isp": "",
                        "organization": "",
                        "user_type": "residential"
                    }
                }
            }
        ],
        "Lookup": "<ip>"
    }
]
$ ./Downloads/mmdbinspect_0.1.1_darwin_amd64/mmdbinspect --db ./Downloads/GeoIP2-Anonymous-IP_20210921/GeoIP2-Anonymous-IP.mmdb  <ip>
[
    {
        "Database": "./Downloads/GeoIP2-Anonymous-IP_20210921/GeoIP2-Anonymous-IP.mmdb",
        "Records": [
            {
                "Network": "<ip>",
                "Record": {
                    "is_anonymous": true,
                    "is_public_proxy": true
                }
            }
        ],
        "Lookup": "<ip>"
    }
]

As a disclaimer, I only looked up one IP, but even this single data point shows that we're not guaranteed to have this information. We might need to decide whether this number is valuable enough to actively obtain beyond what we already have, or whether we can drop it. @Prtksxna @Niharika
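To repeat this check over more than one IP, the JSON that mmdbinspect prints could be scanned for the fields we care about. The sketch below is in Python and uses a made-up sample shaped like the output above. The sample values, the function name, and the assumption that the fields would appear under `traits` are illustrative only.

```python
import json
from typing import Dict, List

# Hypothetical sample shaped like the mmdbinspect output above (values made up).
SAMPLE = """
[
  {
    "Database": "GeoIP2-Enterprise.mmdb",
    "Records": [
      {
        "Network": "192.0.2.0/24",
        "Record": {
          "traits": {
            "connection_type": "Cable/DSL",
            "user_type": "residential"
          }
        }
      }
    ],
    "Lookup": "192.0.2.1"
  }
]
"""


def missing_traits(mmdbinspect_json: str, wanted: List[str]) -> Dict[str, List[str]]:
    """Map each looked-up IP to the wanted trait names absent from its records."""
    result = {}
    for entry in json.loads(mmdbinspect_json):
        present = set()
        for record in entry.get("Records") or []:
            # Collect the trait keys found in any record for this lookup.
            present.update(record.get("Record", {}).get("traits", {}))
        result[entry["Lookup"]] = [name for name in wanted if name not in present]
    return result


print(missing_traits(SAMPLE, ["user_type", "user_count", "static_ip_score"]))
# prints {'192.0.2.1': ['user_count', 'static_ip_score']}
```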

Additionally, I checked MaxMind's site again (https://www.maxmind.com/en/geoip2-services-and-databases) and I think it confirms what I'm saying. We seem to have paid for Enterprise, and some of the data points we're interested in are only available from Insights:

image.png (1×1 px, 237 KB)

Interestingly, while MaxMind claim that "user type" is only available via the GeoIP2 Precision Insights service, the example you posted has the trait set.

@Niharika @ARamirez_WMF Do we have a contact at MaxMind that we can reach out to clarify what data are available from which databases/services?

Recapping what we talked about: We specifically asked for the Enterprise and the Insights data services from MaxMind which they confirmed they were going to provide. @ARamirez_WMF will follow up with our contact at MaxMind about this discrepancy.
In the meantime it is probably wise to leave this in the backlog.

  • Yes, “user type” is part of the Enterprise Product Suite; specifically, that field is in the Enterprise database itself (as opposed to Anonymous IP). We have both.
  • He assumes that they might not have user type info for every IP, but is unsure how common that is. Support will get back to me.
  • Regarding the “insights” database: they have a “Precision Insights” service, which is an API that returns all of the data in the Enterprise Product Suite (including user type), but there is no “insights” field. The Precision Insights service is only available as an API, but it’s the same data that’s in the Enterprise Product Suite.

Coverage is expected to be about 99% for user_type; we may not return user_type if the ISP is new or recently renamed.

  • It’s correct that the GeoIP2 Precision Insights data set is only available as a web service, not a downloadable .mmdb file. The GeoIP2 Precision Insights service uses web service credit, and it is not currently available in your account.

Neither the GeoIP2 Enterprise database nor the GeoIP2 Anonymous IP database includes the following GeoIP2 Precision Insights outputs:

  • average_income: The average annual income associated with the IP address in US dollars (US only).
  • population_density: The estimated number of people per square kilometer (US only).
  • user_count: The estimated number of users sharing the IP/network in the past 24 hours.
  • static_ip_score: An indicator of how static or dynamic an IP address is (0 to 99.99).

By contrast, the GeoIP2 Precision Insights service does not include the following GeoIP2 Enterprise database output:

  • is_legitimate_proxy: 1 if the network is a legitimate proxy, otherwise 0.
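The availability described in the two lists above could be captured as a small lookup, for example to decide which fields the IP Information box can populate from the local files. This is a sketch in Python; the enum and function names are hypothetical, and fields not named in the reply are deliberately reported as unspecified rather than guessed.

```python
from enum import Enum


class Source(Enum):
    INSIGHTS_ONLY = "GeoIP2 Precision Insights web service only"
    DATABASE_ONLY = "GeoIP2 Enterprise database only"
    UNSPECIFIED = "not covered by the support reply"


# Taken from the two lists in the support reply; nothing else is implied.
INSIGHTS_ONLY_FIELDS = {
    "average_income",
    "population_density",
    "user_count",
    "static_ip_score",
}
DATABASE_ONLY_FIELDS = {"is_legitimate_proxy"}


def source_for(field: str) -> Source:
    """Say where a field is available, according to the support reply."""
    if field in INSIGHTS_ONLY_FIELDS:
        return Source.INSIGHTS_ONLY
    if field in DATABASE_ONLY_FIELDS:
        return Source.DATABASE_ONLY
    return Source.UNSPECIFIED


for name in ("user_count", "is_legitimate_proxy", "connection_type"):
    print(f"{name}: {source_for(name).value}")
```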

Moving to backlog due to pending questions: how important are the static IP score and user count? And is this something we can get via another provider? Using a web service might require input from Legal due to API privacy concerns, and we would also need to think about caching.