Goal
This task is to add the "Number of users on this IP" (userCount) field to the IP Information box.
Authored by Niharika, Dec 9 2020, 3:39 PM.
Subject | Repo | Branch | Lines +/-
---|---|---|---
Add connection type to the IPinfo API call | mediawiki/extensions/IPInfo | master | +68 -2
Status | Assigned | Task
---|---|---
Open | None | T285977 IP Info
Open | None | T287647 IP Information accordion on Special:Contribs
Open | None | T269764 Add "Number of users on this IP" to the API for pulling into the IP Information box
Resolved | STran | T288933 Fetch information about an IP from the MaxMind GeoIP2 Enterprise database
Not sure if this is a question, a concern, or a point of interest, but MaxMind describes user count as the number of users on that IP in the last 24 hours. Since we read from their databases rather than their web service, and we update those databases once a week, our data could be roughly a week out of date. Do we need to add an info tooltip to denote that? Does this (hypothetically) change the usefulness of this data point?
Change 649493 had a related patch set uploaded (by STran; owner: STran):
[mediawiki/extensions/IPInfo@master] Add connection type to the IPinfo API call
This patch set is technically ready for review because it does what I want it to do, but I have some caveats about what I wanted it to do:
If we get an Insights-style .mmdb file to use with the reader, the method used to read the file would probably change to insights($ip), but the rest should still be valid.
Based on what we talked about today, I'm moving this ticket back to ready for development until we know more.
Copy-pasting from Gerrit for ease of access:
Pulling this patch off the line for now - we talked more about how we thought this data was going to be available to us and
Given the number of unknowns around this (especially since a web services implementation would be different from the reader implementation), there doesn't seem to be an advantage to pushing this out.
Change 649493 abandoned by STran:
[mediawiki/extensions/IPInfo@master] Add connection type to the IPinfo API call
Reason:
Following on from yesterday's AHT: Estimation & Planning meeting:
The "(in the last 24 hours)" qualifier in the mock is likely going to be incorrect. Like the other information in the mock, we get the number of users of an IP from a local copy of the MaxMind database, not from their Precision Insights API. Currently those local copies are updated weekly; once MediaWiki is migrated to Kubernetes, they could be updated as often as daily or as rarely as once every 10 days (per T288844#7296775).
As a best effort, we could determine when the database was last updated (by stat-ing the file?) and include that date in the "This information could lack accuracy" disclaimer message that we show.
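If we do go the stat route, the lookup and the staleness arithmetic could look something like the minimal Python sketch below (the extension itself is PHP, where `filemtime()` is the analogous call). It assumes the file's mtime marks when our local copy was refreshed, which only holds if the update job writes a new file rather than preserving the upstream timestamp. The function names and the disclaimer wording are hypothetical.

```python
import os
from datetime import datetime, timezone
from typing import Optional


def db_last_updated(path: str) -> datetime:
    """Return the .mmdb file's last-modified time as a UTC datetime.

    Assumption: mtime == when our local copy was refreshed.
    """
    return datetime.fromtimestamp(os.stat(path).st_mtime, tz=timezone.utc)


def data_age_days(path: str, now: Optional[datetime] = None) -> float:
    """Days elapsed since the local copy was refreshed.

    A "last 24 hours" user count read from this file describes a window that
    ended at least this long ago (up to ~7 days with weekly updates, up to
    ~10 days under the Kubernetes schedule mentioned above).
    """
    now = now or datetime.now(timezone.utc)
    return (now - db_last_updated(path)).total_seconds() / 86400


def accuracy_disclaimer(path: str) -> str:
    # Hypothetical wording; the real text would be an i18n message.
    updated = db_last_updated(path).strftime("%Y-%m-%d")
    return f"This information could lack accuracy. Data last updated {updated}."
```

Because the age is computed from the file rather than hard-coded, the disclaimer stays honest whichever update cadence we end up with.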
From the engineering meeting today:
It's not clear if we even have access to this at all. I checked the .mmdb files for a random IP and received this information back (values redacted):
```
$ ./Downloads/mmdbinspect_0.1.1_darwin_amd64/mmdbinspect --db ./Downloads/GeoIP2-Enterprise_20210921/GeoIP2-Enterprise.mmdb <ip>
[
  {
    "Database": "./GeoIP2-Enterprise.mmdb",
    "Records": [
      {
        "Network": "<ip>",
        "Record": {
          "city": {},
          "continent": {},
          "country": {},
          "location": {},
          "postal": {},
          "registered_country": {},
          "subdivisions": [],
          "traits": {
            "autonomous_system_number": #,
            "autonomous_system_organization": "",
            "connection_type": "Cable/DSL",
            "domain": "",
            "isp": "",
            "organization": "",
            "user_type": "residential"
          }
        }
      }
    ],
    "Lookup": "<ip>"
  }
]
```

```
$ ./GeoIP2-Anonymous-IP.mmdb <ip>
[
  {
    "Database": "./Downloads/GeoIP2-Anonymous-IP_20210921/GeoIP2-Anonymous-IP.mmdb",
    "Records": [
      {
        "Network": "<ip>",
        "Record": {
          "is_anonymous": true,
          "is_public_proxy": true
        }
      }
    ],
    "Lookup": "<ip>"
  }
]
```
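To repeat this check across more IPs, a small Python sketch like the following could flag which traits are missing from mmdbinspect's JSON output. It assumes the output shape shown above; `missing_traits`, `WANTED_TRAITS`, and the trimmed `SAMPLE` are hypothetical, and the redacted `#` is replaced with 0 so the JSON parses.

```python
import json
from typing import Dict, List

# GeoIP2 trait names relevant to the IP Information box. user_count and
# static_ip_score are Precision Insights outputs; the selection is our own.
WANTED_TRAITS = ("user_count", "static_ip_score", "user_type", "connection_type")


def missing_traits(mmdbinspect_output: str) -> Dict[str, List[str]]:
    """Map each matched network to the wanted traits absent from its record.

    Assumes the JSON shape mmdbinspect printed above: a list of per-database
    objects, each holding a "Records" list of {"Network", "Record"} entries.
    Results are keyed by network only, so run one database at a time.
    """
    result: Dict[str, List[str]] = {}
    for db in json.loads(mmdbinspect_output):
        for rec in db.get("Records") or []:  # tolerate null/absent Records
            traits = rec.get("Record", {}).get("traits", {})
            result[rec["Network"]] = [t for t in WANTED_TRAITS if t not in traits]
    return result


# Trimmed Enterprise record from above; redacted "#" replaced by 0 so it parses.
SAMPLE = """[{"Database": "./GeoIP2-Enterprise.mmdb",
  "Records": [{"Network": "<ip>", "Record": {"traits": {
    "autonomous_system_number": 0, "connection_type": "Cable/DSL",
    "user_type": "residential"}}}],
  "Lookup": "<ip>"}]"""

if __name__ == "__main__":
    print(missing_traits(SAMPLE))  # {'<ip>': ['user_count', 'static_ip_score']}
```

Run against the Anonymous IP output, every wanted trait is reported missing, since that record has no `traits` object at all.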
As a disclaimer, I only looked up one IP, but even this single data point shows that we're not guaranteed to have this information: neither record contains a user count. We might need to decide whether this number is valuable enough to actively pursue beyond what we already have, or whether we can drop it. @Prtksxna @Niharika
Additionally, I checked MaxMind's site again (https://www.maxmind.com/en/geoip2-services-and-databases) and I think it confirms what I'm saying. We seem to have paid for the Enterprise database, and some of the data points we're interested in are only available from Insights:
Interestingly, while MaxMind claim that "user type" is only available via the GeoIP2 Precision Insights service, the example you posted has the trait set.
@Niharika @ARamirez_WMF Do we have a contact at MaxMind that we can reach out to clarify what data are available from which databases/services?
Recapping what we talked about: we specifically asked MaxMind for the Enterprise and Insights data services, which they confirmed they would provide. @ARamirez_WMF will follow up with our contact at MaxMind about this discrepancy.
In the meantime it is probably wise to leave this in the backlog.
Coverage is expected to be about 99% for user_type; we may not return user_type if the ISP is new or recently renamed.
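Since user_type (and any user count we might get later) can be absent, whatever renders the IP Information box has to treat these fields as optional. A minimal Python sketch, with a hypothetical `display_trait` helper and placeholder label:

```python
from typing import Any, Mapping

# Hypothetical label for a trait missing from the record; real UI text
# would come from the extension's i18n messages.
NOT_AVAILABLE = "Not available"


def display_trait(traits: Mapping[str, Any], name: str,
                  fallback: str = NOT_AVAILABLE) -> str:
    """Render an optional GeoIP2 trait, falling back when it is absent.

    user_type can be missing (new or recently renamed ISPs), and user_count
    may not be in our databases at all, so never assume the key exists.
    """
    value = traits.get(name)
    # Compare against None rather than truthiness so a count of 0 renders "0".
    return fallback if value is None else str(value)


if __name__ == "__main__":
    traits = {"connection_type": "Cable/DSL", "user_type": "residential"}
    print(display_trait(traits, "user_type"))   # residential
    print(display_trait(traits, "user_count"))  # Not available
```

The `is None` check is the main design point: an IP with zero recent users is real data and should not be confused with data we simply don't have.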
Neither the GeoIP2 Enterprise database nor the GeoIP2 Anonymous IP database include the following GeoIP2 Precision Insights outputs:
By contrast, the GeoIP2 Precision Insights service does not include the following GeoIP2 Enterprise database output:
Moving to the backlog due to pending questions: How important are the static IP score and user count? Is this something we can get via another provider? That might require Legal review due to API privacy concerns. Caching is also an open question.