Confirm and update what information IPInfo users should be able to see
Open, Needs TriagePublic

Description

I noticed this while updating IPInfo to show information from IPoid:

image.png (868×1 px, 133 KB)

A user with basic viewing rights can see almost nothing. Checking this against the codebase:

Basic/Full Access: 'countryNames', 'connectionType', 'userType', 'proxyType', 'numActiveBlocks', 'numLocalEdits', 'numRecentEdits'
Full Access Only: 'location', 'asn', 'isp', 'organization', 'behaviors', 'risks', 'connectionTypes', 'tunnelOperators', 'proxies', 'numUsersOnThisIP', 'numDeletedEdits'

Struck values represent deprecated values that used to be provided by MaxMind and bold values are ones added by IPoid. Since deprecating the MaxMind data in favor of the IPoid data, basic users can only see country-level location data and on-wiki data.
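
To make the split above concrete, here is a minimal sketch of a lookup from access level to visible fields. It is a Python illustration only, not the extension's code (IPInfo is written in PHP). The right names `ipinfo-view-basic` and `ipinfo-view-full`, the helpers `visible_fields` and `present`, and the assumption that full access implies basic access are all hypothetical. The field lists are copied from the table as-is and do not encode which values are struck or bold.

```python
# Field lists copied from the table in the task description.
BASIC_FIELDS = [
    "countryNames", "connectionType", "userType", "proxyType",
    "numActiveBlocks", "numLocalEdits", "numRecentEdits",
]
FULL_ONLY_FIELDS = [
    "location", "asn", "isp", "organization", "behaviors", "risks",
    "connectionTypes", "tunnelOperators", "proxies",
    "numUsersOnThisIP", "numDeletedEdits",
]


def visible_fields(rights):
    """Return the field names a user holding `rights` may see.

    Assumption: holding the full right also grants the basic fields.
    """
    fields = []
    if "ipinfo-view-basic" in rights or "ipinfo-view-full" in rights:
        fields += BASIC_FIELDS
    if "ipinfo-view-full" in rights:
        fields += FULL_ONLY_FIELDS
    return fields


def present(data, rights):
    """Replace every value the user may not see with a "No access" marker."""
    allowed = set(visible_fields(rights))
    return {k: (v if k in allowed else "No access") for k, v in data.items()}
```

With this mapping, a basic viewer calling `present({"countryNames": ["X"], "asn": 1}, {"ipinfo-view-basic"})` gets the country back and "No access" for `asn`.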

However, checking against the tool guidelines, it's possible that some of these should have been allowed for users with basic view rights:

image.png (1×1 px, 445 KB)

Data | Status | Possible updates
Country | Already available/No change |
Connection method | Deprecated | Can be replaced by connectionType
Connection owner | Deprecated | No 1:1 replacement
Real IP / Proxy | Deprecated | Can be replaced by proxies and/or tunnelOperators
Static / Dynamic | Never implemented/Deprecated |
Number of devices on IP | New from IPoid (numUsersOnThisIP) |

This would leave behaviors and risks as full-access only values.

We should:

  1. Determine if this is what we want
  2. Update the codebase and legal notice to have parity

(As an aside, the legal information is missing the distinction that full-access viewers can see deleted revisions whereas basic-access viewers cannot.)

Event Timeline

This is being noticed by users, e.g. it was reported as T365139: Most fields display "No access" instead of data.

@Madalina Could we raise the priority of this?

We're only showing Spur data to users who have full access, but we're showing MaxMind data to users who have basic access. (See task description for which data is provided by which provider.)

@JayCano was this decision based on what the contract with Spur allows? Would it allow us to either show all of the information to users who have basic access, or else a subset of the information (e.g. Location and Organization)?

Another related issue is that we may have reduced coverage by using Spur data instead of MaxMind data.

T363118#9796877 and the following comments mention a problem with low coverage.

I think the IPs in the Spur dataset are inherently considered untrustworthy (with varying degrees of certainty). I believe this isn't the case for MaxMind.

If I'm right, then the IPInfo tool is now only giving information about untrustworthy IPs. Do we want this to be the case?

I would like to mention https://www.mediawiki.org/wiki/Talk:Trust_and_Safety_Product/IP_Info#Advanced_information which seems related. Currently the IPInfo tool provides less information to non-admins (who only get basic access) than anyone fulfilling the criteria of https://foundation.wikimedia.org/wiki/Policy:Access_to_temporary_account_IP_addresses#Patrollers_and_other_users can get by using third-party tools after accessing the IP of temporary accounts.

From the task description:

Basic/Full Access: 'countryNames', 'connectionType', 'userType', 'proxyType', 'numActiveBlocks', 'numLocalEdits', 'numRecentEdits'
Full Access Only: 'location', 'asn', 'isp', 'organization', 'behaviors', 'risks', 'connectionTypes', 'tunnelOperators', 'proxies', 'numUsersOnThisIP', 'numDeletedEdits'

Basic access

The only MaxMind data that is currently showing for users without full access is the country name. We deliberately removed connectionType, userType and proxyType following user feedback that Spur is better. This information was coming from MaxMind's Enterprise database, and the budget for that was diverted to using Spur instead. That means that users with basic access can now only see the country.

If this needs to change, then we need a conversation between volunteers, product managers and budget owners.

Full access

For users with full access, ASN, ISP and organization are now obtained from Spur instead of MaxMind. This means coverage may have decreased, since MaxMind had wider coverage. We may be able to do something about this.

These fields should be available via the ISP database which we had access to at the time of writing T263263#6693361. (I'm currently locked out of production so I can't confirm by looking that we still have it.)

If we do have access to this, and we are allowed to show the data here, then we can update the IPInfo extension to use the ISP database instead. @JayCano Would it be possible to check this?

Could we reverse this task, perhaps? T288933: Fetch information about an IP from the MaxMind GeoIP2 Enterprise database

At the time of writing (16th August, 2021), IP Info reads from the separate City, ASN, ISP, and Connection-Type databases. The extension needs to be updated to read from the Enterprise database when it is available.

@Tchanders I'm currently investigating which licenses we currently have and what we are allowed to do with them. I'll keep you posted.

I had a quick chat with @kostajh regarding this and it seems to be a combination of two problems:

  • We removed the MaxMind data but gated the (possible) replacements behind full access
  • Spur data isn't necessarily suspicious, afaik, but it is dynamic and IPs do drop off. Kosta and I were looking into some IPs and one of the IPs used was in our historical data (via OpenSearch) but had already been considered stale and dropped in the database used by the API.

Just going through some of these comments:

We're only showing Spur data to users who have full access, but we're showing MaxMind data to users who have basic access

Just to be specific, this is a quirk of the attributes we've chosen to show. There's nothing in the code that explicitly splits source choosing across access levels like that.

If I'm right, then the IPInfo tool is now only giving information about untrustworthy IPs.

To add to this, for the attributes sourced from Spur, we prioritized updated information over historical data, so it's also IPs that Spur flagged recently. Don't quote me on this one, but I think it's maybe a 3-10% turnover (deleted/new inserts), which is a large number of IPs when datasets can contain millions of IPs.

For users with full access, ASN, ISP and organization are now obtained from Spur instead of MaxMind. This means coverage may have decreased, since MaxMind had wider coverage. We may be able to do something about this.

We should consider fallbacks. See T355393: Provide fallbacks when source is missing data (but maybe in reverse, as we no longer have access to enterprise data). If Spur doesn't have it, check and return MaxMind (GeoLite2 for us now that we no longer use enterprise data).

If we do have access to this, and we are allowed to show the data here, then we can update the IPInfo extension to use the ISP database instead.

If we don't, we can consider using GeoLite2, which seems to give us the location data (via City.mmdb) and ISP data (via ASN.mmdb) we want.

Given the above, I would suggest we go ahead with implementing the fallback using GeoLite2. It provides country, city and asn databases that should cover most of our use cases when an IP is not known for being problematic. If an IP is a proxy, VPN or has been suspected of bad behaviour, it would show up on Spur.

I think T355393: Provide fallbacks when source is missing data should cover the problem that Spur doesn't have a comprehensive dataset and the implementation of the fallback mechanism with GeoLite2, as I think this ticket encompasses a different set of problems:

  • The original problem of this ticket is still around, which is that we should re-evaluate what data we show to users with basic access, since they can see almost nothing
  • With the deprecation of the Enterprise dataset, no source we use (GeoLite2, Spur) provides ISP data so we may want to consider deprecating that property completely
  • There's an actual bug where we don't show the number of users on an IP for basic users

I think T355393: Provide fallbacks when source is missing data should cover the problem that Spur doesn't have a comprehensive dataset and the implementation of the fallback mechanism with GeoLite2, as I think this ticket encompasses a different set of problems:

  • The original problem of this ticket is still around, which is that we should re-evaluate what data we show to users with basic access, since they can see almost nothing

My understanding is that we should make GeoLite2 data always available to users with the ipinfo-view-basic right. Then, if IPoid has data for a particular field, we would need to check if the user has ipinfo-view-full right in order to overwrite the contents of that particular field.
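
A rough sketch of that rule, written in Python for illustration rather than in the extension's PHP. The function name `build_info`, the plain dicts standing in for the GeoLite2 and IPoid results, and the choice to treat `None` as "IPoid has no data for this field" are assumptions, not the actual IPInfo implementation.

```python
def build_info(geolite2, ipoid, rights):
    """Combine per-field data according to the viewer's rights.

    geolite2, ipoid: dicts mapping field name -> value (None = no data).
    rights: set of right names held by the viewer.
    """
    # Users with neither right see nothing (assumption for this sketch).
    if "ipinfo-view-basic" not in rights and "ipinfo-view-full" not in rights:
        return {}

    # GeoLite2 data is always available as the baseline.
    info = dict(geolite2)

    # IPoid data may overwrite or extend a field only for full-access viewers.
    if "ipinfo-view-full" in rights:
        for field, value in ipoid.items():
            if value is not None:
                info[field] = value
    return info
```

Under these assumptions a basic viewer always gets the GeoLite2 values unchanged, while a full viewer gets the IPoid value wherever IPoid has one and the GeoLite2 value otherwise.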

  • With the deprecation of the Enterprise dataset, no source we use (GeoLite2, Spur) provides ISP data so we may want to consider deprecating that property completely

Ack

  • There's an actual bug where we don't show the number of users on an IP for basic users

@STran can you please file a task for that? I don't see where ipinfo-view-basic should have access to the number of users on an IP.

Change #1036305 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/IPInfo@master] DefaultPresenter: Copy asn, location, organization fields into "view basic"

https://gerrit.wikimedia.org/r/1036305

Given the above, I would suggest we go ahead with implementing the fallback using GeoLite2. It provides country, city and asn databases that should cover most of our use cases when an IP is not known for being problematic. If an IP is a proxy, VPN or has been suspected of bad behaviour, it would show up on Spur.

In this patch I have proposed that we show ASN, location and organization fields to users with the "view basic" right. These are originally sourced from GeoLite2. This patch checks if the user has "view full" right before attempting to add IPoid data. However, AIUI, even if we didn't check the right before adding IPoid data, the code is already set up to not overwrite existing fields because we use += for merging values into the data array, and GeoLite2 data is added first, so once asn, organization and location are populated by GeoLite2, they would not be overwritten by the IPoid retriever.
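
For readers less familiar with PHP's array union, the `+=` behaviour described above can be modeled as follows. This is a Python analogue, not the patch itself: `array_union` is a hypothetical helper that mimics `$left += $right` for string-keyed arrays, and the sample values are made up.

```python
def array_union(left, right):
    """Mimic PHP's `$left += $right` for string-keyed arrays.

    Keys already present in `left` keep their value, even if that value
    is None; only keys missing from `left` are copied from `right`.
    """
    merged = dict(left)
    for key, value in right.items():
        merged.setdefault(key, value)
    return merged


data = {}
# GeoLite2 data is merged first (illustrative values).
data = array_union(data, {"asn": 64500, "organization": "Example Org", "location": "City A"})
# IPoid data is merged second and cannot replace the keys set above.
data = array_union(data, {"asn": 64501, "risks": ["EXAMPLE"]})
```

In this sketch `data["asn"]` stays at the GeoLite2 value, but `risks`, which only the second source provides, is still added. The union on its own therefore protects existing fields without hiding IPoid-only fields, which is one reason to keep the explicit rights check.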

cc @MMoss_WMF about https://foundation.wikimedia.org/wiki/Legal:IP_Information_tool_guidelines. tl;dr we are now sourcing ASN, organization and location information from GeoLite2. Since this is a free database that anyone can download and use, I assume there is no need to gate access to that data behind the "view full" right anymore. (The guidelines currently state that only administrators, bureaucrats, checkusers, oversighters, or stewards may access those fields.)