
Add IP Info (ASN & Geolocation) to requests to MediaWiki
Open, Medium (Public)

Description

Problem
As part of T251602, the geolocation and AS number need to be collected.

We (Anti-Harassment) originally thought we would use external web services to fetch the information about an IP address (T248525). While this mechanism is straightforward to implement, it prevents the product from querying logged actions based on this data. It also prevents aggregation of any kind.

However, as @Reedy explained in T248525#6101785, Wikimedia Foundation is already paying for and using MaxMind's proprietary dataset.

Proposed Solution
Instead of looking up information on IP addresses on-demand, it would be preferable in many ways to look up that information when a logged action (edit, etc.) is made and save the data into the database. This would be similar to the way CheckUser records User Agents.

This could be done in one of two ways: another service (such as whatever proxy / cache accepts incoming requests) could pass the data to MediaWiki as a request header, or the lookup could be done from within MediaWiki itself.
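To make the two options concrete, here is a minimal sketch in Python (not MediaWiki code) that prefers values injected by an upstream proxy and otherwise falls back to a local lookup. The header names, the `IPInfo` type, and the `local_lookup` callable are all hypothetical; nothing in this task specifies them.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

# Hypothetical header names; the task does not define any.
ASN_HEADER = "X-IP-Info-ASN"
GEONAME_HEADER = "X-IP-Info-GeoNames-ID"


@dataclass(frozen=True)
class IPInfo:
    asn: Optional[int]
    geoname_id: Optional[int]


def _parse_int(value: Optional[str]) -> Optional[int]:
    """Return value as an int, or None if it is missing or malformed."""
    try:
        return int(value) if value is not None else None
    except ValueError:
        return None


def resolve_ip_info(
    headers: Mapping[str, str],
    ip: str,
    local_lookup: Callable[[str], IPInfo],
) -> IPInfo:
    """Prefer values injected upstream; otherwise look the IP up locally."""
    asn = _parse_int(headers.get(ASN_HEADER))
    geoname_id = _parse_int(headers.get(GEONAME_HEADER))
    if asn is not None and geoname_id is not None:
        return IPInfo(asn, geoname_id)
    # Header missing or malformed: fall back to the in-application lookup.
    return local_lookup(ip)
```

In the header variant, the application would presumably have to trust these headers only when they are set by the proxy layer and stripped from client-supplied requests; otherwise a client could forge its own ASN or location.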

Ideally, what would be saved is a value that can be localized: either the ASN and GeoNames ID, the converted Wikidata ID, or both.
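As a rough illustration of why storing the data at log time helps, the sketch below saves an ASN and GeoNames ID next to each logged action and then aggregates by ASN, which is the kind of query that on-demand external lookups rule out. It uses an in-memory SQLite database; the table name, column names, and sample values are assumptions for illustration and do not reflect the CheckUser schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE action_ip_info (      -- hypothetical table name
        action_id   INTEGER PRIMARY KEY,
        ip          TEXT NOT NULL,
        asn         INTEGER,           -- AS number, NULL if unknown
        geoname_id  INTEGER            -- GeoNames ID, NULL if unknown
    )
    """
)


def record_action(conn, action_id, ip, asn, geoname_id):
    """Store the IP info resolved at the moment the action is logged."""
    conn.execute(
        "INSERT INTO action_ip_info VALUES (?, ?, ?, ?)",
        (action_id, ip, asn, geoname_id),
    )


def actions_per_asn(conn):
    """Aggregate logged actions by AS number."""
    rows = conn.execute(
        "SELECT asn, COUNT(*) FROM action_ip_info GROUP BY asn ORDER BY asn"
    )
    return rows.fetchall()


# Sample rows use documentation-reserved IPs and ASNs, and made-up GeoNames IDs.
record_action(conn, 1, "198.51.100.7", 64496, 1001)
record_action(conn, 2, "198.51.100.9", 64496, 1001)
record_action(conn, 3, "203.0.113.5", 64497, 1002)
```

Storing numeric identifiers rather than display names keeps the rows language-neutral; the name shown to a user could be resolved from the GeoNames or Wikidata ID at display time.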

Data Retention
Data retention will be the same as IP addresses are now:

Not logged-in: Indefinitely
Logged-in: 90 days in CheckUser
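
The retention rule above could be expressed as a small predicate, sketched here in Python. The function and constant names are hypothetical, and treating a record that is exactly 90 days old as still retained is an assumption; the task does not specify the boundary.

```python
from datetime import datetime, timedelta, timezone

# 90 days for logged-in users, as stated above.
LOGGED_IN_RETENTION = timedelta(days=90)


def is_expired(logged_in: bool, recorded_at: datetime, now: datetime) -> bool:
    """Return True if the stored IP info is past its retention period."""
    if not logged_in:
        # Data for users who are not logged in is kept indefinitely.
        return False
    # Assumption: a record exactly 90 days old is still retained.
    return now - recorded_at > LOGGED_IN_RETENTION
```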

Questions

  1. What is Traffic & SRE's preferred way to do this?
  2. Where should we store/download/cache MaxMind's dataset?

Event Timeline

colewhite triaged this task as Medium priority. May 5 2020, 7:58 PM

I wanted to clarify that this is just in the experiment and investigation stage.

We want to start a discussion about using MaxMind to get IP information and possibly adding some features to how the data is used.

We do not expect anyone to actually build something or write any code just yet.

BBlack subscribed.

The swap of Traffic for Traffic-Icebox in this ticket's set of tags was based on a bulk action for all tickets that are neither part of our current planned work nor clearly a recent, higher-priority emergent issue. This is simply one step in a larger task cleanup effort. Further triage of these tickets (and especially, organizing future potential project ideas from them into a new medium) will occur afterwards! For more detail, have a look at the extended explanation on the main page of Traffic-Icebox. Thank you!