#### Description
The #anti-harassment Tools Team will be building a new extension for the #ip_info feature. The [[ https://meta.wikimedia.org/wiki/IP_Editing:_Privacy_Enhancement_and_Abuse_Mitigation/IP_Info_feature | project page ]] gives a detailed description of the project.
//Basic description//
A new extension for providing information //about// an IP address without (1) the need for the user to use an external service themselves and (2) exposing the IP address itself to the user. (Actually hiding the IP addresses is beyond the scope of this project.)
This provides the user with information they could have retrieved from knowing the IP address. This information could be displayed in various ways (popup on hover/click, special page, etc.). Initially, this information would only be displayed on a select few pages (or as a beta feature to a select group of users). Eventually, this could be displayed to all (logged in?) users wherever an IP address is currently displayed in the interface.
There are several ways we could implement this feature. We plan on building an API endpoint T260603 that takes an edit (revision) id or log id and returns data about the IP address used for that action. This could even be added to the existing endpoints for revisions or logs. For anonymous actions it would provide a result to all users (who are logged in?). This endpoint //may// provide a result to checkusers for actions performed by logged-in users. Regardless, the data will only be returned for actions performed within the previous year (?) for anonymous actions and 90 days for actions performed by logged-in users.
//Data//
Based on our investigation in T259726, the data our users are looking for is not accessible from freely licensed datasets alone. Ideally IPInfo would combine several datasets, but this depends on agreeing licences with different providers, and will only be considered in future iterations. Therefore, we will be looking to purchase a license to a proprietary dataset (or to use one we've already purchased).
What could be problematic is how this data is retrieved from the proprietary dataset.
There are at least three ways to accomplish this:
# Implement a background job process like #machinevision. The extension would fetch the information from the dataset after a revision or logged action has taken place and store this information in the database. There would also be a job to go through the historical IP addresses and backfill the database. This would raise some #privacy concerns, as the amount of personally identifiable information (PII) in the database would increase rather than decrease (especially if we store latitude and longitude). T259725#6383339
# IP info is currently calculated as part of a [[ https://github.com/wikimedia/puppet/blob/3529ffc7b55d1e917f17a4175091860e3f81b790/modules/varnish/templates/geoip.inc.vcl.erb | custom varnish function ]] and attached to incoming requests with a Cookie (I assume this cookie is tied to the IP address being used?). This is currently being used by #wikimedia-fundraising to target banner display. We could expand the usage of this function and collect the incoming data (when an edit or logged action is performed) within the database. This would still have the PII problems, but would prevent having to run a Job on the servers and would use an existing system.
# The information could be retrieved //on-demand// from the proprietary dataset. This is a simple solution and reduces the PII we store in our databases, but //could// have performance implications.
Proprietary datasets typically offer either a downloadable database (like [[ https://www.maxmind.com/ | MaxMind ]]) or a highly available/cacheable webservice (like [[ https://ipinfo.io/ | IP Info ]]), or sometimes both. When a user requests information about an IP address, the request will utilize our API endpoint, and that endpoint will then look up the data in the proprietary dataset on demand. To handle more requests, we could move our API endpoint so that it is not in MediaWiki (using a PHP connection) and instead use a separate, custom microservice (with nginx? node.js?) that could handle many more simultaneous requests.
Of the options available, we believe that Option 3 carries the most risk performance-wise but the least risk from a 3rd-party license perspective. Since we are not 100% sure at this time what the restrictions of the license will be, we will proceed with Option 3 until we know for sure that we are able to pursue a different option.
The first iteration of IPInfo will use only MaxMind's GeoIP2 databases, which we already have licences for, and which are already available on our servers: T263263#6534392.
A PHP package providing an API for these databases is undergoing a security readiness review: T262963.
For third parties who do not have access to the proprietary datasets, IPInfo will use MaxMind's free GeoLite2 databases.
//API//
IPInfo provides two API endpoints, taking an edit id or a log id. If the edit or log was performed by an anonymous user (or the log target is an anonymous user), the API returns data about the relevant IP address(es).
//Client-side UI//
Currently IPInfo adds a button next to IP addresses on Special:Log and history pages. The data are retrieved on clicking this button, and displayed in a popup. This design may change.
//Deployment//
We expect the feature to be deployed to all wikis, and be available on certain pages that show IP addresses. It will initially be available only to checkusers.
//Preventing abuse//
Sending an edit/log ID rather than the IP address will allow the IP address to remain hidden (once they become hidden in the future), and is intended to prevent IPInfo from being used as an API for getting information about arbitrary IP addresses.
Only users with the 'ipinfo' right can see the information. At first this right will only be given to checkusers. In the future when IP addresses are no longer visible, more users may need to access this information, e.g. for patrolling to fight vandalism. This will depend on user research and testing.
Users will sign an agreement when first using this tool, and a record will be kept of which users have access. Access may be taken away from a user, and a record will be kept of this too: T264150.
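As a rough illustration of the access rules described above (lookup by edit id only, gated on the 'ipinfo' right, data returned only for anonymous actions), here is a minimal sketch in Python. It is not the extension's code, which is a MediaWiki extension: the `REVISIONS` and `GEO_DATA` tables, the `User` class, the error strings, and the choice to leave the IP address out of the response are all assumptions made for illustration.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical in-memory stand-ins for the revision table and the GeoIP2 data.
REVISIONS = {101: "203.0.113.7", 102: "ExampleUser"}
GEO_DATA = {"203.0.113.7": {"country": "Exampleland", "asn": 64500}}


@dataclass
class User:
    name: str
    rights: frozenset


def is_ip(performer: str) -> bool:
    """Return True if the performer name parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(performer)
        return True
    except ValueError:
        return False


def ip_info_for_revision(user: User, rev_id: int) -> dict:
    """Look up IP information by edit id; the caller never supplies an IP."""
    if "ipinfo" not in user.rights:
        return {"error": "permission-denied"}
    performer = REVISIONS.get(rev_id)
    if performer is None:
        return {"error": "no-such-revision"}
    if not is_ip(performer):
        # Edit was made by a logged-in user: no data is returned.
        return {"error": "not-anonymous"}
    # Assumption: the response is keyed by edit id and omits the IP itself.
    return {"rev_id": rev_id, "info": GEO_DATA.get(performer, {})}
```

Because the only input is an edit id that must already exist, a caller cannot use a function shaped like this to query arbitrary IP addresses.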
#### Preview environment
> //(Insert one or more links to where the feature can be tested, e.g. on Beta Cluster.)//
>
> Hosting the changes on Beta Cluster is a requirement prior to performance review. Please ensure that the feature can be used directly on the link(s) provided, without any data entry such as having to create an article. Any sample content needed should already be present.
>
> If the changes cannot be hosted on Beta Cluster, explain why and provide links to an alternate public environment instead where the Performance Team can also SSH into. Links to code only is insufficient for a performance review.
The feature will be available either on the Beta Cluster or on our test environment (T260607), depending on the relative timing of this review and the security review (T260822).
IPInfo is available to logged-in users on our test environment: https://thegoodplace.wmcloud.org/index.php?title=Special:Log
Clicking buttons next to the IP addresses will result in a popup displaying either data or an error message.
#### Which code to review
> //(Provide links to all proposed changes and/or repositories. It should also describe changes which have not yet been merged or deployed but are planned prior to deployment. E.g. production Puppet, wmf config, or in-flight features expected to complete prior to launch date, etc.).//
At the time of requesting this review, we're at the start of the project. More detailed links will follow before the review takes place.
Gerrit: [[ https://gerrit.wikimedia.org/r/admin/repos/mediawiki%2Fextensions%2FIPInfo | mediawiki/extensions/IPInfo ]]
Github: https://github.com/wikimedia/mediawiki-extensions-IPInfo
#### Performance assessment
> Please initiate the performance assessment by answering the below:
>
> - What work has been done to ensure the best possible performance of the feature?
We've evaluated alternative approaches (above) and implemented the best approach possible given the licensing restrictions.
Data for each IP address is requested on demand, rather than holding up page load.
Data is only requested once for each log entry or edit on the page.
> - What are likely to be the weak areas (e.g. bottlenecks) of the code in terms of performance?
We may be limited by the performance of the GeoIP2 library.
In future versions, if more datasets and services are used, a likely bottleneck will be the performance and availability of the 3rd party webservice.
> - Are there potential optimisations that haven't been performed yet?
Depending on the license and the product requirements, we may be able to cache revisions/logs made by anon users and make requests to our API without credentials (which will give all users the information from the varnish cache).
A new popup widget is appended for each button. The data for each log entry or edit is cached in the widget, but data for the same IP address may be requested more than once if it occurs in a different log entry or edit. We could re-use the same popup widget, and we could store a map of the data for each IP address rather than perform several requests for the same IP address (but different log/edit IDs). The design is subject to change, and we do not yet know how the tool will be used.
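The per-IP map mentioned above could look roughly like the following sketch. It is written in Python for brevity, whereas the real widget is client-side code; the `IPInfoCache` class and the `fetch` callback are hypothetical. It also assumes the IP address is visible to the client, which holds today but would not once IP addresses are hidden.

```python
from typing import Callable, Dict


class IPInfoCache:
    """Cache keyed by IP address, so repeated IPs trigger a single request."""

    def __init__(self, fetch: Callable[[int], dict]):
        # fetch is a hypothetical function that calls the API with a log/edit id.
        self._fetch = fetch
        self._by_ip: Dict[str, dict] = {}
        self.requests_made = 0

    def get(self, ip: str, log_id: int) -> dict:
        """Return data for this IP, requesting it only on first sight."""
        if ip not in self._by_ip:
            self.requests_made += 1
            self._by_ip[ip] = self._fetch(log_id)
        return self._by_ip[ip]
```

With a cache like this, two log entries from the same IP address cost one request rather than two; whether that saving matters depends on how the tool ends up being used.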
> - Please list which performance measurements are in place for the feature and/or what you've measured ad-hoc so far. If you are unsure what to measure, ask the Performance Team for advice: [[ mailto:performance-team@wikimedia.org | performance-team@wikimedia.org ]].
We plan on implementing logging via an EventLogging schema, including how long the requests take, how often they are made, and how often the tool is used.
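To make the planned measurement concrete, here is a minimal Python sketch of timing a lookup and recording the duration. The event fields (`source`, `duration_ms`) and the plain list used as a sink are assumptions for illustration; the actual EventLogging schema has not been defined yet.

```python
import time
from typing import Any, Callable, Dict, List


def timed_lookup(
    lookup: Callable[[], Any],
    events: List[Dict[str, Any]],
    source: str,
) -> Any:
    """Run a lookup and append its wall-clock duration to the event sink."""
    start = time.perf_counter()
    try:
        return lookup()
    finally:
        # Recorded even if the lookup raises, so failures are timed too.
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        events.append({"source": source, "duration_ms": elapsed_ms})
```

Counting the recorded events per time period would also give the request frequency.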