Technical investigation into adding on-wiki info to IPInfo [8H]
Closed, Resolved (Public)

Description

This is an investigation into ways we could solve T268657: Epic: IP Info popup.

Things to consider:

  • When/how should we request the data
  • Whether to call core APIs or build a custom end-point into IPInfo
  • How the IpInfoWidget will need modifying (high-level)

If there's time, it would be good to know whether/how we can get each piece of information (either via an API call or indexed query):

  • Whether the IP is blocked
  • How many times the IP has been blocked
  • How many edits the IP has made
  • How many edits the IP has made in the last 24 hours
  • How many global edits the IP has made

We want to do this investigation while IPInfo is still fresh in our minds, ready for resuming work in Q4.
We have these three tasks that should be expanded or repurposed based on the technical plan:

Event Timeline

Niharika triaged this task as Medium priority. (Dec 16 2020, 9:29 PM)
Niharika updated the task description.
Basic questions
  • When/how should we request the data

There could be up to 500 rows on a history or log page, each of which could contain a different IP address. Given that we wouldn't expect a patroller to want to see data for every IP address (or even most of them), it would be more efficient to fetch the data on demand, once the button is clicked.
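
As a rough illustration of on-demand fetching, here is a minimal Python sketch (not IPInfo code). The `OnDemandIpInfo` class, the `fake_fetch` function and the payload shape are all hypothetical, and the per-IP cache is an extra assumption rather than something decided in this task.

```python
from typing import Callable, Dict


class OnDemandIpInfo:
    """Fetch data for an IP only when asked, and reuse it for repeat clicks."""

    def __init__(self, fetch: Callable[[str], dict]) -> None:
        self._fetch = fetch  # hypothetical function that calls the endpoint
        self._cache: Dict[str, dict] = {}  # assumption: cache results per IP
        self.request_count = 0

    def on_button_click(self, ip: str) -> dict:
        # Only the first click for a given IP triggers a request.
        if ip not in self._cache:
            self.request_count += 1
            self._cache[ip] = self._fetch(ip)
        return self._cache[ip]


def fake_fetch(ip: str) -> dict:
    # Stand-in for a network request; the payload shape is made up.
    return {"ip": ip, "blocked": False}


# A page with 500 rows, but only three clicks (rows 0 and 250 share an IP).
rows = [f"192.0.2.{i % 250}" for i in range(500)]
info = OnDemandIpInfo(fake_fetch)
for clicked_row in (0, 7, 250):
    info.on_button_click(rows[clicked_row])

print(info.request_count)  # 2 requests, rather than one per row
```

With eager loading, the same page would need a request (or a batched lookup) covering every distinct IP address, whether or not anyone looks at the result.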

The existing IPInfo APIs are designed to return Info objects with standardised properties (such as ASN, ISP, etc) containing general information about an IP address, potentially from multiple data sources. These Info objects should not be expanded to return on-wiki information, since 3rd-party information sources will never return on-wiki information, and the wiki will never return general IP information.

Instead the on-wiki information should be requested from a separate endpoint.

  • Whether to call core APIs or build a custom end-point into IPInfo

Most of the information we want to show is either:

  • not available for IP addresses from existing APIs
  • available, but inefficient to get from the API since we only need a count; the count would also be capped at the maximum number of items the API can return

To get the required data efficiently, we could create a new API endpoint in IPInfo that queries the database tables directly. We'd need to determine whether the new database queries can be done performantly.

  • How the IpInfoWidget will need modifying (high-level)

The IpInfoWidget handles displaying data or errors, given a promise that succeeds or fails. We could either update the widget to handle multiple requests, or make a separate widget for the on-wiki data. (These would be technical decisions and needn't necessarily affect the appearance, though it could affect the HTML, e.g. two lists instead of one.)

Before we decide, there are a few design/product questions to consider:

  • If one request succeeds but the other fails, what should we do? E.g. should we show the data for the successful one and an error for the unsuccessful one?
  • If one set of data is returned faster than the other, should we show it first or should we wait until both have returned?

Although T268657 has no MaxMind data in the popup widget, it seems sensible to plan the widget(s) as though this could change, so either set of data could be displayed in the popup or the infobox.

Getting the data

Here's an overview of the availability of each piece of data from API modules or database queries. Note that edit counts are available for user accounts, but not IP addresses.

This is only included for completeness; my recommendation is not to use existing APIs. For any new queries we introduce, we will need to determine whether they can be done performantly.

Whether the IP is blocked
  • API availability: Available for IP addresses
  • Database table: ipblocks

How many times the IP has been blocked
  • API availability: Could technically use logevents API filtered by type block and the IP address' User page, but inefficient when we only need the count; would be subject to a maximum value
  • Database table: logging

How many edits the IP has made
  • API availability: Could technically use usercontribs API, but inefficient when we only need the count; would be subject to a maximum value
  • Database table: revision, archive (if we also want to count deleted revisions)

How many edits the IP has made in the last 24 hours
  • API availability: Could technically use usercontribs API, but inefficient when we only need the count; would be subject to a maximum value
  • Database table: revision, archive (if we also want to count deleted revisions)

How many global edits the IP has made
  • API availability: Not available for IP addresses from the API; available for global user accounts. (The documentation incorrectly states that this is available for IP addresses: T270708)
  • Database table: I'm not aware of any existing precedents in MediaWiki for doing this for IP addresses. XTools, which provides much of the above information including edit count information for IP addresses, has a global contributions page, but this takes a while to load and doesn't show a global edit count; instead it links to separate edit counter pages for each wiki
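
As a sketch of the kind of count-only queries a custom endpoint could run, here is a small Python example using an in-memory SQLite database. The table names follow the overview above, but the columns (`address`, `type`, `target`, `actor_name`, `ts`) are simplified placeholders and not MediaWiki's real schema; range blocks, deleted revisions and global edits are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified placeholder columns, not the real MediaWiki schema.
    CREATE TABLE ipblocks (address TEXT);
    CREATE TABLE logging  (type TEXT, target TEXT);
    CREATE TABLE revision (actor_name TEXT, ts INTEGER);
    CREATE INDEX revision_actor_ts ON revision (actor_name, ts);
""")

NOW = 1_000_000      # arbitrary "current time" in seconds
DAY = 24 * 60 * 60
ip = "192.0.2.1"

# Sample data: two past blocks (none active), one recent and one older edit.
conn.executemany("INSERT INTO logging VALUES (?, ?)",
                 [("block", ip), ("block", ip), ("delete", ip)])
conn.executemany("INSERT INTO revision VALUES (?, ?)",
                 [(ip, NOW - 10), (ip, NOW - 2 * DAY), ("198.51.100.7", NOW - 5)])


def count(sql: str, *params) -> int:
    # Each query returns a single COUNT(*) row rather than a list of items.
    return conn.execute(sql, params).fetchone()[0]


def onwiki_info(ip: str) -> dict:
    return {
        "blocked": count(
            "SELECT COUNT(*) FROM ipblocks WHERE address = ?", ip) > 0,
        "block_count": count(
            "SELECT COUNT(*) FROM logging WHERE type = 'block' AND target = ?", ip),
        "edit_count": count(
            "SELECT COUNT(*) FROM revision WHERE actor_name = ?", ip),
        "recent_edit_count": count(
            "SELECT COUNT(*) FROM revision WHERE actor_name = ? AND ts >= ?",
            ip, NOW - DAY),
    }


info = onwiki_info(ip)
print(info)
```

Whether the equivalent queries are fast enough on production-sized tables is exactly the open question above; the index here only hints at the sort of support such queries might need.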

Thanks for the in-depth explanation, @Tchanders. A few comments beneath:

To get the required data efficiently, we could create a new API endpoint in IPInfo that queries the database tables directly. We'd need to determine whether the new database queries can be done performantly.

I think this means we should start the conversation for this with the Performance team/DBAs soon. Do you think there is value in making this API endpoint in MW core rather than IPInfo? I'm thinking there could well be cases when this information will be needed by other apps. I think we have talked about this before but I don't remember what the conclusion was, sorry.

Before we decide, there are a few design/product questions to consider:

  • If one request succeeds but the other fails, what should we do? E.g. should we show the data for the successful one and an error for the unsuccessful one?
  • If one set of data is returned faster than the other, should we show it first or should we wait until both have returned?

Pinging @Prtksxna as he is the authority on this. My personal preference would be to show what we can get and an error for what we can't. For the second question, it would be nice to wait until we have everything before showing it, although this depends on how much the wait time might vary.
We should wait until Prateek decides. His decision supersedes mine.

Although T268657 has no MaxMind data in the popup widget, it seems sensible to plan the widget(s) as though this could change, so either set of data could be displayed in the popup or the infobox.

Yes, we should plan for things to possibly change once we have done the user testing. Good call.

I think this means we should start the conversation for this with the Performance team/DBAs soon. Do you think there is value in making this API endpoint in MW core rather than IPInfo? I'm thinking there could well be cases when this information will be needed by other apps. I think we have talked about this before but I don't remember what the conclusion was, sorry.

I don't think there's anything technically that would prevent it from being in core. We could develop it in the IPInfo extension and it could be upstreamed later if necessary?

Before we decide, there are a few design/product questions to consider:

  • If one request succeeds but the other fails, what should we do? E.g. should we show the data for the successful one and an error for the unsuccessful one?

I think we should show all the data that is available. We'll have to see how we choose to display the data we couldn't get. There could be a few possibilities here:

  • The data isn't available from MaxMind or on-wiki, so we could show something like "Not available"
  • There was an error. We could ask the user to refresh or try again later (if we know that would help)
  • Just hide whatever we couldn't find

  • If one set of data is returned faster than the other, should we show it first or should we wait until both have returned?

As Niharika mentioned, we might want to decide this based on how long the time gap could be. I understand that this can be tricky to figure out.

We could figure out different loading designs that minimize UI jumps and give users info as soon as we have it (e.g. the shimmering rectangle pattern that appears while you're waiting for real data to load). We might also want to do different things based on whether the accordion was open when the page loaded or was opened afterwards.
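
One way to picture the "show info as soon as we have it" option is the Python sketch below, where every section starts as a placeholder and is filled in as its request finishes. The section names, delays and payloads are made up for illustration; the real widget is JavaScript.

```python
import asyncio


async def fetch(name: str, delay: float) -> tuple:
    # Stand-in for a request that takes `delay` seconds (hypothetical).
    await asyncio.sleep(delay)
    return name, {"loaded": True}


async def render_progressively() -> list:
    # Every section starts as a placeholder, so the layout is reserved up front.
    sections = {"general": "loading", "onwiki": "loading"}
    frames = [dict(sections)]  # each frame is what the user would see

    tasks = [
        asyncio.create_task(fetch("general", 0.05)),
        asyncio.create_task(fetch("onwiki", 0.01)),
    ]
    # as_completed yields results in the order they finish, not the order started.
    for finished in asyncio.as_completed(tasks):
        name, data = await finished
        sections[name] = data
        frames.append(dict(sections))
    return frames


frames = asyncio.run(render_progressively())
for frame in frames:
    print(frame)
```

The first frame shows two placeholders, the second fills in whichever request returned first, and the last shows both, which is roughly what a skeleton-style loading design would do.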

I think this means we should start the conversation for this with the Performance team/DBAs soon. Do you think there is value in making this API endpoint in MW core rather than IPInfo? I'm thinking there could well be cases when this information will be needed by other apps. I think we have talked about this before but I don't remember what the conclusion was, sorry.

I don't think there's anything technically that would prevent it from being in core. We could develop it in the IPInfo extension and it could be upstreamed later if necessary?

Yeah, that sounds like a good idea to me.