
CU 2.0: Fetch information to be displayed in Compare tab
Closed, Resolved | Public | 8 Estimated Story Points

Description

Goal

This task is to fetch, on the backend, the data to be displayed on the initial load of the Compare tab in CU 2.0.

Data to be fetched
  1. For an input username:
    • IP addresses they have used
      • Note that our data retention policy limits this to the past 90 days, so we won't find anything older than that
    • Number of edits made by the user from the given IP + UA
    • Number of other users using that IP
    • Number of edits made by other users from that IP
      • We use the above two counts to display information like "44 edits from 15 other users"
    • User agent behind the edit
      • A different UA creates a different record, as it indicates a different device.
    • Activity period from that IP + UA (timestamps of the first and last edit from that IP and that user agent)
  2. For an input IP address:
    • Usernames editing from that IP address
      • As above, we won't get any data older than 90 days
    • User agent behind the edit
      • A different UA creates a different record, as it indicates a different device.
    • Activity period from that IP + UA (timestamps of the first and last edit from that IP and that user agent)
    • Number of edits made by the given user from that IP + UA
    • For unregistered users, we'll have a new record for every unique UA
      • Number of edits for each unregistered user from that unique UA
      • Activity period for that unregistered user from that IP + UA
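A minimal Python sketch of how the records above could be assembled, assuming edits are available as simple in-memory records. The `Edit` and `CompareRow` types, their field names, and the `build_rows` helper are hypothetical and are not CheckUser's actual schema or code. Grouping on (user, IP, UA) reflects the rule that a different UA creates a different record, and the 90-day cutoff mirrors the retention limit.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

RETENTION = timedelta(days=90)  # assumed retention window; older data is unavailable


@dataclass(frozen=True)
class Edit:
    """Hypothetical edit record (not CheckUser's real schema)."""
    user: str  # username, or the IP itself for an unregistered editor
    ip: str
    agent: str  # user agent string
    timestamp: datetime


@dataclass(frozen=True)
class CompareRow:
    user: str
    ip: str
    agent: str
    edits: int
    first: datetime  # first edit from this IP + UA
    last: datetime   # last edit from this IP + UA


def build_rows(edits, target, now):
    """One row per (user, IP, UA) for a target that is a username or an IP."""
    cutoff = now - RETENTION
    groups = defaultdict(list)
    for e in edits:
        if e.timestamp < cutoff:
            continue  # outside the retention window
        if target not in (e.user, e.ip):
            continue  # unrelated to the input username or IP
        groups[(e.user, e.ip, e.agent)].append(e.timestamp)
    return sorted(
        (CompareRow(u, ip, ua, len(ts), min(ts), max(ts))
         for (u, ip, ua), ts in groups.items()),
        key=lambda r: (r.user, r.ip, r.agent),
    )
```

Passing an IP as `target` returns rows for every username (including unregistered editors) seen on that IP, each split by UA.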
Example
| Username | Activity | IP | User-agent |
| --- | --- | --- | --- |
| Apples | August 12, 11:00 - September 13, 10:00 | 1.2.3.4 - 17 edits (10 from 3 other users) | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15 |
| Apples | August 16, 13:00 - September 1, 8:00 | 1.5.6.4 - 3 edits (1560 from 35 other users) | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15 |
| Bananas | August 12, 11:00 - September 13, 10:00 | 1.2.3.4 - 18 edits (10 from 3 other users) | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15 |
| Bananas | August 16, 13:00 - September 1, 8:00 | 1.5.6.4 - 45 edits (1560 from 35 other users) | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15 |
| Grapes | August 12, 11:00 - September 13, 10:00 | 1.5.6.7 - 70 edits | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15 |
| Unregistered | August 16, 13:00 - September 1, 8:00 | 1.5.6.7 - 123 edits | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15 |
| Pineapples | August 12, 11:00 - September 13, 10:00 | 1.9.8.4 - 25 edits | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15 |
| Unregistered | August 16, 13:00 - September 1, 8:00 | 1.9.8.4 - 76 edits | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15 |

Event Timeline

Niharika triaged this task as Medium priority. (Nov 20 2019, 6:35 PM)
Niharika removed a project: Epic.
Niharika updated the task description.
Niharika set the point value for this task to 8.
Niharika moved this task from Untriaged to Cards ready for development on the Anti-Harassment board.

@Prtksxna @Niharika
Using the first row of the example table, when we say 1.2.3.4 - 17 edits (10 from 3 other users), as I see it, it breaks down as follows:

  • Banana made 17 edits
  • Other users made 10 edits
  • The total of edits from that IP is 27

Where do unregistered edits on that IP fall? Are they part of the 10 edits from 3 other users, or do we want to exclude them?

I assumed the IP would be one of the users, though I suppose it depends on how you interpret the word "user."

Edits from the same IP are likely logged-out edits by the same user and are relevant to the SPI, so in my opinion it makes sense to include them.
If IP Editing Privacy Enhancement moves forward, these edits will belong to an auto-generated user handle, anyway.

In T238714#5705848, @AronManning wrote:

Edits from the same IP are likely logged-out edits from the same user, relevant to the SPI, thus imo it makes sense to include those edits.
If IP Editing Privacy Enhancement moves forward, these edits will belong to an auto-generated user handle, anyway.

Sounds good to me
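Following the agreement above, a sketch of the bracketed counts with unregistered edits included. It assumes that an unregistered edit's "user" is the IP itself, so all logged-out edits from that IP count together as one other user; `Edit` and `other_activity` are hypothetical names. With numbers chosen to match the first-row breakdown (17 edits by the user plus 10 by others), the IP's total comes to 27.

```python
from typing import NamedTuple


class Edit(NamedTuple):
    user: str  # username; for an unregistered edit this is the IP itself
    ip: str


def other_activity(edits, ip, user):
    """Return (edit count, distinct editors) on `ip`, excluding `user`.

    Unregistered edits are included: every logged-out edit from `ip`
    shares the same "user" (the IP), so together they count as one
    other user.
    """
    others = [e for e in edits if e.ip == ip and e.user != user]
    return len(others), len({e.user for e in others})


edits = (
    [Edit("Bananas", "1.2.3.4")] * 17
    + [Edit("Apples", "1.2.3.4")] * 5
    + [Edit("Grapes", "1.2.3.4")] * 3
    + [Edit("1.2.3.4", "1.2.3.4")] * 2  # logged-out edits
)
print(other_activity(edits, "1.2.3.4", "Bananas"))  # (10, 3)
```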

Change 555619 had a related patch set uploaded (by Dmaza; owner: Dmaza):
[mediawiki/extensions/CheckUser@master] [WIP] Add Compare service to fetch compare data

https://gerrit.wikimedia.org/r/555619

@Prtksxna @Niharika In the example table, should cells in the "IP" column be identical for the same IP? E.g. in the example, if the IP is 1.2.3.4, the cell always contains "1.2.3.4 - 17 edits (10 from 3 other users)". Or should they differ depending on the UA? I think @dmaza and I may have interpreted this differently.

If they are meant to be identical, then it looks like there's a typo in the lowest 4 cells of that column:

1.5.6.7 - 17 edits
1.5.6.7 - 123 edits
1.9.8.4 - 17 edits
1.9.8.4 - 123 edits

Whoops, they should differ depending on the UA. We would want to distinguish different users from different devices.
I was just lazy and didn't bother switching around the numbers. Let me update it to make it clearer.

@Niharika Thanks. If I'm now understanding correctly, should the acceptance criteria be updated too?

  • Count of number of other users using that IP
  • Count of number of other users using that IP + UA
  • Count of number of edits made by other users from that IP
  • Count of number of edits made by other users from that IP + UA

So those counts are for the "10 from 3 other users" part that shows up next to the IP address. In that case, I think we should just show how many other edits were made from the same IP by other users, disregarding the UA, because when they click that, it adds all the other users from the same IP address to the table, regardless of the UA used for their edits.
Does that make sense?

Ah yes that makes sense, and that's what I was meaning. I should've asked "should the bit in brackets be identical"...

So just to be clear, what goes inside the parentheses in the IP column should be UA-independent, right?

Yep.
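A sketch of the rule agreed here: the leading count is per IP + UA, while the bracketed part is computed per IP only, so it is identical on every row that shares the IP. The `Edit` record and `ip_cells` helper are hypothetical, and pluralization is simplified to "other users".

```python
from collections import Counter
from typing import NamedTuple


class Edit(NamedTuple):
    user: str
    ip: str
    agent: str  # user agent string


def ip_cells(edits, user):
    """Return {(ip, agent): IP-column text} for every IP + UA `user` edited from."""
    own = Counter((e.ip, e.agent) for e in edits if e.user == user)
    cells = {}
    for (ip, agent), n in own.items():
        # Bracketed part: all other users on this IP, whatever their UA.
        others = [e for e in edits if e.ip == ip and e.user != user]
        other_users = len({e.user for e in others})
        text = f"{ip} - {n} edits"
        if other_users:
            text += f" ({len(others)} from {other_users} other users)"
        cells[(ip, agent)] = text
    return cells
```

Two rows for the same user and IP but different UAs then show different leading counts and the same text in brackets.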

Change 555619 merged by jenkins-bot:
[mediawiki/extensions/CheckUser@master] Add Compare service to fetch compare data

https://gerrit.wikimedia.org/r/555619

  1. For an input username:
    • IP addresses they have used
      • FYI note that this is limited to the past 90 days by our data policy limits so we won't find anything from before that

@Niharika Currently, we only show IPs and UAs associated with new page creation and normal edits. We don't show IPs/UAs associated with things like new account creations (which we do in current CheckUser). This might miss some IPs/UAs.

  • Count of number of edits made from the given IP + UA by the user

See above. Currently, we only count new pages and edits.

  • Count of number of other users using that IP
  • Count of number of edits made by other users from that IP
    • We use the above to display information like - 44 edits from 15 other users

Currently, we only show number of other users on that IP. But the PreliminaryCheckService does return the edit information, if we want to show it.
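To illustrate the gap described above, a sketch in which only edits and page creations contribute IP/UA pairs, so an IP used only for, say, an account creation does not appear unless log entries are explicitly included. The `Kind` labels, the `Change` record, and the `compare_pairs` helper are hypothetical; they loosely mirror the edit / new page / log entry distinction and are not CheckUser's actual constants or code.

```python
from enum import Enum
from typing import NamedTuple


class Kind(Enum):
    """Hypothetical change kinds (not CheckUser's real constants)."""
    EDIT = "edit"
    NEW_PAGE = "new page"
    LOG = "log entry"  # e.g. an account creation


class Change(NamedTuple):
    user: str
    ip: str
    agent: str
    kind: Kind


# What the Compare data currently counts, per the comment above.
COUNTED = frozenset({Kind.EDIT, Kind.NEW_PAGE})


def compare_pairs(changes, user, include_logs=False):
    """Set of (IP, UA) pairs for `user`; log entries are skipped by default."""
    allowed = (COUNTED | {Kind.LOG}) if include_logs else COUNTED
    return {(c.ip, c.agent) for c in changes if c.user == user and c.kind in allowed}
```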

@dom_walden In my understanding, in current CheckUser, new account creations and other log entries are shown when someone picks the 'Get users' option, right?
If so, we got that covered under the Timeline tab.

Since this task is only about returning the info, we can close this. As for showing it, we can tackle that in another task after the user testing is complete and we have a final, stable definition of the UI.

@Niharika OK. But the user might have to do more work to find any extra IPs/UAs on the Timeline tab which were not included in the Compare tab.

Otherwise, I have tested the accuracy of the data being returned by:

  • Modifying the unit tests so I could check the accuracy of the data with an external tool
  • Comparing what is shown in the UI with the current CheckUser and my own SQL queries (for randomly generated sockpuppet data on my local VM)

I tested for usernames, IPs (v4, v6 and ranges) and both.
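For anyone repeating this kind of check with their own scripts, a sketch of matching a target that may be a single IPv4/IPv6 address or a CIDR range, using Python's standard `ipaddress` module. The `in_target` helper is hypothetical and meant only as a testing aid; it is not how CheckUser itself queries ranges.

```python
import ipaddress


def in_target(target: str, ip: str) -> bool:
    """True if `ip` is `target`, or lies within it when `target` is a range.

    A bare address such as "1.2.3.4" becomes a single-host network (/32 or
    /128). Comparing an IPv4 address with an IPv6 range (or vice versa)
    simply yields False.
    """
    network = ipaddress.ip_network(target, strict=False)
    return ipaddress.ip_address(ip) in network
```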

I believe currently what we show is consistent with the acceptance criteria, with the exception noted below...

Currently, we only show number of other users on that IP. But the PreliminaryCheckService does return the edit information, if we want to show it.

I made a mistake here. Actually, we show number of edits. What we don't show is number of other users.

There will be a filter to show "log entries" only, so hopefully it won't be too much extra effort. We should keep an eye on this, though, and iterate as needed. Thanks for pointing it out.

@dom_walden If I understand you correctly, the "from x other users" part is not shown? That can be fixed in T244816: CU 2.0: Add button to add all other users on an IP address to an investigation because we changed the design to have that information on the button which adds those other editors to the table. Let me know if that's not clear and I can clarify more. :)