
CU 2.0: Fetch information to be displayed in Compare tab
Open, Medium; Public; 8 Estimated Story Points

Description

Goal

This task is to fetch the data on the backend to be displayed in the initial load of the Compare tab in CU 2.0.

Data to be fetched
  1. For an input username:
    • IP addresses they have used
      • Note: our data retention policy limits this to the past 90 days, so nothing older will be found.
    • Number of edits made by the user from the given IP + UA
    • Number of other users using that IP
    • Number of edits made by other users from that IP
      • We use the two counts above to display information like "44 edits from 15 other users".
    • User agent behind the edit
      • A different UA creates a separate record, as it indicates a different device.
    • Activity time period for that IP + UA (timestamps of the first and last edit from that IP and user agent)
  2. For an input IP address:
    • Usernames editing from that IP address
      • As above, no data older than 90 days is available.
    • User agent behind the edit
      • A different UA creates a separate record, as it indicates a different device.
    • Activity time period for that IP + UA (timestamps of the first and last edit from that IP and user agent)
    • Number of edits made by the given user from that IP + UA
    • For unregistered users, there is a separate record for every unique UA
      • Number of edits by each unregistered user from that unique UA
      • Activity time period for that unregistered user from that IP + UA
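
The grouping described above could be sketched roughly as follows. This is only an illustration in Python, not the CheckUser implementation: the record shape `(user, ip, user_agent, timestamp)`, the function name `compare_rows`, and the convention that an unregistered edit carries the IP as its user name are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed retention window, taken from the 90-day limit mentioned above.
RETENTION = timedelta(days=90)


def compare_rows(edits, now):
    """Group edits by (user, IP, UA) and summarise each group.

    `edits` is an iterable of hypothetical (user, ip, user_agent, timestamp)
    tuples. A different UA produces a different key, hence a separate row.
    """
    cutoff = now - RETENTION
    groups = defaultdict(list)
    for user, ip, ua, ts in edits:
        if ts >= cutoff:  # skip anything older than the retention window
            groups[(user, ip, ua)].append(ts)
    return {
        key: {"edits": len(stamps), "first": min(stamps), "last": max(stamps)}
        for key, stamps in groups.items()
    }
```

Each key of the returned dictionary corresponds to one row of the Compare table; the per-IP counts for other users would be computed separately.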
Example
| Username | Activity | IP | User-agent |
| --- | --- | --- | --- |
| Apples | August 12, 11:00 - September 13, 10:00 | 1.2.3.4 - 17 edits (10 from 3 other users) | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15 |
| Apples | August 16, 13:00 - September 1, 8:00 | 1.5.6.4 - 3 edits (1560 from 35 other users) | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15 |
| Bananas | August 12, 11:00 - September 13, 10:00 | 1.2.3.4 - 18 edits (10 from 3 other users) | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15 |
| Bananas | August 16, 13:00 - September 1, 8:00 | 1.5.6.4 - 45 edits (1560 from 35 other users) | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15 |
| Grapes | August 12, 11:00 - September 13, 10:00 | 1.5.6.7 - 70 edits | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15 |
| Unregistered | August 16, 13:00 - September 1, 8:00 | 1.5.6.7 - 123 edits | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15 |
| Pineapples | August 12, 11:00 - September 13, 10:00 | 1.9.8.4 - 25 edits | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15 |
| Unregistered | August 16, 13:00 - September 1, 8:00 | 1.9.8.4 - 76 edits | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15 |
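
As a rough illustration of how the IP column in the example might be assembled from the fetched counts, here is a small Python sketch. The function name `format_ip_cell` and its parameters are hypothetical; the sketch only mirrors the strings shown in the example, where the parenthetical is omitted if no other users edited from the IP.

```python
def format_ip_cell(ip, own_edits, other_edits=0, other_users=0):
    """Build an IP cell such as '1.2.3.4 - 17 edits (10 from 3 other users)'.

    Hypothetical helper: pluralisation is deliberately not handled, to
    match the wording used in the example table.
    """
    cell = f"{ip} - {own_edits} edits"
    if other_users:
        # Only mention other users when there are any on this IP.
        cell += f" ({other_edits} from {other_users} other users)"
    return cell
```

For the first example row this would be called as `format_ip_cell("1.2.3.4", 17, 10, 3)`.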

Details

Related Gerrit Patches:
mediawiki/extensions/CheckUser : master: Add Compare service to fetch compare data

Event Timeline

Niharika triaged this task as Medium priority. Nov 20 2019, 6:35 PM
Niharika removed a project: Epic.
Niharika updated the task description. (Show Details)
Niharika set the point value for this task to 8.
Niharika moved this task from Untriaged to Cards ready for development on the Anti-Harassment board.
Niharika updated the task description. (Show Details) Nov 20 2019, 6:49 PM
Niharika updated the task description. (Show Details) Nov 21 2019, 5:36 PM
dmaza claimed this task. Nov 22 2019, 4:01 PM
dmaza moved this task from Ready to In Progress on the Anti-Harassment (The Letter Song) board.
dmaza added a comment. Nov 27 2019, 9:04 PM

@Prtksxna @Niharika
Using the first row of the example table, when we say "1.2.3.4 - 17 edits (10 from 3 other users)", the way I see it, it breaks down as follows:

  • Banana made 17 edits
  • Other users made 10 edits
  • The total of edits from that IP is 27

Where do unregistered edits on that IP fall? Is that part of the 10 edits from 3 other users, or do we want to exclude those?

> Where do unregistered edits on that IP fall? Is that part of the 10 edits from 3 other users, or do we want to exclude those?

I assumed the IP would be one of the users, though I suppose it depends on how you interpret the word "user."

AronManning added a comment. Edited Dec 2 2019, 5:15 PM

> > Where do unregistered edits on that IP fall? is that part of the 10 edits from 3 other users or do we want to exclude those?
>
> I assumed the IP would be one of the users, though I suppose it depends on how you interpret the word "user."

Edits from the same IP are likely logged-out edits from the same user, relevant to the SPI, thus imo it makes sense to include those edits.
If IP Editing Privacy Enhancement moves forward, these edits will belong to an auto-generated user handle, anyway.

dmaza added a comment. Dec 2 2019, 6:12 PM

> Edits from the same IP are likely logged-out edits from the same user, relevant to the SPI, thus imo it makes sense to include those edits.
> If IP Editing Privacy Enhancement moves forward, these edits will belong to an auto-generated user handle, anyway.

Sounds good to me

Change 555619 had a related patch set uploaded (by Dmaza; owner: Dmaza):
[mediawiki/extensions/CheckUser@master] [WIP] Add Compare service to fetch compare data

https://gerrit.wikimedia.org/r/555619

@Prtksxna @Niharika In the example table, should cells in the "IP" column be identical for the same IP? E.g. in the example, if the IP is 1.2.3.4, the cell always contains "1.2.3.4 - 17 edits (10 from 3 other users)". Or should they differ depending on the UA? I think @dmaza and I may have interpreted this differently.

If they are meant to be identical, then it looks like there's a typo in the lowest 4 cells of that column:

1.5.6.7 - 17 edits
1.5.6.7 - 123 edits
1.9.8.4 - 17 edits
1.9.8.4 - 123 edits

> @Prtksxna @Niharika In the example table, should cells in the "IP" column be identical for the same IP? E.g. in the example, if the IP is 1.2.3.4, the cell always contains "1.2.3.4 - 17 edits (10 from 3 other users)". Or should they differ depending on the UA? I think @dmaza and I may have interpreted this differently.
> If they are meant to be identical, then it looks like there's a typo in the lowest 4 cells of that column:
> 1.5.6.7 - 17 edits
> 1.5.6.7 - 123 edits
> 1.9.8.4 - 17 edits
> 1.9.8.4 - 123 edits

Whoops, they should differ depending on the UA. We would want to distinguish different users from different devices.
I was just lazy and didn't bother switching around the numbers. Let me update it to make it clearer.

Niharika updated the task description. (Show Details) Dec 10 2019, 6:39 PM

@Niharika Thanks. If I'm now understanding correctly, should the acceptance criteria be updated too?

  • ~~Number of other users using that IP~~
  • Number of other users using that IP + UA
  • ~~Number of edits made by other users from that IP~~
  • Number of edits made by other users from that IP + UA

> @Niharika Thanks. If I'm now understanding correctly, should the acceptance criteria be updated too?
>
>   • ~~Number of other users using that IP~~
>   • Number of other users using that IP + UA
>   • ~~Number of edits made by other users from that IP~~
>   • Number of edits made by other users from that IP + UA

So those counts are for the "10 from 3 other users" part that shows up next to the IP address. In that case, I think we should just show how many other edits were made from the same IP by other users, disregarding the UA. Because when they click that, it adds all the other users from the same IP address to the table, regardless of the UA making that edit.
Does that make sense?

Tchanders added a comment. Edited Dec 10 2019, 6:52 PM

> So those counts are for the "10 from 3 other users" part that shows up next to the IP address. In that case, I think we should just show how many other edits were made from the same IP by other users, disregarding the UA. Because when they click that, it adds all the other users from the same IP address to the table, regardless of the UA making that edit.
> Does that make sense?

Ah yes that makes sense, and that's what I was meaning. I should've asked "should the bit in brackets be identical"...

dmaza added a comment. Dec 12 2019, 8:53 PM

So just to be clear, what goes inside the parentheses in the IP column should be UA-independent. Right?

> So just to be clear, what goes inside the parentheses in the IP column should be UA-independent. Right?

Yep.
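
For reference, the two points settled in this thread (unregistered edits on the IP count among the "other users", and the parenthetical ignores the UA) could be sketched like this. It is a hedged Python illustration, not the merged CheckUser code: the tuple shape `(user, ip, user_agent)`, the function name `other_activity`, and the assumption that an unregistered edit is recorded with the IP address as its user name are all made up for the example.

```python
from collections import defaultdict


def other_activity(edits, user, ip):
    """Return (edit_count, user_count) for everyone except `user` on `ip`.

    `edits` is an iterable of hypothetical (user, ip, user_agent) tuples.
    The user agent is ignored on purpose, and an unregistered edit is
    assumed to use the IP as its user name, so the IP counts as one more
    "other user".
    """
    per_user = defaultdict(int)
    for editor, edit_ip, _ua in edits:
        if edit_ip == ip and editor != user:
            per_user[editor] += 1
    return sum(per_user.values()), len(per_user)
```

Because the UA is not part of the filter, every row for the same user and IP would show the same parenthetical, while the user's own edit count still varies per UA.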

Change 555619 merged by jenkins-bot:
[mediawiki/extensions/CheckUser@master] Add Compare service to fetch compare data

https://gerrit.wikimedia.org/r/555619