CU 2.0: Filters in the Compare tab
Closed, ResolvedPublic3 Estimated Story Points
Actions

Assigned To

Authored By

	• Niharika
	Nov 21 2019, 1:52 AM

Description

Goal

We're introducing some filters in the Compare tab to allow users to be able to narrow down the search results and find patterns in the data.

Mock

https://prtksxna.github.io/wmf-cu-prototype/compare.html

Acceptance criteria

To begin with, we'll have one filters in the UI -

Hide the following users or IPs
- When a username or IP address is being typed in this field, it suggests auto-complete suggestions from the users table.
- The username/IP picked appears as a pill in the widget.
- The capsule is blue, whether or not the username is invalid. We ran into technical problems knowing all the users across all the tabs so we aren't doing the validation check on the usernames. T247279
- ~~As soon as a valid username or IP address is entered or the user picks a suggestion from the auto-complete suggester, the result set re-computes to exclude the user/IP as a target.~~ T247356

Details

	Subject	Repo	Branch	Lines +/-
	Add filters to the Compare tab on Special:Investigate	mediawiki/extensions/CheckUser	master	+240 -77

Customize query in gerrit

Related Objects
Search...

Status	Subtype	Assigned	Task
Open		None	T139810 RFC: Overhaul the CheckUser extension
Open		• Niharika	T236225 [Epic] CheckUser 2.0 Improvements
Open		• Niharika	T237593 [Epic] CheckUser 2.0: Compare
Resolved	Mar 20 2020	dbarratt	T247279 Filters on Special:Investigate should validate users
Resolved	Mar 25 2020	Tchanders	T247281 The target filter on Special:Investigate should be more fuzzy [medium]
Declined		None	T247356 Auto submit filters form on Special:Investigate when multiselect input changes
Resolved		dbarratt	T238809 CU 2.0: Filters in the Compare tab

Event Timeline

• Niharika triaged this task as Medium priority.Nov 21 2019, 1:52 AM

• Niharika created this task.

• Niharika moved this task from Untriaged to Triage/To be Estimated on the Anti-Harassment board.

From the description:

@Prtksxna should we give an error indicator perhaps?

Yeah, we can make the invalid capsules' text red. Similar to the categories input in Upload Wizard (though that is a redlink):

Screenshot 2019-11-21 at 6.37.14 PM.png (216×1 px, 41 KB)

The other option would be to not let them enter an incorrect input, and don't convert it into a capsule:

Screenshot 2019-11-21 at 6.40.49 PM.png (158×1 px, 28 KB)

Is this a validation we can run? If so I'd prefer we do this.

• Niharika mentioned this in T238808: CU 2.0: Pagination in the compare tab.Dec 10 2019, 7:18 PM

• Niharika updated the task description. (Show Details)Dec 10 2019, 7:27 PM

• Niharika updated the task description. (Show Details)Dec 10 2019, 7:32 PM

Taking this out for now, per our estimation meeting:

Hide the following user-agents

When a user-agent is being typed in this field, it suggests auto-complete suggestions from the users and IPs in the table below.

If a user enters a user-agent that is not in the table below, nothing happens. @Prtksxna should we give an error indicator perhaps?

As soon as a valid user-agent is entered or the user picks a suggestion from the auto-complete suggester, the result set re-computes to exclude all rows concerning the users or IPs picked by the user.

The user-agent picked appears as a pill in the widget.

We can discuss this once we've figured out how we want to break down the UA and change the database accordingly.

@Prtksxna We should update this ticket after we split the form out.

@Niharika, based on the information on each page this how I am thinking of splitting the filters

Compare	Timeline
User / IP	User / IP
User Agent	User Agent
	Page
	Type of activity

Does that sound alright?

In T238809#5778140, @Prtksxna wrote:

@Niharika, based on the information on each page this how I am thinking of splitting the filters

Compare Timeline

User / IP User / IP

User Agent User Agent

Page

Type of activity

Does that sound alright?

Whoops, I missed this. Yes, this sounds about right.

• Niharika updated the task description. (Show Details)Jan 9 2020, 6:42 PM

• Niharika updated the task description. (Show Details)Jan 9 2020, 7:44 PM

• Niharika updated the task description. (Show Details)

• Niharika updated the task description. (Show Details)Jan 9 2020, 7:49 PM

• Niharika set the point value for this task to 3.Jan 9 2020, 7:52 PM

• Niharika moved this task from Triage/To be Estimated to Cards ready for development on the Anti-Harassment board.

• Niharika updated the task description. (Show Details)Feb 6 2020, 7:42 PM

• Niharika moved this task from Cards ready for development to The Letter Song on the Anti-Harassment board.Feb 7 2020, 12:13 AM

• Niharika edited projects, added Anti-Harassment (The Letter Song); removed Anti-Harassment.

dbarratt claimed this task.Feb 20 2020, 3:20 PM

dbarratt moved this task from Ready 🎬 (ONLY IF YOU HAVE NO MORE CODE TO REVIEW) to In Progress 💪 on the Anti-Harassment (The Letter Song) board.

Change 574841 had a related patch set uploaded (by Dbarratt; owner: Dbarratt):
[mediawiki/extensions/CheckUser@master] Add filters to the Compare tab on Special:Investigate

https://gerrit.wikimedia.org/r/574841

gerritbot added a project: Patch-For-Review.Feb 25 2020, 6:59 PM

dbarratt mentioned this in T246958: HTMLUsersMultiselectField fails validation when exists is true, but required is false.Mar 5 2020, 12:02 AM

dbarratt added a subtask: T246958: HTMLUsersMultiselectField fails validation when exists is true, but required is false.Mar 5 2020, 12:16 AM

dbarratt moved this task from In Progress 💪 to Code Review 🔍 on the Anti-Harassment (The Letter Song) board.Mar 5 2020, 8:06 PM

@Niharika @Prtksxna - Re-asking a question that @dmaza asked on gerrit, about the precise meaning of this:

As soon as a valid username or IP address is entered or the user picks a suggestion from the auto-complete suggester, the result set re-computes to exclude all rows concerning the users or IPs picked by the user.

The current implementation excludes the entered users/IPs from the target list, but not necessarily from the final results. This means:

If 1.2.3.4 is a target, and ExampleUser is a filtered-out target, but has used IP 1.2.3.4, there could still be table rows for ExampleUser/1.2.3.4 (where 1.2.3.4 has a dark table cell to indicate that it is a target), but no rows for ExampleUser with other IPs.
If 1.2.3.4 is a target, and ExampleUser is a filtered-out target, but has used IP 1.2.3.4, the table cells for 1.2.3.4 contain (~n from all users), where n includes the edits made by ExampleUser.

If this is acceptable, it's straightforward enough to implement.

Alternatively is the intention that (in the above example) there should be no table rows for ExampleUser/1.2.3.4, even though 1.2.3.4 is a target (i.e. the filter outweighs the target)? Should (~n from all users) exclude ExampleUser's edits?

If the latter, I think there could be some technical challenges. Here are some ways to do it, and the challenges involved:

Filter out unwanted users/IPs at the subquery stage, by adding a WHERE to each subquery to ensure that rows don't contain any of the unwanted users/IPs. This might affect performance (e.g. scanning more rows rather than relying on an index alone), and we'd need to check whether it is still acceptable.
Filter out unwanted users/IPs at the outer query stage. This should affect performance less, since it will be done on the limited derived table. However, if a large proportion of the results contain the unwanted users/IPs, the final result might be smaller than expected (and the rows we're actually interested in may have been cut off due to the subquery limit). That may well be an edge-case though.
Keep the query the same, but filter out unwanted rows in PHP. I wouldn't favour this approach, since it could give us inconsistent table lengths across the pages. (E.g. almost all the rows may be filtered out from one page, but hardly any on the next, etc.)

In T238809#5948710, @Tchanders wrote:

@Niharika @Prtksxna - Re-asking a question that @dmaza asked on gerrit, about the precise meaning of this:

As soon as a valid username or IP address is entered or the user picks a suggestion from the auto-complete suggester, the result set re-computes to exclude all rows concerning the users or IPs picked by the user.

The current implementation excludes the entered users/IPs from the target list, but not necessarily from the final results. This means:

If 1.2.3.4 is a target, and ExampleUser is a filtered-out target, but has used IP 1.2.3.4, there could still be table rows for ExampleUser/1.2.3.4 (where 1.2.3.4 has a dark table cell to indicate that it is a target), but no rows for ExampleUser with other IPs.

If 1.2.3.4 is a target, and ExampleUser is a filtered-out target, but has used IP 1.2.3.4, the table cells for 1.2.3.4 contain (~n from all users), where n includes the edits made by ExampleUser.

If this is acceptable, it's straightforward enough to implement.

Yep, this was the intention, sorry for not being clearer in the description.

• Niharika updated the task description. (Show Details)Mar 6 2020, 5:51 PM

In T238809#5949163, @Niharika wrote:

Yep, this was the intention, sorry for not being clearer in the description.

Does that mean that this should be a select list of the existing targets provided (instead of a user multiselect widget that allows any user/ip to be specified)?

@dbarratt It should be a multiselect in the sense that more than one target can be selected at once.

If we want to restrict to the currently-existing targets, we may be able to pass in allowedValues as a config to the UsersMultiselectWidget.

In T238809#5948710, @Tchanders wrote:

@Niharika @Prtksxna - Re-asking a question that @dmaza asked on gerrit, about the precise meaning of this:

As soon as a valid username or IP address is entered or the user picks a suggestion from the auto-complete suggester, the result set re-computes to exclude all rows concerning the users or IPs picked by the user.

The current implementation excludes the entered users/IPs from the target list, but not necessarily from the final results. This means:

If 1.2.3.4 is a target, and ExampleUser is a filtered-out target, but has used IP 1.2.3.4, there could still be table rows for ExampleUser/1.2.3.4 (where 1.2.3.4 has a dark table cell to indicate that it is a target), but no rows for ExampleUser with other IPs.

If 1.2.3.4 is a target, and ExampleUser is a filtered-out target, but has used IP 1.2.3.4, the table cells for 1.2.3.4 contain (~n from all users), where n includes the edits made by ExampleUser.

If this is acceptable, it's straightforward enough to implement.

Alternatively is the intention that (in the above example) there should be no table rows for ExampleUser/1.2.3.4, even though 1.2.3.4 is a target (i.e. the filter outweighs the target)? Should (~n from all users) exclude ExampleUser's edits?

If the latter, I think there could be some technical challenges. Here are some ways to do it, and the challenges involved:

Filter out unwanted users/IPs at the subquery stage, by adding a WHERE to each subquery to ensure that rows don't contain any of the unwanted users/IPs. This might affect performance (e.g. scanning more rows rather than relying on an index alone), and we'd need to check whether it is still acceptable.

Filter out unwanted users/IPs at the outer query stage. This should affect performance less, since it will be done on the limited derived table. However, if a large proportion of the results contain the unwanted users/IPs, the final result might be smaller than expected (and the rows we're actually interested in may have been cut off due to the subquery limit). That may well be an edge-case though.

Keep the query the same, but filter out unwanted rows in PHP. I wouldn't favour this approach, since it could give us inconsistent table lengths across the pages. (E.g. almost all the rows may be filtered out from one page, but hardly any on the next, etc.)

I'm sorry, I rushed to respond to this. I misunderstood a range to have individual IPs as targets and not the range itself. There can well be a use case where the user wants to remove a specific IP/username from the results but that IP/username may not be a target itself. Like if I do a lookup for User A and User B:

User A	1.1.1.1
User A	1.1.1.2
User B	1.1.1.1
User B	1.1.1.3

I should be able to remove 1.1.1.1 from the list of results. In a fairly large data set it is possible the user tries to narrow down on suspicious IPs.

So from what @Tchanders laid out, I think we want to go with the latter. I am okay if we want to iterate, starting with the former idea (since targets are a subset of all users+IPs) and making a new ticket if we want to experiment with the latter.

dbarratt moved this task from Code Review 🔍 to In Progress 💪 on the Anti-Harassment (The Letter Song) board.Mar 9 2020, 2:59 PM

dbarratt added a parent task: T247279: Filters on Special:Investigate should validate users.Mar 10 2020, 2:42 AM

dbarratt removed a subtask: T246958: HTMLUsersMultiselectField fails validation when exists is true, but required is false.

dbarratt mentioned this in T247281: The target filter on Special:Investigate should be more fuzzy [medium].Mar 10 2020, 2:53 AM

dbarratt added a parent task: T247281: The target filter on Special:Investigate should be more fuzzy [medium].

In T238809#5949259, @Niharika wrote:

I am okay if we want to iterate, starting with the former idea (since targets are a subset of all users+IPs) and making a new ticket if we want to experiment with the latter.

Here's a follow up task to improve the way the filter works: T247281

dbarratt moved this task from In Progress 💪 to Code Review 🔍 on the Anti-Harassment (The Letter Song) board.Mar 10 2020, 3:04 AM

dbarratt mentioned this in T247356: Auto submit filters form on Special:Investigate when multiselect input changes.Mar 10 2020, 9:25 PM

dbarratt added a subtask: T247356: Auto submit filters form on Special:Investigate when multiselect input changes.

dbarratt removed a subtask: T247356: Auto submit filters form on Special:Investigate when multiselect input changes.

dbarratt added a parent task: T247356: Auto submit filters form on Special:Investigate when multiselect input changes.

dbarratt updated the task description. (Show Details)

Change 574841 merged by jenkins-bot:
[mediawiki/extensions/CheckUser@master] Add filters to the Compare tab on Special:Investigate

https://gerrit.wikimedia.org/r/574841

ReleaseTaggerBot added a project: MW-1.35-notes (1.35.0-wmf.24; 2020-03-17).Mar 11 2020, 9:00 PM

dbarratt moved this task from Code Review 🔍 to QA/Testing 🐞 on the Anti-Harassment (The Letter Song) board.Mar 11 2020, 9:07 PM

Maintenance_bot removed a project: Patch-For-Review.Mar 11 2020, 9:11 PM

Doing an investigation of $target_a and $target_b and filtering out $target_b is equivalent to doing an investigation on $target_a. I checked this by comparing the SQL queries performed (and to a lesser extent the output). This includes the full 100k limit getting applied to the remaining target.

The filters persist across pages and tabs.

I can submit the filter form with a non-existent target (where the input field goes red), but it just gets ignored.

dbarratt closed this task as Resolved.Mar 13 2020, 4:36 PM

	F31122847: Screenshot 2019-11-21 at 6.40.49 PM.png
	Nov 21 2019, 1:11 PM

	F31122839: Screenshot 2019-11-21 at 6.37.14 PM.png
	Nov 21 2019, 1:09 PM

	F31506365: image.png
	Jan 9 2020, 6:42 PM

	F31467767: image.png
	Dec 10 2019, 7:27 PM

CU 2.0: Filters in the Compare tabClosed, ResolvedPublic3 Estimated Story PointsActions