Allow searching for CU records from a specific date range
Closed, Declined | Public | Feature

Description

Feature summary (what you would like to be able to do and where):
Right now, you can request CU records for one of these periods: last week, last 2 weeks, last 30 days, last 60 days, or "all".

When a CIDR is heavily used, a 60-day/all query will often surpass the 5000 row limit. You can of course limit your query to last 30 days, but then you will miss the sleeper accounts. But there is no way to complement your query with another query which looks at 31-60 days ago and another that looks 61-90 days ago.

I propose we change the dropdown for time range and add a "user-specified date range" choice. If selected, the user can use two date inputs to specify the first and last day of the period of interest.

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):

For situations where a 60-day/all query hits the maximum result limit, the user can break a large query into smaller queries and cover the entire time range of interest without running into the 5000 row limit.
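A minimal sketch of that splitting step, in Python and purely for illustration: `split_into_windows` and its parameters are hypothetical names, not part of CheckUser. It only shows how a long period could be cut into consecutive, non-overlapping windows that are then queried one at a time.

```python
from datetime import date, timedelta


def split_into_windows(first_day, last_day, window_days=30):
    """Cut first_day..last_day (inclusive) into consecutive windows, newest first.

    Hypothetical helper: each (start, end) pair would be submitted as one
    separate query, so that no single query has to return the whole period.
    """
    if first_day > last_day:
        raise ValueError("first_day must not be after last_day")
    if window_days < 1:
        raise ValueError("window_days must be at least 1")

    windows = []
    end = last_day
    while end >= first_day:
        # A window of N days spans end - (N - 1) .. end, clamped to first_day.
        start = max(first_day, end - timedelta(days=window_days - 1))
        windows.append((start, end))
        # The next (older) window ends the day before this one starts.
        end = start - timedelta(days=1)
    return windows


windows = split_into_windows(date(2022, 4, 1), date(2022, 6, 29))
for start, end in windows:
    print(start.isoformat(), "to", end.isoformat())
```

Whether each window stays under the row limit still depends on how busy the range is; a very active CIDR might need a smaller `window_days`.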

Benefits (why should this be implemented?):

CUs can query the entire data for a wide CIDR and will not have to make compromises.

Event Timeline

I'm moving what I said in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/+/803302?tab=comments to here:

I'm not the PM, but I'd decline that ticket and solve the 5000-row problem by either implementing proper paging or adding more options (the first months of the last three months, for example). That ticket is a classic XY problem. I agree on the problem, but the solution doesn't have to come from arbitrary time range selection.
The reason I'm saying this is the whole concept of "less is more" in software engineering: the more freedom you give to users, the more issues and unknown unknowns you end up with. Doubly so in complex codebases like MediaWiki that have twenty years of tech debt. Here is an explanation about avoiding too much freedom in configuration, but the same applies here as well: https://www.youtube.com/watch?v=NcT8-IoImXE

If T311375 is implemented, this task may become less important. However, the way I see that being implemented is basically some sort of offset. This offset could be either time-based or ID-based; if it were time-based, it would implement half of what this ticket is requesting (the range would run from an arbitrary time X to the end of the period).

The offset could be based on absolute time (2022-05-01 to 2022-05-30) or it could be relative time (today - 60 to today - 30).

I hear you. But I think that ship has sailed (especially for CheckUser) a long time ago. We are already dealing with a highly ineffective code base that is barely maintained. One more band aid once in a while--as bad as band aids are, philosophically--won't kill anyone.

If the offset is time-based, I would want it to be an absolute time. If a relative time is used, and the same mechanism is also used for the paging idea in T311375, then when the "next page" link is clicked at a later time (e.g. after taking a while to look at the first results), some results on the next page may be missed. The way Special:Contributions does it, with a timestamp as the offset, seems best to me. Ideally I would not want two ways of determining the period, so if both can use a timestamp (a date selector with the time set to 00:00 on the form, and a timestamp for the pager), the code can probably be shared.
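To illustrate why an absolute timestamp offset keeps paging stable, here is a rough Python sketch over an in-memory list. It is not CheckUser or Special:Contributions code; `fetch_page`, the `(timestamp, row_id)` row shape, and the use of the row ID as a tie-breaker are all assumptions made for the example.

```python
from datetime import datetime, timezone


def fetch_page(rows, limit, before=None):
    """Return (page, next_cursor) for rows strictly older than `before`.

    Hypothetical sketch: `rows` is a list of (timestamp, row_id) tuples and
    `before` is the last tuple of the previous page, or None for page one.
    The row_id is an assumed tie-breaker for rows sharing a timestamp.
    """
    ordered = sorted(rows, reverse=True)  # newest first
    if before is not None:
        ordered = [row for row in ordered if row < before]
    page = ordered[:limit]
    # Only hand out a cursor when older rows remain.
    next_cursor = page[-1] if len(ordered) > limit else None
    return page, next_cursor


def ts(day):
    return datetime(2022, 6, day, tzinfo=timezone.utc)


rows = [(ts(day), day) for day in (1, 2, 3, 4, 5)]
page1, cursor = fetch_page(rows, limit=2)

# A new row is logged while the first page is still being read.
rows.append((ts(6), 6))

# The cursor is an absolute position, so page two is unaffected.
page2, cursor2 = fetch_page(rows, limit=2, before=cursor)
print([row_id for _, row_id in page1], [row_id for _, row_id in page2])
```

Because the cursor names a fixed point in the data rather than "N rows from the top" or "N days before now", rows that arrive between clicks cannot shift the contents of the next page.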

What I meant is: the offset you choose in the UI could be relative, and the software would translate it to absolute time for you.
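As a hedged illustration of that translation, the following Python sketch turns a relative choice such as "today - 60 to today - 30" into fixed timestamps once, at submission time. `relative_to_absolute` is a hypothetical name, and anchoring both ends at 00:00 UTC is an assumption borrowed from the date-selector suggestion in the previous comment.

```python
from datetime import datetime, timedelta, timezone


def relative_to_absolute(days_ago_start, days_ago_end, now=None):
    """Translate a relative range into absolute UTC timestamps.

    Hypothetical helper: (60, 30) means "from today - 60 to today - 30".
    Both ends are anchored at 00:00 UTC of the submission day (an assumption).
    """
    if days_ago_start < days_ago_end:
        raise ValueError("the start must be at least as far back as the end")
    if now is None:
        now = datetime.now(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    start = midnight - timedelta(days=days_ago_start)
    end = midnight - timedelta(days=days_ago_end)
    return start, end


# A fixed "now" keeps the example reproducible.
fixed_now = datetime(2022, 6, 29, 15, 30, tzinfo=timezone.utc)
start, end = relative_to_absolute(60, 30, now=fixed_now)
print(start.isoformat(), "to", end.isoformat())
```

Once translated, the absolute pair can be carried through later requests, so the range no longer drifts as time passes.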

Change 810529 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@master] Allow specifying the period by from and to dates in CheckUser

https://gerrit.wikimedia.org/r/810529

Dreamy_Jazz triaged this task as Medium priority.

Assigning as I've uploaded a patch. Triaging as medium as T311375 depends on this.

Change 810529 abandoned by Dreamy Jazz:

[mediawiki/extensions/CheckUser@master] Allow specifying the period by from and to dates in CheckUser

Reason:

Discussion on checkuser-l suggests that not a majority of CUs want this.

https://gerrit.wikimedia.org/r/810529

@Huji I'm thinking that the ability to select specific periods won't be as needed now that pagination is being implemented. Also, discussion on the checkuser list suggested that not many people would find the ability to search by a specific period useful. I've abandoned my patch because of this, and unless you object I may close this as declined in favor of being able to page the results. How to deal with the API, and therefore how to store the period in the checkuser log table, can be discussed in the parent task.

I'm going to close this task as declined. If anyone disagrees you have my permission to re-open.