
Reorganise preliminary check service for paginated results
Open, Medium, Public · 3 Estimated Story Points

Description

The results will be displayed using a TablePager.

The first query should fetch the username, wiki and registration date from the CentralAuth database. Pagination will be done on this query (see e.g. IndexPager::buildQueryInfo).

The second query should fetch the user's edit count, blocked status and group information from each local database, for each row in the page of results.
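The two-query split can be sketched as follows. This is a Python stand-in for the PHP service, not the CheckUser code; all data, function and key names are hypothetical.

```python
# Sketch of the two-query plan: only the first (CentralAuth) query is
# paginated; local details are then fetched for just that page of rows.
# All data and names here are made up for illustration.

CENTRALAUTH = [  # stands in for CentralAuth's localuser table
    {"username": "Alice", "wiki": "enwiki", "registration": "2019-01-01"},
    {"username": "Alice", "wiki": "frwiki", "registration": "2019-02-01"},
    {"username": "Bob",   "wiki": "enwiki", "registration": "2018-05-01"},
]

LOCAL = {  # stands in for each wiki's local user/block tables
    ("Alice", "enwiki"): {"editcount": 10, "blocked": False, "groups": ["autoconfirmed"]},
    ("Alice", "frwiki"): {"editcount": 2,  "blocked": True,  "groups": []},
    ("Bob",   "enwiki"): {"editcount": 7,  "blocked": False, "groups": ["sysop"]},
}

def get_page(offset, limit):
    # Query 1 (paginated): username, wiki, registration date,
    # ordered so that paging is deterministic.
    rows = sorted(CENTRALAUTH, key=lambda r: (r["username"], r["wiki"]))
    page = rows[offset:offset + limit]
    # Query 2 (one lookup per row on this page): edit count, blocked
    # status and groups from the relevant local database.
    return [{**r, **LOCAL[(r["username"], r["wiki"])]} for r in page]
```

The point of the split is that the expensive per-wiki lookups run only for the rows actually on the current page.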

Details

Related Gerrit Patches:

Event Timeline

Tchanders moved this task from Untriaged to Cards ready to be discussed on the Anti-Harassment board.

It seems like we should:

  1. extend TablePager
  2. implement getQueryInfo() (to specify the initial query)
  3. implement preprocessResults() (to attach the additional data to each row)
  4. implement getFieldNames() (to ensure the field names are what we want to display)
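As a rough illustration of how those overrides fit together, here is a Python stand-in for the TablePager template-method pattern (the real classes are PHP in MediaWiki core; the class, method and field names below are hypothetical):

```python
# Illustrative stand-in for the IndexPager/TablePager flow: the base
# class drives the query and hands rows to the overridable hooks.
# None of this is the real MediaWiki code.

class MiniPager:
    def get_body(self, offset, limit):
        rows = self.run_query(self.get_query_info(), offset, limit)
        rows = self.preprocess_results(rows)   # attach extra data
        fields = self.get_field_names()        # columns to display
        return [[row[f] for f in fields] for row in rows]

    def run_query(self, query_info, offset, limit):
        # Toy "database": sort per query_info, then slice for the page.
        data = sorted(query_info["rows"], key=lambda r: r[query_info["order_by"]])
        return data[offset:offset + limit]

class PreliminaryCheckPager(MiniPager):
    SAMPLE = [{"username": "Bob"}, {"username": "Alice"}]

    def get_query_info(self):            # step 2: the initial query
        return {"rows": self.SAMPLE, "order_by": "username"}

    def preprocess_results(self, rows):  # step 3: attach per-row data
        return [dict(r, editcount=0) for r in rows]

    def get_field_names(self):           # step 4: displayed columns
        return ["username", "editcount"]
```

The base class owns pagination; the subclass only declares the query, the per-row enrichment, and the columns.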

It looks like Action API Modules do not use Pager (from what I can tell, since it seems like a UI thing, but maybe @Anomie can confirm that). Therefore, the service should probably return the shared data (i.e. getQueryInfo() would call a method on a service to return the query info).

@dbarratt 1-5 are already done in https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/CheckUser/+/552269/

This task is only for updating the service.

The overridden methods will of course need updating again after that.

Niharika triaged this task as Medium priority. Nov 21 2019, 4:48 PM
Niharika set the point value for this task to 3. Nov 21 2019, 5:18 PM

It looks like Action API Modules do not use Pager (from what I can tell, since it seems like a UI thing, but maybe @Anomie can confirm that).

Yes, Pager is entirely focused on special page UI; it's completely unusable for any other purpose.

IndexPager and its subclasses are also fairly focused on simple queries for special page UI. They work well when the queries are simple, but if you have to do something complex it can be difficult to work around some of the assumptions they make.

dbarratt claimed this task. Nov 22 2019, 2:11 PM
dbarratt moved this task from Ready to In Progress on the Anti-Harassment (The Letter Song) board.

Change 552889 had a related patch set uploaded (by Dbarratt; owner: Dbarratt):
[mediawiki/extensions/CheckUser@master] Reorganise preliminary check service for paginated results

https://gerrit.wikimedia.org/r/552889

dbarratt updated the task description.
dbarratt removed dbarratt as the assignee of this task. Dec 6 2019, 10:23 PM

The current patch does the following, given a list of users:

  1. Get the name, ID, edit count and registration date of each user from the local user table. Paginate on this. (Order by user name.)
  2. Get the wikis for each user on this page from Central Auth
  3. Get the rest of the local user information for each user for each wiki

The table is organised so that one table row holds the data for one user, but each user's row is built from multiple result rows (one per wiki). The resulting length of the table is (no. of users) × (no. of wikis).

This could be problematic: the query we're paginating on does not determine the number of result rows, so table lengths can differ between pages and are difficult to control with pagination limits.

In the examples we've looked at, users have had many rows (example 1, example 2). If we expect users under investigation to have this many accounts, it might be sensible to set the limit to 1 user per page; however, for users who have only a few accounts this would result in very short pages.
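Toy numbers make the concern concrete (hypothetical account counts, Python for illustration): when we paginate on users, the rendered row count varies with how many wikis those users happen to have.

```python
# Hypothetical account counts: two users with many wikis, two with few.
wikis_per_user = {"A": 50, "B": 1, "C": 1, "D": 40}
users = sorted(wikis_per_user)

def rendered_rows(page, users_per_page=2):
    """Result rows shown when paginating on the *user* query."""
    start = page * users_per_page
    return sum(wikis_per_user[u] for u in users[start:start + users_per_page])

# Page 0 (users A, B) renders 51 rows; page 1 (users C, D) renders 41:
# the limit on the paginated query does not bound the table length.
```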

The approach from the original task description suggests the following, which avoids this problem:

  1. Get the username, wiki and registration date from CentralAuth (localuser table). Paginate on this. (Order by username and wiki.)
  2. Get the rest of the local user information for each user for each wiki

This way, each page will be a predictable length, and the number of result rows in each table will be smaller than or equal to the limit.

The potential downside is that the rows from one user could span more than one page. However, this might be preferable to alternating long and short pages.
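The trade-off can be sketched the same way (again with made-up data): paginating on the CentralAuth (user, wiki) rows bounds every page at the limit, while a prolific user spills across page boundaries.

```python
# Hypothetical (user, wiki) rows as returned by the CentralAuth query:
# user "A" is attached to 50 wikis, user "B" to 1.
pairs = [("A", n) for n in range(50)] + [("B", 0)]

def page_of(page, limit=20):
    return pairs[page * limit:(page + 1) * limit]

# Every page has at most `limit` rows, so table length is predictable...
assert all(len(page_of(p)) <= 20 for p in range(3))
# ...but user A spans pages 0-2, and page 2 mixes users A and B.
assert {u for u, _ in page_of(2)} == {"A", "B"}
```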

dmaza added a subscriber: dmaza. Dec 12 2019, 8:50 PM

We really need to get more things in writing 'cause I recall having this conversation with someone at some point and completely forgot about it. Having had this information while we were deciding whether to use a pager or a fake wrapper would have significantly sped up the decision and made it easy and obvious (at least to me).

In the examples we've looked at, users have had many rows (example 1, example 2). If we expect users under investigation to have this many accounts, it might be sensible to set the limit to 1 user per page; however, for users who have only a few accounts this would result in very short pages.

I don't think that 1 user per page is a good UX. Another alternative (a compromise, which we talked about) was to put a hard limit on the number of wikis we'll display per user and/or to limit the wikis to those where the user has at least 1 edit.
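Those mitigations could look something like this (a hedged Python sketch; the cap value, field names and helper are hypothetical, not anything decided in this task):

```python
# Sketch of the suggested mitigations: drop wikis where the user has no
# edits, then apply a hard cap on wikis shown per user. The cap value
# and data shape are illustrative only.

MAX_WIKIS_PER_USER = 10  # hypothetical hard cap per user

def visible_wikis(wiki_rows):
    """wiki_rows: one user's rows, each like {"wiki": ..., "editcount": ...}.
    Keep only wikis with at least 1 edit, then apply the hard cap."""
    edited = [r for r in wiki_rows if r["editcount"] >= 1]
    return edited[:MAX_WIKIS_PER_USER]

rows = [{"wiki": f"wiki{n}", "editcount": n % 2} for n in range(30)]
# Of 30 wikis, 15 have >= 1 edit; the cap then keeps the first 10.
assert len(visible_wikis(rows)) == 10
```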

The potential downside is that the rows from one user could span more than one page.

To mitigate this, or make it less frequent, we could apply the same limits that I mentioned above.

So I guess the question is:
Do we want a consistent page length, with the potential downside of having one user span more than one page, or can we live with a varying number of rows per page?

Another alternative is to do infinite scrolling.

We really need to get more things in writing 'cause I recall having this conversation with someone at some point and completely forgot about it.

Completely agree that it helps to write things down. My recollection was that we did have that conversation, in which we agreed to have consistent page lengths, with users potentially spanning more than one page.

That conclusion was documented in writing via the plan in the description of this task...

Thalia is correct here. Maybe I didn't capture the details correctly in the initial tasks I wrote. Still, we did have the conversation and this is the result. I'll work to ensure we document things better in the future. I might get annoying about it. :)

I believe that we should have consistent page sizes and said this in a couple of conversations. The UX is really bad for pagination if the page sizes change. If a result/user spans multiple pages, the investigating user can use filtering to narrow down the pages they are dealing with.

I would avoid infinite scrolling. It's an anti-pattern for this kind of feature.

dmaza added a comment. Dec 13 2019, 6:19 PM

My apologies for the misunderstanding then.

If consistent page sizes are part of the acceptance criteria, then the "FakeWrapper" is the easiest way to go, if I'm understanding how this works correctly. I guess we'll have to do a 180 back to the original plan.

If a result/user spans multiple pages, the investigating user can use filtering to narrow down the pages they are dealing with.

There are no filters on this page in the mocks, but there is nothing preventing us from adding them if it becomes a problem.

Tchanders moved this task from Review to In Progress on the Anti-Harassment (The Letter Song) board.
dbarratt added a comment (edited). Thu, Jan 2, 4:05 PM

So the way I understood it (and please correct me if this is incorrect) is that what is displayed in this image:

is 3 rows. The current implementation does not have this row grouping (and I didn't add it), which is probably where the confusion is coming from.

Are we saying that this image has 6 rows rather than 3?

Tchanders added a comment (edited). Thu, Jan 2, 4:19 PM

@dbarratt Yes - it's more obvious if you imagine each user being associated with 50 wikis and ask the question, does the table have 3 rows or 150 rows? We did discuss this in a meeting before this task was written, which was the reasoning behind the queries being described the way they are in the task description.

@dbarratt Yes - it's more obvious if you imagine each user being associated with 50 wikis and ask the question, does the table have 3 rows or 150 rows? We did discuss this in a meeting before this task was written, which was the reasoning behind the queries being described the way they are in the task description.

The only way I can think of to implement it that way is to load all of the wikis for all of the users specified, which is, I imagine, too costly.

Change 558147 had a related patch set uploaded (by Tchanders; owner: Tchanders):
[mediawiki/extensions/CheckUser@master] Reorganise preliminary check service for paginated results

https://gerrit.wikimedia.org/r/558147

Change 563193 had a related patch set uploaded (by Tchanders; owner: Tchanders):
[mediawiki/extensions/CheckUser@master] [WIP alternative] Reorganise preliminary check service for paginated results

https://gerrit.wikimedia.org/r/563193

Change 558147 merged by jenkins-bot:
[mediawiki/extensions/CheckUser@master] Reorganise preliminary check service for paginated results

https://gerrit.wikimedia.org/r/558147

@dom_walden It probably makes more sense to test T238827, T237298 and T239680 together.