Investigate: How could we update Special:Contributions and Special:Log to take multiple user targets?
Open, Needs Triage · Public

Description

Background

As part of making it easier to see connected temporary accounts that share the same IP address, we would like to show contributions and logs for multiple temporary accounts at once.

If this can be done easily, it may be worth updating the Contributions and Log pages to take multiple targets of any kind, which may be useful for patrollers investigating sockpuppets, edit wars, etc. (There is precedent for this in Special:Investigate's timeline, but that is only usable by checkusers.)

Acceptance criteria

A plan for how to build contributions/log pages that can show entries for multiple connected temporary accounts.

Event Timeline

Investigation

I looked into whether it would be feasible to simply update Special:Contributions and Special:Log to take multiple targets. A few things complicate this, so I would suggest starting with something simpler.

In summary, I'd suggest we start by introducing a mode for when the target is a temporary account, to show and hide the related temporary accounts' logs or contributions.

Details are below.

Special:Contributions

Many parts of Special:Contributions assume there is only one target:

  • The tool links at the top, e.g. "block", only work for a single target.
  • A warning is shown if the user doesn't exist. What if some users exist and others don't?
    • We could configure a usersmultiselect form input widget only to accept existing users, so all users must exist.
    • However, we need to keep supporting the use-case of an external user with contributions, at least as a single target. E.g. if contributions have been imported from another wiki and assigned to an external user that doesn't exist locally, they can currently be viewed via Special:Contributions.
  • A "Contribute" button is shown if the target is the performer, but what if only one of several targets is the performer?
  • IP ranges have a maximum allowed size to avoid large queries. Should we therefore exclude IP ranges as multiple targets?
  • Should we support syndication feeds if there are multiple targets?
  • Handlers of hooks like onSpecialContributionsBeforeMainOutput expect one target - should we update them, or not call them if there are multiple targets?
  • Should we allow a mixture of IPs and accounts? Is that making it too easy to relate IPs and accounts?
    • Would an IP/IP range target be handled like legacy IP contributions, or like IPContributions?
    • If we did allow IP targets and showed legacy IP contributions, this could solve the issue raised by enwiki functionaries that it would be helpful to see temporary account contributions and legacy IP contributions in one place.

Query performance:

  • This query condition would be updated to have multiple actors.
  • The query is indexed by (actor, timestamp), which means sorting by timestamp is fast when there is only one actor. But if there are several actors and any of them has a lot of revisions (e.g. thousands), then sorting by timestamp could be slow, since all matching rows would need to be fetched per actor and then sorted. This is presumably already the case on Special:IPContributions and Special:GlobalContributions, so some loss of performance may be acceptable when looking up multiple users.
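To illustrate the sorting cost described above, here is a minimal Python sketch (not MediaWiki code). It assumes a toy in-memory stand-in for the (actor, timestamp) index; the names `index`, `single_actor_page` and `multi_actor_page` are hypothetical, and the multi-actor function models the naive fetch-then-sort plan rather than whatever plan the database optimizer actually chooses.

```python
from itertools import islice

# Hypothetical stand-in for the (actor, timestamp) index:
# actor id -> that actor's revision timestamps, newest first.
index = {
    1: [50, 40, 30, 20, 10],
    2: [45, 35, 25],
    3: [48],
}


def single_actor_page(index, actor, limit):
    """One actor: rows are already in timestamp order, so read only `limit` rows."""
    return list(islice(index.get(actor, []), limit))


def multi_actor_page(index, actors, limit):
    """Several actors, naive plan: fetch every row for every actor, then sort.

    Returns the page of (timestamp, actor) rows and the number of rows
    that had to be fetched before the sort.
    """
    rows = [(ts, actor) for actor in actors for ts in index.get(actor, [])]
    rows.sort(reverse=True)  # newest first across all actors
    return rows[:limit], len(rows)


if __name__ == "__main__":
    print(single_actor_page(index, 1, 3))
    print(multi_actor_page(index, [1, 2, 3], 3))
```

With one actor, only `limit` rows are read. With several, the work grows with the actors' combined revision count, regardless of the page size.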

Special:Log

This is simpler, but it still sets a relevant user in the skin, so tool links are added for that one user. There is also a similar query performance issue once we have multiple performers.

Possible approach

It would be simplest to keep the single target concept, and still have a "main" target that could be used in the tool links etc.

We could start by implementing the most minimal version of this, only supporting seeing related temp account contributions. A user could have something like a checkbox to add in the related temporary users. We wouldn't allow arbitrary multiple-target searches.

Later, if there was enough interest to prioritize it, perhaps we could introduce an additional form field for adding additional targets, which would accept multiple users, up to some limit.

How to add in related temporary account contributions
  • We could handle the SpecialContributions__getForm__filters hook from CheckUser to add a form field for showing contributions from related temp accounts, if the target is a temp account.
  • We could handle the ContribsPager__getQueryInfo hook to add in the related temporary accounts from CheckUser. We could find them using the new CheckUserTemporaryAccountsByIPLookup::getActiveTempAccountNames method.
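The two hook handlers could share logic along these lines. This is a language-neutral sketch in Python, not the PHP hook code: `resolve_targets`, `is_temp_account` and `lookup_related` are hypothetical names, with `lookup_related` standing in for the CheckUser lookup and `include_related` for the value of the new form field. The account names in the example are placeholders.

```python
from typing import Callable, Iterable, List


def resolve_targets(
    target: str,
    include_related: bool,
    is_temp_account: Callable[[str], bool],
    lookup_related: Callable[[str], Iterable[str]],
) -> List[str]:
    """Return the user names whose contributions the pager should query.

    The main target always comes first, so it can still drive the tool
    links. Related accounts are added only when the option is enabled
    and the main target is a temporary account.
    """
    if not (include_related and is_temp_account(target)):
        return [target]
    targets = [target]
    for name in lookup_related(target):
        if name not in targets:  # avoid listing the main target twice
            targets.append(name)
    return targets


if __name__ == "__main__":
    # Placeholder data; a real lookup would come from CheckUser.
    related = {"~2025-1": ["~2025-2", "~2025-1", "~2025-3"]}
    print(
        resolve_targets(
            "~2025-1",
            True,
            lambda name: name.startswith("~"),  # assumed naming check
            lambda name: related.get(name, []),
        )
    )
```

Keeping the main target at the front of the list preserves the single "main" target concept, so the tool links and the existing hook handlers need no changes.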

Re: query performance, see also T415703, which uses the same query.

Investigating whether we need to limit the number of related temp users looked up

I believe we do not need to add an extra limit on top of the limit already applied in CheckUserTemporaryAccountsByIPLookup::getActiveTempAccountNames.

The query will need to do a sort if there is more than one user to look up, and there will be more rows to sort through if any of the users has lots of edits, so a query for 2 prolific users' revisions could be much slower than a query for 100 un-prolific users' revisions. In other words, we can't control the query's performance by limiting the number of users any further than ::getActiveTempAccountNames already does.
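As a rough worked example of that point, assuming the sort cost is driven by the total number of matching revisions (the edit counts below are invented for illustration, and `rows_to_sort` is a hypothetical helper):

```python
def rows_to_sort(edit_counts):
    """Rows the sort must handle: every matching revision across all users."""
    return sum(edit_counts)


two_prolific_users = [50_000, 50_000]  # 2 users, invented edit counts
hundred_light_users = [10] * 100       # 100 users, invented edit counts

if __name__ == "__main__":
    print(rows_to_sort(two_prolific_users))
    print(rows_to_sort(hundred_light_users))
```

Under these made-up numbers the 2-user query sorts 100 times as many rows as the 100-user query, which is why capping the number of users does not bound the cost.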

I looked at whether we are likely to see slow queries in real life, and I think not. On enwiki:

  • The maximum number of edits shared by any group of temporary users that have shared IPs is c. 3,000
  • Some of the groups with the highest number of edits have around 100 temporary users, others have fewer than 5
  • I tried performing the Special:Contributions query for the groups of users with the most edits:
    • A filesort was needed, but only over a couple of hundred rows (I am not sure exactly why)
    • The queries were quick in practice (well under 0.5 seconds)