Page MenuHomePhabricator

The "show ip" action should also provide a distinct list of user-agents for each IP
Closed, DeclinedPublic

Description

Currently, the show IP function lists the IPs along with time periods used. In addition, it should show a unique list of user-agents used with that IP. This should be trivial as far as DB query goes, as we are already retrieving the IPs using an index, and there are typically not many user-agents per IP-user pair.

Current state:

Next state (this task):

Desired final state:

Event Timeline

Huji created this task.Jul 13 2017, 1:12 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJul 13 2017, 1:12 AM
Aklapper renamed this task from The "show ip" aciton should also provide a distinct list of user-agents for each IP to The "show ip" action should also provide a distinct list of user-agents for each IP.Jul 13 2017, 8:12 AM
Huji triaged this task as Low priority.Jul 13 2017, 5:33 PM
Huji claimed this task.

Change 365065 had a related patch set uploaded (by Huji; owner: Huji):
[mediawiki/extensions/CheckUser@master] List distinct user-agents per IP period in the Show IPs function

https://gerrit.wikimedia.org/r/365065

Huji added a project: DBA.Jul 19 2017, 3:43 AM

Adding DBA so they can review the query and approve of it. I think the use cases are infrequent enough, and the existing index cuc_user_ip_time is narrowing the search space enough that no additional index creation would be necessary.

Huji moved this task from Backlog to Patches in review on the CheckUser board.Aug 28 2017, 12:29 PM

See comment on gerrit, it helps with speeding up reviews :-).

jcrespo moved this task from Triage to Backlog on the DBA board.Sep 12 2017, 4:26 PM
MarcoAurelio moved this task from Closed to Patches in review on the CheckUser board.
Huji added a comment.Feb 25 2018, 3:56 AM

@jcrespo I have responded to your request there, and just resolved the merge conflict too. Can you please take a look?

Hey @Huji - we, DBAs, are a bit overwhelmed lately with lots of unexpected fires and requests, so it might take sometime until we can have a proper look at this.
This doesn't mean will not take a look, but I am basically setting some expectations :-)
If this is really really super urgent let us know so we can try to see if we can make some room for this.

Thanks and sorry for the inconveniences.

Huji added a comment.Feb 25 2018, 3:02 PM

This is not super urgent; we had forgotten to add DBA to it at the get go, so we are at fault. Get to it as you can.

Huji raised the priority of this task from Low to High.Oct 27 2018, 2:11 PM

I am increasing the priority for this, because in some cases this functionality can truly impact the process and outcome of a CU. (For those who have access, see T207232 as an example)

Not only we should show a distinct list of UAs, but we also should show how many distinct users and IPs are using each UA. I am going to add some mock-ups to this task shortly.

Huji updated the task description. (Show Details)Oct 27 2018, 2:40 PM
Huji added a comment.Oct 27 2018, 2:42 PM

@jcrespo I think it is best to move directly to the final desired state (see updated task definition above). This means we would want to run a bunch of DISTINCT queries, and from a DBA perspective, this may mean we need a few new indexes.

On gerrit for the old patch (from February) you had asked me to give you some extreme cases to test on production. How can I do that? I need a bit guidance on how to move this task forward.

Please note that while I have been asking you to wait, I am genuinely concerned about the performance of the query- even if only a few people can run it, group by over long period of times can be bad for the servers- so not blocked just on me not responding - however it is not easy to provide alternatives. Putting a limit on the timespan, or testing the waters by trying to count the number of matching entries beforehand would help to build confidence on the query- note that even if you do a LIMIT 5001, the group by forces to go over all matching entries.
What I was asking was to provide pure SQL and a tip such as "run this on the ip with more edits" or so (it is not easy for me to do the mw syntax -> SQL transformation, I am not that familiar with mediawiki code, sorry.

Huji added a comment.Oct 30 2018, 12:22 AM

I understand it better now. Let me update the patch and provide you with some queries.

0x010C added a subscriber: 0x010C.Oct 31 2018, 5:22 PM
Huji closed this task as Declined.

I am going to decline this Task, per the analysis done in T212092

An alternative strategy using APIs is discussed in T212092#4933324 and might be worth pursuing. That won't need any code change in CheckUser at this time, so would not need for this task to be open.