Page MenuHomePhabricator

The "show ip" action should also provide a distinct list of user-agents for each IP
Closed, DeclinedPublic

Assigned To
Authored By
Huji
Jul 13 2017, 1:12 AM
Referenced Files
F26890719: Current.PNG
Oct 27 2018, 2:40 PM
F26891294: final.PNG
Oct 27 2018, 2:40 PM
F26890798: net.PNG
Oct 27 2018, 2:40 PM

Description

Currently, the show IP function lists the IPs along with time periods used. In addition, it should show a unique list of user-agents used with that IP. This should be trivial as far as DB query goes, as we are already retrieving the IPs using an index, and there are typically not many user-agents per IP-user pair.

Current state:

Current.PNG (100×615 px, 8 KB)

Next state (this task):

net.PNG (188×625 px, 16 KB)

Desired final state:

final.PNG (187×836 px, 20 KB)

Event Timeline

Aklapper renamed this task from The "show ip" aciton should also provide a distinct list of user-agents for each IP to The "show ip" action should also provide a distinct list of user-agents for each IP.Jul 13 2017, 8:12 AM
Huji triaged this task as Low priority.

Change 365065 had a related patch set uploaded (by Huji; owner: Huji):
[mediawiki/extensions/CheckUser@master] List distinct user-agents per IP period in the Show IPs function

https://gerrit.wikimedia.org/r/365065

Adding DBA so they can review the query and approve of it. I think the use cases are infrequent enough, and the existing index cuc_user_ip_time is narrowing the search space enough that no additional index creation would be necessary.

See comment on gerrit, it helps with speeding up reviews :-).

@jcrespo I have responded to your request there, and just resolved the merge conflict too. Can you please take a look?

Hey @Huji - we, DBAs, are a bit overwhelmed lately with lots of unexpected fires and requests, so it might take sometime until we can have a proper look at this.
This doesn't mean will not take a look, but I am basically setting some expectations :-)
If this is really really super urgent let us know so we can try to see if we can make some room for this.

Thanks and sorry for the inconveniences.

This is not super urgent; we had forgotten to add DBA to it at the get go, so we are at fault. Get to it as you can.

Huji raised the priority of this task from Low to High.Oct 27 2018, 2:11 PM

I am increasing the priority for this, because in some cases this functionality can truly impact the process and outcome of a CU. (For those who have access, see T207232 as an example)

Not only we should show a distinct list of UAs, but we also should show how many distinct users and IPs are using each UA. I am going to add some mock-ups to this task shortly.

@jcrespo I think it is best to move directly to the final desired state (see updated task definition above). This means we would want to run a bunch of DISTINCT queries, and from a DBA perspective, this may mean we need a few new indexes.

On gerrit for the old patch (from February) you had asked me to give you some extreme cases to test on production. How can I do that? I need a bit guidance on how to move this task forward.

Please note that while I have been asking you to wait, I am genuinely concerned about the performance of the query- even if only a few people can run it, group by over long period of times can be bad for the servers- so not blocked just on me not responding - however it is not easy to provide alternatives. Putting a limit on the timespan, or testing the waters by trying to count the number of matching entries beforehand would help to build confidence on the query- note that even if you do a LIMIT 5001, the group by forces to go over all matching entries.
What I was asking was to provide pure SQL and a tip such as "run this on the ip with more edits" or so (it is not easy for me to do the mw syntax -> SQL transformation, I am not that familiar with mediawiki code, sorry.

I understand it better now. Let me update the patch and provide you with some queries.

I am going to decline this Task, per the analysis done in T212092

An alternative strategy using APIs is discussed in T212092#4933324 and might be worth pursuing. That won't need any code change in CheckUser at this time, so would not need for this task to be open.

Change 365065 abandoned by Dreamy Jazz:

[mediawiki/extensions/CheckUser@master] List distinct user-agents per IP period in the Show IPs function

Reason:

Associated task is closed as declined and the code base has changed so much that this patch has way too many merge conflicts. If this is still desired, I would suggest making a new patch that modifies the CheckUserGetIPsPager.php file.

https://gerrit.wikimedia.org/r/365065