More versatile searching in CheckUser log
Closed, Resolved · Public · Feature

Description

Author: mike.lifeguard+bugs

Description:
The current search function can search only for target and initiator, and cannot accept wildcards. Please allow wildcard or regex searching, and allow searching by wiki (once bug 13789 is done), and allow searching in the reason.

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 10:13 PM
bzimport set Reference to bz14699.
bzimport added a subscriber: Unknown Object (MLST).

Hmm, a (comment, time) index will be needed here.
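A minimal sketch of what such a composite index buys, assuming a simplified table. It uses SQLite from Python's standard library purely for illustration (production runs MySQL/MariaDB), and the names `cu_log_demo`, `cul_reason`, `cul_timestamp` and `cul_reason_time` are hypothetical. With an index on (reason, timestamp), an exact-reason lookup ordered by recency can be served from the index instead of a table scan:

```python
import sqlite3

# Hypothetical query: exact reason match, newest checks first.
QUERY = (
    "SELECT cul_id, cul_timestamp FROM cu_log_demo "
    "WHERE cul_reason = ? ORDER BY cul_timestamp DESC"
)


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory log table with a hypothetical (reason, time) index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cu_log_demo ("
        " cul_id INTEGER PRIMARY KEY,"
        " cul_reason TEXT NOT NULL,"
        " cul_timestamp TEXT NOT NULL)"  # 14-digit MediaWiki-style timestamps
    )
    conn.execute(
        "CREATE INDEX cul_reason_time ON cu_log_demo (cul_reason, cul_timestamp)"
    )
    rows = [
        ("Sock of Example", "20220601120000"),
        ("Sock of Example", "20230115090000"),
        ("Vandalism from range", "20221010101010"),
    ]
    conn.executemany(
        "INSERT INTO cu_log_demo (cul_reason, cul_timestamp) VALUES (?, ?)", rows
    )
    return conn


def recent_checks_for_reason(conn: sqlite3.Connection, reason: str):
    """Return (cul_id, cul_timestamp) pairs for an exact reason, newest first."""
    return conn.execute(QUERY, (reason,)).fetchall()


def query_plan(conn: sqlite3.Connection):
    """Return the 'detail' strings of SQLite's plan for QUERY."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("x",))]


if __name__ == "__main__":
    db = build_demo_db()
    print(recent_checks_for_reason(db, "Sock of Example"))
    print(query_plan(db))  # should mention the cul_reason_time index
```

The plan output shows whether the index is used; without it, each search would have to scan every log row and then sort.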

Actually, at least two other indexes would be needed too, since the current ones are on user IDs. This is starting to become a bit much...

Bah, and the comment index needs to be fulltext, which needs a separate non-InnoDB table.

Closing this as it would be *too* much of a pain for not enough benefit.

mike.lifeguard+bugs wrote:

Are we sure we can't get a half-decent search on the target at least? That'd be really useful for cases where one doesn't remember the exact name.

Even if we don't get all the searchability we want, we could do with a bit more :)

When trying to find recent concerns about a long-term sockpuppeteer, the search form is currently useless. This kind of search is necessary for audit work, and also when considering ban appeals.

The only solution right now is to:
a) ask all checkusers if they have done recent related checks, or
b) load a log of the last year and search.

Can you be more specific about the proposed use cases we don't cover currently?

What should a "ban appeal search" look like? What do "audit work" searches look like?

This might help to provide a faster and easier solution to most frequent problems.

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 11:01 AM

Change 800827 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@master] Allow CheckUserLog searching by cul_reason with wildcards

https://gerrit.wikimedia.org/r/800827

Change 800827 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] Allow searching CheckUserLog by check reason with wildcards

https://gerrit.wikimedia.org/r/800827

(not sure if I'm right in adding Schema-change-in-production to this, but better safe than sorry...)

Change 803260 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@master] Add a temporary config var for writing reasons to the comment table

https://gerrit.wikimedia.org/r/803260

Change 802951 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/CheckUser@master] Revert "Allow searching CheckUserLog by check reason with wildcards"

https://gerrit.wikimedia.org/r/802951

Change 802951 abandoned by Urbanecm:

[mediawiki/extensions/CheckUser@master] Revert "Allow searching CheckUserLog by check reason with wildcards"

Reason:

My mistake; it shouldn't break things immediately (it checks for field existence).

https://gerrit.wikimedia.org/r/802951

Change 802951 restored by Urbanecm:

[mediawiki/extensions/CheckUser@master] Revert "Allow searching CheckUserLog by check reason with wildcards"

https://gerrit.wikimedia.org/r/802951

Change 803260 abandoned by Dreamy Jazz:

[mediawiki/extensions/CheckUser@master] Add a temporary config var for writing reasons to the comment table

Reason:

The entire patch was reverted for this train, so there is no need for this right now.

https://gerrit.wikimedia.org/r/803260

Change 802951 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] Revert "Allow searching CheckUserLog by check reason with wildcards"

https://gerrit.wikimedia.org/r/802951

Change 803298 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@master] Add cul_reason_id and cul_reason_plaintext_id into cu_log

https://gerrit.wikimedia.org/r/803298

The above patch adds the two columns that the since-reverted patch depends on.

If you want to make this happen I won't block you, but the better way to do this in general would be to make it use core's logging system. CU is fully reinventing the wheel here because core didn't have a private logging system back then; now it does, and OS uses it. Migrating the data etc. will be fun, and I'm not sure it'll allow this feature, but it would fix a lot of issues.

I can see a number of issues with relying on core's logging system that would probably need to be overcome:

  • The cu_log table stores the range, so that searching can be done by range and not just by individual IP. If I am not wrong, the core logging system does not allow this. Removing the ability to search logs by range would be a dealbreaker for pretty much every CU; I certainly use range searches frequently. This means searching logs by range, at least for the checkuser log type, would be needed in the core logging system.
  • The cu_log entries do not have a timestamp link like log entries in Special:Log do, and with T309925 this link would not be the same as what Special:Log does.
  • Searching by reason has been requested and would require a change to the core logging system to implement. For checkusers the benefit of searching by reason is that it allows finding previous checks for a case on socks that may not be listed in easy to find places (such as enwiki's SPI).
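To make the range point concrete, here is a minimal sketch of range-overlap searching, assuming each log row stores its target as a start/end pair of fixed-width uppercase hex strings so that string comparison matches numeric order. Only IPv4 is handled, SQLite stands in for the real database, and the table and column names (`cu_log_demo`, `cul_range_start`, `cul_range_end`) are assumptions for illustration rather than the extension's actual code:

```python
import ipaddress
import sqlite3


def to_hex(ip: ipaddress.IPv4Address) -> str:
    """Fixed-width uppercase hex, so lexicographic order equals numeric order."""
    return format(int(ip), "08X")


def hex_range(target: str) -> tuple[str, str]:
    """Return (start_hex, end_hex) for a single IPv4 address or CIDR range."""
    net = ipaddress.ip_network(target, strict=False)
    if net.version != 4:
        raise ValueError("this sketch only handles IPv4")
    return to_hex(net.network_address), to_hex(net.broadcast_address)


def build_log(targets) -> sqlite3.Connection:
    """Store each checked target as a hex range; a single IP has start == end."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cu_log_demo (cul_id INTEGER PRIMARY KEY, cul_target TEXT,"
        " cul_range_start TEXT, cul_range_end TEXT)"
    )
    conn.execute(
        "CREATE INDEX cul_range ON cu_log_demo (cul_range_start, cul_range_end)"
    )
    for target in targets:
        start, end = hex_range(target)
        conn.execute(
            "INSERT INTO cu_log_demo (cul_target, cul_range_start, cul_range_end)"
            " VALUES (?, ?, ?)",
            (target, start, end),
        )
    return conn


def search_overlapping(conn: sqlite3.Connection, query: str):
    """Targets whose logged IP or range overlaps the queried IP or range."""
    q_start, q_end = hex_range(query)
    rows = conn.execute(
        "SELECT cul_target FROM cu_log_demo"
        " WHERE cul_range_start <= ? AND cul_range_end >= ? ORDER BY cul_id",
        (q_end, q_start),
    )
    return [row[0] for row in rows]
```

For example, with entries for `192.0.2.7`, `192.0.2.0/24` and `198.51.100.0/25`, searching `192.0.2.0/28` finds the first two: one lies inside the queried range and the other contains it. That kind of match is impossible if only individual targets are stored.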

I don't feel I have enough knowledge of MediaWiki to do this migration, so unless someone else works on it, I guess this would just delay any changes to the CU extension that require a change to the cu_log table?

One assumption I've made about the DB is that it can recognise that the hash will be the same for all rows with the same comment. In theory, if the DB does not do this already, the search could be simplified: the hash of the search reason could be calculated, and the DB could then search for matching hashes instead of matching text, since the hash has an index in the comment table. This wouldn't work for wildcard searching, but if a JOIN with an equality condition on the text value is expensive, it could speed things up.
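A minimal sketch of the hash-then-verify idea, assuming a comment table with an indexed hash column. The real comment table has a `comment_hash` column, but its exact hash formula is not reproduced here: CRC32 of the UTF-8 text is used as a stand-in assumption, and `cu_log_demo` is a hypothetical simplified log table. Because different texts can share a hash, the query still compares the text, but only on the rows the index has already narrowed down:

```python
import sqlite3
import zlib


def comment_hash(text: str) -> int:
    """Stand-in hash (assumption): CRC32 of the UTF-8 text."""
    return zlib.crc32(text.encode("utf-8"))


def get_or_insert_comment(conn: sqlite3.Connection, text: str) -> int:
    """Deduplicate comments: reuse an existing row with the same hash and text."""
    h = comment_hash(text)
    row = conn.execute(
        "SELECT comment_id FROM comment WHERE comment_hash = ? AND comment_text = ?",
        (h, text),
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute(
        "INSERT INTO comment (comment_hash, comment_text) VALUES (?, ?)", (h, text)
    )
    return cur.lastrowid


def build_db(log_reasons) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE comment (
            comment_id INTEGER PRIMARY KEY,
            comment_hash INTEGER NOT NULL,
            comment_text TEXT NOT NULL
        );
        CREATE INDEX comment_hash_idx ON comment (comment_hash);
        CREATE TABLE cu_log_demo (
            cul_id INTEGER PRIMARY KEY,
            cul_reason_id INTEGER NOT NULL REFERENCES comment (comment_id)
        );
        """
    )
    for reason in log_reasons:
        cid = get_or_insert_comment(conn, reason)
        conn.execute("INSERT INTO cu_log_demo (cul_reason_id) VALUES (?)", (cid,))
    return conn


def search_by_reason(conn: sqlite3.Connection, reason: str):
    """Exact reason match: narrow via the indexed hash, then confirm the text."""
    rows = conn.execute(
        "SELECT cul_id FROM cu_log_demo JOIN comment ON comment_id = cul_reason_id"
        " WHERE comment_hash = ? AND comment_text = ? ORDER BY cul_id",
        (comment_hash(reason), reason),
    )
    return [row[0] for row in rows]
```

As noted above, this only helps exact matches: a partial string such as "Sock of" has an unrelated hash, so it finds nothing.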

  • The request for range support is T146628, but mostly for the performer, maybe also for the target.
  • Wildcards over the whole comment would put % at the beginning of the LIKE pattern, which is slow and sometimes problematic in production. If the possible search terms could be described in more detail, other technical approaches may be easier to find. For example, the log_search table is used to find suppressed log entries for actors.
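One possible direction in the spirit of the log_search comparison is a term table that maps each word of a reason to the log entries containing it. A word search then becomes an indexed equality lookup rather than a `LIKE '%word%'` scan. This is a minimal sketch under stated assumptions: `cu_log_reason_terms` is a hypothetical table, the tokenisation rule (lowercased `\w+` runs) is an assumption, and SQLite stands in for the real database:

```python
import re
import sqlite3


def terms(reason: str) -> set:
    """Lowercased word tokens; the tokenisation rule is an assumption."""
    return set(re.findall(r"\w+", reason.lower()))


def build_index(reasons) -> sqlite3.Connection:
    """Index each reason's words against its (1-based) log ID."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cu_log_reason_terms ("
        " term TEXT NOT NULL, log_id INTEGER NOT NULL,"
        " PRIMARY KEY (term, log_id))"
    )
    for log_id, reason in enumerate(reasons, start=1):
        conn.executemany(
            "INSERT INTO cu_log_reason_terms (term, log_id) VALUES (?, ?)",
            [(t, log_id) for t in terms(reason)],
        )
    return conn


def search_words(conn: sqlite3.Connection, query: str):
    """Log IDs whose reason contains every word of the query."""
    result = None
    for t in terms(query):
        # Equality on the leading primary-key column: an index lookup, no leading %.
        ids = {
            row[0]
            for row in conn.execute(
                "SELECT log_id FROM cu_log_reason_terms WHERE term = ?", (t,)
            )
        }
        result = ids if result is None else result & ids
    return sorted(result or set())
```

The trade-off is extra rows per log entry and write-time work. In exchange, searching for "example" or "sock example" touches only the index entries for those words, however large the log grows.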

Change 803298 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] Add cul_reason_id and cul_reason_plaintext_id into cu_log

https://gerrit.wikimedia.org/r/803298

Change 879883 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@master] Add reason search in CU log and move plaintext gen. code to a service

https://gerrit.wikimedia.org/r/879883

Change 879883 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] Add reason search in CU log and move plaintext gen. code to a service

https://gerrit.wikimedia.org/r/879883

I'm going to close this as resolved and split any remaining work into other tasks, as this has become a bit of a catch-all for improvements to the CheckUserLog.

Searching by reason has been implemented for direct matches. Wildcard search will take a fair bit more work that may require another table.

With respect to the other things:

  • Searching by initiator or target has been implemented in T266586
  • Searching by wiki is not going to happen. Global CheckUser has been rejected by WMF wikis, but this feature could be added to the existing GlobalCheckUser tool.
  • Searching by wildcard / regex would be complicated to implement. Regex searching would be slow and is probably not something that can be implemented.
    • I don't see a particular need to implement wildcard or regex searching for the initiator.
    • While I can see some use for a wildcard or regex search on the target, I don't need it enough myself to work on it. Making it scale to big wikis would require substantial database work.
    • Wildcard searching for the reason has more of a use case, and I will file a separate ticket for it, as it will require a fair amount of work. I don't think regex searching would scale well on big wikis, so I won't be looking into it.
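As an illustration of what wildcard searching involves, here is a minimal sketch of translating a user-facing pattern into a SQL LIKE pattern. It assumes `*` is the user's wildcard, and the helper names and `reasons` table are hypothetical. Literal `%`, `_` and the escape character must be escaped so that LIKE does not treat them as wildcards. A pattern that begins with a wildcard cannot be narrowed by a B-tree index on the column, which is the performance concern raised earlier. SQLite is used for the demo; its `LIKE` is case-insensitive for ASCII by default:

```python
import sqlite3

ESCAPE = "\\"  # a single backslash used as the LIKE escape character


def wildcard_to_like(user_pattern: str) -> str:
    """Translate '*' to '%' and escape characters LIKE treats specially."""
    out = []
    for ch in user_pattern:
        if ch == "*":
            out.append("%")
        elif ch in ("%", "_", ESCAPE):
            out.append(ESCAPE + ch)
        else:
            out.append(ch)
    return "".join(out)


def is_prefix_anchored(user_pattern: str) -> bool:
    """True if the pattern has a fixed prefix an index could use."""
    return not user_pattern.startswith("*")


def demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reasons (id INTEGER PRIMARY KEY, reason TEXT)")
    conn.executemany(
        "INSERT INTO reasons (reason) VALUES (?)",
        [("Sock of Example",), ("Sock_puppetry 100%",), ("Vandalism",)],
    )
    return conn


def search(conn: sqlite3.Connection, user_pattern: str):
    rows = conn.execute(
        "SELECT reason FROM reasons WHERE reason LIKE ? ESCAPE '\\' ORDER BY id",
        (wildcard_to_like(user_pattern),),
    )
    return [row[0] for row in rows]
```

With the escaping in place, `Sock_*` matches only reasons that literally contain `Sock_`. Without it, `_` would also match the space in "Sock of Example".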