
Search bar for check rationale in CheckUser log
Closed, Declined · Public

Description

Currently, you can search by target and initiator in Special:CheckUserLog, but not by a string in the check rationale. Ideally, one would also be able to search by check rationale either to find related checks (e.g. search for a particular SPI to see every check ever run related to that SPI) or for audit purposes.

Could this feature be added? If not, is there a way to get the results for a particular search using the API?

Event Timeline

Keyword searching is generally very slow and hard to make efficient with an ordinary database. In technical terms, a standard index on a column that contains strings (like the one that stores the check reasons) lets the database quickly find values that start with a given string, but not a word appearing in the middle of the string.

That means any search on the field would require the database engine to look at every single row of the cu_log table. Depending on the size of that table, this can take a long time. Since the use cases are very limited (we don't normally look up check logs by their reason string), I think the cost-benefit trade-off weighs strongly against implementing what you asked for here.
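To make the indexing point concrete, here is a toy Python sketch (not CheckUser or MySQL code). A sorted list stands in for an ordinary index on the reason column, and the sample reason strings are made up. Looking up a known prefix can binary-search straight to the right spot, but a word in the middle of the string gives the sort order nothing to work with, so every row has to be checked.

```python
import bisect

# Toy stand-in for an ordinary (B-tree style) index: reason strings kept sorted.
# Hypothetical sample data; real log rows also carry timestamps, users, targets, etc.
reasons = sorted([
    "Wikipedia:Sockpuppet investigations/Alpha",
    "Wikipedia:Sockpuppet investigations/Beta",
    "cross-wiki abuse",
    "per SPI Alpha",
])

def prefix_search(index, prefix):
    """Binary-search to the first candidate, then read only the matching run."""
    i = bisect.bisect_left(index, prefix)
    matches = []
    # Sorted order guarantees all values with this prefix are contiguous.
    while i < len(index) and index[i].startswith(prefix):
        matches.append(index[i])
        i += 1
    return matches

def substring_search(rows, needle):
    """The sort order is useless for a mid-string match: every row is examined."""
    return [value for value in rows if needle in value]
```

With this data, `prefix_search(reasons, "Alpha")` finds nothing, because "Alpha" only appears mid-string, while `substring_search` finds both Alpha-related rows at the cost of scanning the whole list.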

@jcrespo can I ask you a quick favor? Can you just tell us, ballpark, how big the cu_log table is for some of the larger wikis such as enwiki or dewiki? If it is anything more than 1-2 MB, I am inclined to mark this task as Declined.

This is probably a dumb question, but how do we manage to search the full text of every Wikipedia article from the search bar in the upper right-hand corner, if any sort of string search is so expensive?

If the issue is searching the entire cu_log, then such a feature could theoretically limit itself to searching within, at most, a three-month period or something like that.
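A minimal Python sketch of that throttle, assuming entries can be read newest first (as an index on the log's timestamp would allow) so the scan can stop at the cutoff. `LogEntry` and its fields are hypothetical, and "three months" is approximated as 90 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class LogEntry:
    # Hypothetical row shape; field names are illustrative only.
    timestamp: datetime
    reason: str

def search_recent(entries, needle, now, window=timedelta(days=90)):
    """Case-insensitive substring search over entries newer than now - window.

    Assumes `entries` is sorted newest first, so the loop can stop at the
    first entry older than the cutoff instead of reading the whole log.
    """
    cutoff = now - window
    needle = needle.lower()
    hits = []
    for entry in entries:
        if entry.timestamp < cutoff:
            break  # every later entry is older still
        if needle in entry.reason.lower():
            hits.append(entry)
    return hits
```

The window bounds how many rows are read, but within the window it is still a linear scan, so its cost grows with the number of checks run in that period.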

If it's not possible to implement this feature even with such throttles to prevent particularly expensive searches, my question would change to: "Who with database access could run the relevant query for me in the one-off cases where it is needed?"


Sorry, I cannot do that. I have been told in the past not to publicly reveal the size or edit rate of tables due to privacy concerns. I would need approval from Security/Legal to do so.

@jcrespo understood.

@BU_Rob13 we don't simply use database tables for that; we use a more sophisticated system called CirrusSearch, which is based on Elasticsearch. Setting it up takes a good deal of work; that is why I said the cost-benefit trade-off is unfavorable here. Article search is a must-have and very commonly used, whereas searching log text is a nice-to-have and rarely used.
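Elasticsearch is built on an inverted index: a map from each word to the documents containing it, built ahead of time so that a word query becomes a lookup instead of a scan. The heavily simplified Python sketch below illustrates only that concept. It is not CirrusSearch or Elasticsearch code, and the tokenizer and sample data are assumptions.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase and split on non-word characters; real analyzers do far more.
    return re.findall(r"\w+", text.lower())

def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query token (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    # .get() does not insert missing keys into the defaultdict.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result
```

The lookup is cheap, but the index has to be built and kept in sync with every new row, which is the kind of setup and maintenance work that makes this worthwhile for article search but not necessarily for a rarely searched log.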

I am going to be bold and say that this is NOT something we would like to implement in the CheckUser codebase itself. You can still use the API to retrieve all of the logs (in batches of 100, for instance) and sift through them, perhaps using a custom JavaScript tool.
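As a rough sketch of that approach (in Python rather than a JavaScript user script), the code below follows API continuation and filters reasons client-side. The HTTP layer is deliberately left out: `fetch_page` is a hypothetical callable that you would wire to the CheckUser log API module (assumed to be `list=checkuserlog`) from an account allowed to view the log. The exact parameter names, batch-size limit, and response fields are assumptions to check against the API help page on the wiki, and each entry is assumed to be a dict with a `"reason"` key.

```python
from typing import Callable, Iterator, Optional

# Hypothetical interface: takes the continuation parameters from the previous
# response (None for the first request) and returns (entries, next_continuation),
# where next_continuation is None once there are no more batches.
Fetcher = Callable[[Optional[dict]], tuple[list[dict], Optional[dict]]]

def iter_entries(fetch_page: Fetcher) -> Iterator[dict]:
    """Yield every log entry, requesting batches until continuation runs out."""
    cont: Optional[dict] = None
    while True:
        entries, cont = fetch_page(cont)
        yield from entries
        if cont is None:
            return

def search_reasons(fetch_page: Fetcher, needle: str) -> list[dict]:
    """Client-side, case-insensitive substring filter over the reason text."""
    needle = needle.lower()
    return [e for e in iter_entries(fetch_page)
            if needle in e.get("reason", "").lower()]
```

Because every batch still has to be downloaded, this scans the whole log just as a server-side search would, but the cost falls on the one person who needs the search rather than on the shared database.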