Page MenuHomePhabricator

Queries to cu_log_event on large wikis are exceeding TransactionProfiler limits on Special:CheckUser
Closed, ResolvedPublic1 Estimated Story Points

Description

Queries to the cu_log_event table on large wikis are taking too long and cause TransactionProfiler warnings. Based on experimentation on running the affected queries, I've found that this is due to the JOIN to the logging table.

I've found two solutions to this problem:

  1. Split the queries into two separate queries
  2. Update the JOIN conditions to use the cule_actor column instead of the log_actor column
Steps to reproduce
  1. Open Special:CheckUser on a wiki with many log actions (i.e. a large logging table)
  2. Run a check on a user which has made at least a few log actions.
Logstash details
message
Expectation (readQueryTime <= 5) by MediaWiki::main not met (actual: {actualSeconds}) in trx #{trxId}:
{query}
trace
from /srv/mediawiki/php-1.42.0-wmf.12/includes/libs/rdbms/TransactionProfiler.php(525)
#0 /srv/mediawiki/php-1.42.0-wmf.12/includes/libs/rdbms/TransactionProfiler.php(334): Wikimedia\Rdbms\TransactionProfiler->reportExpectationViolated(string, Wikimedia\Rdbms\GeneralizedSql, double, string, string)
#1 /srv/mediawiki/php-1.42.0-wmf.12/includes/libs/rdbms/database/TransactionManager.php(615): Wikimedia\Rdbms\TransactionProfiler->recordQueryCompletion(Wikimedia\Rdbms\GeneralizedSql, double, boolean, integer, string, string)
#2 /srv/mediawiki/php-1.42.0-wmf.12/includes/libs/rdbms/database/Database.php(825): Wikimedia\Rdbms\TransactionManager->recordQueryCompletion(Wikimedia\Rdbms\GeneralizedSql, double, boolean, integer, string)
#3 /srv/mediawiki/php-1.42.0-wmf.12/includes/libs/rdbms/database/Database.php(713): Wikimedia\Rdbms\Database->attemptQuery(Wikimedia\Rdbms\Query, string, boolean)
#4 /srv/mediawiki/php-1.42.0-wmf.12/includes/libs/rdbms/database/Database.php(640): Wikimedia\Rdbms\Database->executeQuery(Wikimedia\Rdbms\Query, string, integer)
#5 /srv/mediawiki/php-1.42.0-wmf.12/includes/libs/rdbms/database/Database.php(1347): Wikimedia\Rdbms\Database->query(Wikimedia\Rdbms\Query, string)
#6 /srv/mediawiki/php-1.42.0-wmf.12/includes/libs/rdbms/database/DBConnRef.php(119): Wikimedia\Rdbms\Database->select(array, array, array, string, array, array)
#7 /srv/mediawiki/php-1.42.0-wmf.12/includes/libs/rdbms/database/DBConnRef.php(351): Wikimedia\Rdbms\DBConnRef->__call(string, array)
#8 /srv/mediawiki/php-1.42.0-wmf.12/extensions/CheckUser/src/CheckUser/Pagers/AbstractCheckUserPager.php(770): Wikimedia\Rdbms\DBConnRef->select(array, array, array, string, array, array)
#9 /srv/mediawiki/php-1.42.0-wmf.12/includes/pager/IndexPager.php(289): MediaWiki\CheckUser\CheckUser\Pagers\AbstractCheckUserPager->reallyDoQuery(string, integer, boolean)
#10 /srv/mediawiki/php-1.42.0-wmf.12/includes/pager/IndexPager.php(571): MediaWiki\Pager\IndexPager->doQuery()
#11 /srv/mediawiki/php-1.42.0-wmf.12/extensions/CheckUser/src/CheckUser/SpecialCheckUser.php(378): MediaWiki\Pager\IndexPager->getBody()
#12 /srv/mediawiki/php-1.42.0-wmf.12/includes/specialpage/SpecialPage.php(727): MediaWiki\CheckUser\CheckUser\SpecialCheckUser->execute(NULL)
#13 /srv/mediawiki/php-1.42.0-wmf.12/includes/specialpage/SpecialPageFactory.php(1652): MediaWiki\SpecialPage\SpecialPage->run(NULL)
#14 /srv/mediawiki/php-1.42.0-wmf.12/includes/MediaWiki.php(359): MediaWiki\SpecialPage\SpecialPageFactory->executePath(string, RequestContext)
#15 /srv/mediawiki/php-1.42.0-wmf.12/includes/MediaWiki.php(954): MediaWiki->performRequest()
#16 /srv/mediawiki/php-1.42.0-wmf.12/includes/MediaWiki.php(607): MediaWiki->main()
#17 /srv/mediawiki/php-1.42.0-wmf.12/index.php(50): MediaWiki->run()
#18 /srv/mediawiki/php-1.42.0-wmf.12/index.php(46): wfIndexMain()
#19 /srv/mediawiki/w/index.php(3): require(string)
#20 {main}

Event Timeline

The query linked in the logstash link took 18 seconds to run on mwmaint2002.

Suggested queries which both take 0.001 seconds:

First query to be run that will be combined with the other results
SELECT  cule_timestamp AS `timestamp`,cule_ip AS `ip`,cule_ip_hex AS `ip_hex`,cule_xff AS `xff`,cule_xff_hex AS `xff_hex`,cule_agent AS `agent`,actor_user AS `user`,actor_name AS `user_text`,1 AS `client_hints_reference_type`  FROM `cu_log_event` FORCE INDEX (cule_actor_ip_time) JOIN `actor` `actor_log_actor` ON ((actor_log_actor.actor_id=cule_actor)) WHERE actor_user = <actor_id> ORDER BY timestamp DESC LIMIT 5001;
Second query to be run after the main queries to look up the data for the log IDs found.
SELECT log_title AS `title`,log_page AS `page_id`,log_namespace AS `namespace`,comment_text,comment_data,log_type,log_action,log_params,log_deleted,log_id AS `log_id` FROM `logging` JOIN `comment` `comment_log_comment` ON ((comment_log_comment.comment_id = log_comment_id)) WHERE log_id IN (<log id>, <log id>, <log id>);
Dreamy_Jazz changed the point value for this task from 2 to 1.

Having looked at this further, I've found an easier solution.

It seems that the query uses the join condition actor_log_actor.actor_id=log_actor. Converting this to instead be actor_log_actor.actor_id=cule_actor makes the query run much faster and has no noticeable difference in speed compared to using two queries.

Change 1010954 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@master] Use cule_actor instead of log_actor for join to the logging table

https://gerrit.wikimedia.org/r/1010954

For this ticket I would suggest verifying no changes in the data returned by Special:CheckUser when running a Get actions check is observed. This is because the change for this ticket should have no user-visible changes.

As for the problem of the query being slow, this will be verified by checking logstash entries have stopped once this change is deployed to all wikis.

Change 1010954 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] Use cule_actor instead of log_actor for join to the logging table

https://gerrit.wikimedia.org/r/1010954

For this ticket I would suggest verifying no changes in the data returned by Special:CheckUser when running a Get actions check is observed. This is because the change for this ticket should have no user-visible changes.

@Dreamy_Jazz The only difference I have seen so far is that IPs appear in "See results in table". This might have been a deliberate change.

For this ticket I would suggest verifying no changes in the data returned by Special:CheckUser when running a Get actions check is observed. This is because the change for this ticket should have no user-visible changes.

@Dreamy_Jazz The only difference I have seen so far is that IPs appear in "See results in table". This might have been a deliberate change.

IP addresses should have already shown in that table, so I guess that is an okay difference. My guess is that the difference could have been caused by T359962.

IP addresses should have already shown in that table, so I guess that is an okay difference. My guess is that the difference could have been caused by T359962.

Thanks.

Otherwise, I have run various Special:CheckUser queries before and after this change. I have seen no differences.

Test environment: local docker CheckUser 2.5 (3c38f9a) 07:19, 26 March 2024.

Events in logstash for the slow query on cu_log_event have stopped on wmf.23, so this seems to be resolved.