Page MenuHomePhabricator

DBQueryTimeoutError on Wikimedia Login's Special:CheckUser
Closed, ResolvedPublicPRODUCTION ERROR

Description

Hello,

I'm facing this error many times when I check single IP by Special:CheckUser in Wikimedia Login

MediaWiki internal error.

Original exception: [XCft0QpAAD0AAH4jjkMAAAAA] 2018-12-29 21:58:37: Fatal exception of type "Wikimedia\Rdbms\DBQueryTimeoutError"

Exception caught inside exception handler.

Set $wgShowExceptionDetails = true; at the bottom of LocalSettings.php to show detailed debugging information.

Event Timeline

alaa renamed this task from MediaWiki internal error during use Special:CheckUser in Wikimedia Login to DBQueryTimeoutError on Wikimedia Login's Special:CheckUser .Dec 29 2018, 10:03 PM
alaa updated the task description. (Show Details)
exception.file	       	/srv/mediawiki/php-1.33.0-wmf.9/includes/libs/rdbms/database/Database.php:1504
exception.message	       	A database query timeout has occurred. 
Query: SELECT  log_timestamp  FROM `logging`    WHERE log_user_text = 'xxx.xxx.xxx.xxx'  ORDER BY log_timestamp DESC LIMIT 1  
Function: SpecialCheckUser::noMatchesMessage
Error: 2062 Read timeout is reached (10.64.32.136)
exception.trace	       	
#0 /srv/mediawiki/php-1.33.0-wmf.9/includes/libs/rdbms/database/Database.php(1476): Wikimedia\Rdbms\Database->makeQueryException(string, integer, string, string)
#1 /srv/mediawiki/php-1.33.0-wmf.9/includes/libs/rdbms/database/Database.php(1236): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
#2 /srv/mediawiki/php-1.33.0-wmf.9/includes/libs/rdbms/database/Database.php(1703): Wikimedia\Rdbms\Database->query(string, string)
#3 /srv/mediawiki/php-1.33.0-wmf.9/includes/libs/rdbms/database/Database.php(1528): Wikimedia\Rdbms\Database->select(array, string, string, string, array, array)
#4 /srv/mediawiki/php-1.33.0-wmf.9/extensions/CheckUser/includes/specials/SpecialCheckUser.php(440): Wikimedia\Rdbms\Database->selectField(array, string, string, string, array, array)
#5 /srv/mediawiki/php-1.33.0-wmf.9/extensions/CheckUser/includes/specials/SpecialCheckUser.php(1058): SpecialCheckUser->noMatchesMessage(string, boolean)
#6 /srv/mediawiki/php-1.33.0-wmf.9/extensions/CheckUser/includes/specials/SpecialCheckUser.php(104): SpecialCheckUser->doIPUsersRequest(string, boolean, string, integer, string, string)
#7 /srv/mediawiki/php-1.33.0-wmf.9/includes/specialpage/SpecialPage.php(569): SpecialCheckUser->execute(NULL)
#8 /srv/mediawiki/php-1.33.0-wmf.9/includes/specialpage/SpecialPageFactory.php(558): SpecialPage->run(NULL)
#9 /srv/mediawiki/php-1.33.0-wmf.9/includes/MediaWiki.php(288): MediaWiki\Special\SpecialPageFactory->executePath(Title, RequestContext)
#10 /srv/mediawiki/php-1.33.0-wmf.9/includes/MediaWiki.php(862): MediaWiki->performRequest()
#11 /srv/mediawiki/php-1.33.0-wmf.9/includes/MediaWiki.php(517): MediaWiki->main()
#12 /srv/mediawiki/php-1.33.0-wmf.9/index.php(42): MediaWiki->run()
#13 /srv/mediawiki/w/index.php(3): include(string)
#14 {main}
exception.message	       	A database query timeout has occurred.

That's sad. Is it true that loginwiki generally only contains CU logs for user logins via CentralAuth? If so, then it would its CU data has less repeated values among its rows compared to a wiki in which users actually edit. And that possibly means that indexes are less effective on login wiki. Should we add the DBA tag to this Task too?

I don't know if this could be another case of T71127
I haven't checked the query or the query plan.

So it is indeed T71127

db1078 (the DB that was giving the timeout as per logtash) has:

PRIMARY KEY (`log_id`),
KEY `type_time` (`log_type`,`log_timestamp`),
KEY `user_time` (`log_user`,`log_timestamp`),
KEY `page_time` (`log_namespace`,`log_title`,`log_timestamp`),
KEY `times` (`log_timestamp`),
KEY `log_user_type_time` (`log_user`,`log_type`,`log_timestamp`),
KEY `log_page_id_time` (`log_page`,`log_timestamp`),
KEY `type_action` (`log_type`,`log_action`,`log_timestamp`),
KEY `actor_time` (`log_actor`,`log_timestamp`),
KEY `log_actor_type_time` (`log_actor`,`log_type`,`log_timestamp`)

And this is the explain:

mysql:root@localhost [(none)]> show explain for 1669050582;
+------+-------------+---------+-------+---------------+-------+---------+------+----------+-----------------------------+
| id   | select_type | table   | type  | possible_keys | key   | key_len | ref  | rows     | Extra                       |
+------+-------------+---------+-------+---------------+-------+---------+------+----------+-----------------------------+
|    1 | SIMPLE      | logging | index | NULL          | times | 14      | NULL | 19844188 | Using where; Using filesort |
+------+-------------+---------+-------+---------------+-------+---------+------+----------+-----------------------------+
1 row in set, 1 warning (0.00 sec)

There is no index on log_user_text whereas on db1077 the query runs in less than a second cause it has this index:

KEY `log_user_text_time` (`log_user_text`(16),`log_timestamp`),

I will alter this host to add this index (as it is part of tables.sql) so this error can go away.

Change 481820 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1078

https://gerrit.wikimedia.org/r/481820

Change 481820 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1078

https://gerrit.wikimedia.org/r/481820

Mentioned in SAL (#wikimedia-operations) [2019-01-02T07:30:46Z] <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: Depool db1078 - T212692 (duration: 00m 47s)

Mentioned in SAL (#wikimedia-operations) [2019-01-02T07:30:49Z] <marostegui> Fix login.logging table on db1078 - T212692

Mentioned in SAL (#wikimedia-operations) [2019-01-02T07:38:24Z] <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: Repool db1078 - T212692 (duration: 00m 46s)

Marostegui claimed this task.
Marostegui added a project: DBA.
This comment was removed by Marostegui.

After adding the index:

root@db1078.eqiad.wmnet[loginwiki]> SELECT  log_timestamp  FROM `logging`    WHERE log_user_text = 'xx'  ORDER BY log_timestamp DESC LIMIT 1;
Empty set (0.00 sec)

root@db1078.eqiad.wmnet[loginwiki]> explain SELECT  log_timestamp  FROM `logging`    WHERE log_user_text = 'xx'  ORDER BY log_timestamp DESC LIMIT 1;
+------+-------------+---------+------+--------------------+--------------------+---------+-------+------+-------------+
| id   | select_type | table   | type | possible_keys      | key                | key_len | ref   | rows | Extra       |
+------+-------------+---------+------+--------------------+--------------------+---------+-------+------+-------------+
|    1 | SIMPLE      | logging | ref  | log_user_text_time | log_user_text_time | 18      | const |    1 | Using where |
+------+-------------+---------+------+--------------------+--------------------+---------+-------+------+-------------+
1 row in set (0.03 sec)
mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:07 PM