Page MenuHomePhabricator

Unable to open Special:Log with user query (fatal timeout)
Closed, ResolvedPublicPRODUCTION ERROR

Description

A database query error has occurred. This may indicate a bug in the software.[WhHsMApAAD0AAFkrelgAAAAA] 2017-11-19 20:41:16: Fatal exception of type "Wikimedia\Rdbms\DBQueryTimeoutError"

Spotted trying to query https://meta.wikimedia.org/wiki/Special:Log/Maintenance_script

Event Timeline

This error was caused by a slow query against the log table.

Slow query to find the maintenance script contributions
SELECT  log_id,log_type,log_action,log_timestamp,log_user,log_user_text,log_namespace,log_title,log_params,log_deleted,user_id,user_name,user_editcount,log_comment AS `log_comment_text`,NULL AS `log_comment_data`,NULL AS `log_comment_cid`,(SELECT  GROUP_CONCAT(ct_tag SEPARATOR ',')  FROM `change_tag`    WHERE ct_log_id=log_id  ) AS `ts_tags`  FROM `logging` LEFT JOIN `user` ON ((log_user=user_id))   WHERE (log_type NOT IN ('spamblacklist','titleblacklist','suppress')) AND log_user = '7880087' AND (log_type != 'thanks') AND (log_type != 'patrol') AND (log_type != 'tag')  ORDER BY log_timestamp DESC LIMIT 51

If this was a one time DB error with not much impact we can close this task with @Dereckson findings. Not sure though, what's the process here. Thanks.

Maybe T71713. There are also other problems without index (T68961, T51199)

This really shouldn't be that slow... there is an index of (log_user, log_timestamp) and there should be very few entries filtered via the log_type restriction as maintenance scripts generally don't make tag, patrol, thanks, suppress, titleblacklist or spamblacklist log entries.

explain of query
mysql:wikiadmin@db1086 [metawiki]> explain SELECT  log_id,log_type,log_action,log_timestamp,log_user,log_user_text,log_namespace,log_title,log_params,log_deleted,user_id,user_name,user_editcount,log_comment AS `log_comment_text`,NULL AS `log_comment_data`,NULL AS `log_comment_cid`,(SELECT  GROUP_CONCAT(ct_tag SEPARATOR ',')  FROM `change_tag`    WHERE ct_log_id=log_id  ) AS `ts_tags`  FROM `logging` LEFT JOIN `user` ON ((log_user=user_id))   WHERE (log_type NOT IN ('spamblacklist','titleblacklist','suppress')) AND log_user = '7880087' AND (log_type != 'thanks') AND (log_type != 'patrol') AND (log_type != 'tag')  ORDER BY log_timestamp DESC LIMIT 51\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: logging
         type: index
possible_keys: type_time,user_time,type_action,log_user_type_time
          key: times
      key_len: 16
          ref: NULL
         rows: 9718
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: user
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: metawiki.logging.log_user
         rows: 1
        Extra: 
*************************** 3. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: change_tag
         type: ref
possible_keys: change_tag_log_tag
          key: change_tag_log_tag
      key_len: 5
          ref: metawiki.logging.log_id
         rows: 10
        Extra: Using where; Using index
3 rows in set (0.02 sec)

mysql:wikiadmin@db1086 [metawiki]>

So its deciding to use the times (log_timestamp) index instead of user_time (log_user, log_timestamp) index, which is definitely the wrong choice. (This explain was from db1086. MediaWiki would use a special slave that has the logging table partitioned, so it might have a different explain)

Bawolff renamed this task from [WhHsMApAAD0AAFkrelgAAAAA] 2017-11-19 20:41:16: Fatal exception of type "Wikimedia\Rdbms\DBQueryTimeoutError" to user filtering of Special:Log is slow. Possibly using `times` index instead of `user_times` index.Nov 20 2017, 9:17 PM
Incnis_Mrsi subscribed.

Possibly related.
[WngdQApAME8AABNjd-cAAACR] 2018-02-05 09:01:48: Неустранимое исключение типа «Wikimedia\Rdbms\DBQueryTimeoutError» on https://commons.wikimedia.org/w/index.php?title=Special:Log/JackPotte&offset=20170819222910&limit=500&type=&user=JackPotte
[WngetQpAIC8AACoGlesAAACK] 2018-02-05 09:08:01: Неустранимое исключение типа «Wikimedia\Rdbms\DBQueryTimeoutError» on https://commons.wikimedia.org/w/index.php?title=Special:Log/JackPotte&offset=20170819201205&limit=500&type=&user=JackPotte
(yet another error on the same spot as above, forgot its ID)
[WnglJApAICoAAE-2J0cAAAAO] 2018-02-05 09:35:28: Неустранимое исключение типа «Wikimedia\Rdbms\DBQueryTimeoutError» on https://commons.wikimedia.org/w/index.php?title=Special:Log/JackPotte&offset=20170819195933&limit=320&type=&user=JackPotte
[WngowgpAIC0AAIqzxiAAAACT] 2018-02-05 09:50:55: Fatal exception of type "Wikimedia\Rdbms\DBQueryTimeoutError" on https://commons.wikimedia.org/w/index.php?title=Special:Log/JackPotte&offset=20170819195238&limit=320&type=&user=JackPotte&uselang=en
[WngulwpAAEAAAH@uMrUAAABR] 2018-02-05 10:15:47: Неустранимое исключение типа «Wikimedia\Rdbms\DBQueryTimeoutError» on https://commons.wikimedia.org/w/index.php?title=Special:Log/JackPotte&offset=20170819195238&limit=320&type=&user=JackPotte – exactly the same place as above (without uselang only), and after as much as 25 minutes

Krinkle renamed this task from user filtering of Special:Log is slow. Possibly using `times` index instead of `user_times` index to Unable to open Special:Log with user query (fatal timeout).Sep 17 2018, 5:58 PM
Krinkle changed Risk Rating from N/A to default.
Krinkle removed a project: Performance Issue.
Krinkle subscribed.

Still reproducible via the task description's URL.

mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:09 PM
Krinkle claimed this task.

Indeed. Even with larger limits such as limit=500 it still responds quickly (I checked that because there were a handful of newer events on this URL that may've hidden an issue with older data, which quering the older data might re-surface).

I suspect this may've been resolved by unifying the database indexes across the different db replicas, as well as newer hardware both for DBs and app servers.