Page MenuHomePhabricator

testwiki RecentChanges timeout
Closed, ResolvedPublicBUG REPORT

Description

Steps to replicate the issue:

What happens?

This search has timed out. You may wish to try different search parameters.

Manually select different time periods did make it work, but I shouldn't have to do so.

What should have happened instead?

Normal stream of changes.

Reproduced logged in and out, on both Chrome (latest) and Firefox (also latest).

Event Timeline

Urbanecm_WMF triaged this task as Unbreak Now! priority.Sep 6 2023, 7:19 PM
Urbanecm_WMF subscribed.

This seems like a significant problem, especially since this might happen at Wikipedias once the train gets there.

Excimer profile:

image.png (592×1 px, 110 KB)

Looks like a database query is getting over 30 seconds. Checking logstash shows the failing query:

SET STATEMENT max_statement_time=30 FOR SELECT  rc_id,rc_timestamp,rc_namespace,rc_title,rc_minor,rc_bot,rc_new,rc_cur_id,rc_this_oldid,rc_last_oldid,rc_type,rc_source,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid,rc_log_type,rc_log_action,rc_params,rc_actor,recentchanges_actor.actor_user AS `rc_user`,recentchanges_actor.actor_name AS `rc_user_text`,comment_rc_comment.comment_text AS `rc_comment_text`,comment_rc_comment.comment_data AS `rc_comment_data`,comment_rc_comment.comment_id AS `rc_comment_cid`,rc_title,rc_namespace,wl_user,wl_notificationtimestamp,we_expiry,page_latest,(SELECT  GROUP_CONCAT(ctd_name SEPARATOR ',')  FROM `change_tag` JOIN `change_tag_def` ON ((ct_tag_id=ctd_id))   WHERE ct_rc_id=rc_id  ) AS `ts_tags`,ores_damaging_cls.oresc_probability AS `ores_damaging_score`,ores_goodfaith_cls.oresc_probability AS `ores_goodfaith_score`  FROM `recentchanges` STRAIGHT_JOIN `actor` `recentchanges_actor` ON ((actor_id=rc_actor)) STRAIGHT_JOIN `comment` `comment_rc_comment` ON ((comment_rc_comment.comment_id = rc_comment_id)) LEFT JOIN `watchlist` ON (wl_user = 43912 AND (wl_title=rc_title) AND (wl_namespace=rc_namespace)) LEFT JOIN `watchlist_expiry` ON ((wl_id = we_item)) LEFT JOIN `page` ON ((rc_cur_id=page_id)) LEFT JOIN `ores_classification` `ores_damaging_cls` ON (ores_damaging_cls.oresc_model = 12 AND (ores_damaging_cls.oresc_rev=rc_this_oldid) AND ores_damaging_cls.oresc_class = 1) LEFT JOIN `ores_classification` `ores_goodfaith_cls` ON (ores_goodfaith_cls.oresc_model = 13 AND (ores_goodfaith_cls.oresc_rev=rc_this_oldid) AND ores_goodfaith_cls.oresc_class = 1)   WHERE rc_bot = 0 AND (rc_type != 6) AND (rc_source != 'wb') AND (rc_namespace NOT IN (1198,1199,866,867)) AND (rc_timestamp >= '20230807192534') AND rc_new IN (0,1)   ORDER BY rc_timestamp DESC LIMIT 250

This query... completes almost instantly on dbstore1007.eqiad.wmnet. In (some of) production, it takes an ethernity. Problematic DB server is at least db1189, db1166 and db1150, maybe others. On db1189, the query takes 2 min 3.790 sec, compared with milliseconds on dbstore1007. Tagging DBA.

EXPLAINs of the problematic query:

mysql:research@dbstore1007.eqiad.wmnet [testwiki]> explain SELECT  rc_id,rc_timestamp,rc_namespace,rc_title,rc_minor,rc_bot,rc_new,rc_cur_id,rc_this_oldid,rc_last_oldid,rc_type,rc_source,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid,rc_log_type,rc_log_action,rc_params,rc_actor,recentchanges_actor.actor_user AS `rc_user`,recentchanges_actor.actor_name AS `rc_user_text`,comment_rc_comment.comment_text AS `rc_comment_text`,comment_rc_comment.comment_data AS `rc_comment_data`,comment_rc_comment.comment_id AS `rc_comment_cid`,rc_title,rc_namespace,wl_user,wl_notificationtimestamp,we_expiry,page_latest,(SELECT  GROUP_CONCAT(ctd_name SEPARATOR ',')  FROM `change_tag` JOIN `change_tag_def` ON ((ct_tag_id=ctd_id))   WHERE ct_rc_id=rc_id  ) AS `ts_tags`,ores_damaging_cls.oresc_probability AS `ores_damaging_score`,ores_goodfaith_cls.oresc_probability AS `ores_goodfaith_score`  FROM `recentchanges` STRAIGHT_JOIN `actor` `recentchanges_actor` ON ((actor_id=rc_actor)) STRAIGHT_JOIN `comment` `comment_rc_comment` ON ((comment_rc_comment.comment_id = rc_comment_id)) LEFT JOIN `watchlist` ON (wl_user = 43912 AND (wl_title=rc_title) AND (wl_namespace=rc_namespace)) LEFT JOIN `watchlist_expiry` ON ((wl_id = we_item)) LEFT JOIN `page` ON ((rc_cur_id=page_id)) LEFT JOIN `ores_classification` `ores_damaging_cls` ON (ores_damaging_cls.oresc_model = 12 AND (ores_damaging_cls.oresc_rev=rc_this_oldid) AND ores_damaging_cls.oresc_class = 1) LEFT JOIN `ores_classification` `ores_goodfaith_cls` ON (ores_goodfaith_cls.oresc_model = 13 AND (ores_goodfaith_cls.oresc_rev=rc_this_oldid) AND ores_goodfaith_cls.oresc_class = 1)   WHERE rc_bot = 0 AND (rc_type != 6) AND (rc_source != 'wb') AND (rc_namespace NOT IN (1198,1199,866,867)) AND (rc_timestamp >= '20230807192534') AND rc_new IN (0,1)   ORDER BY rc_timestamp DESC LIMIT 250;
+------+--------------------+---------------------+--------+-----------------------------------------------------------------------------------------------------------------------+-----------------------+---------+---------------------------------------------------------------------------+------+-----------------------------+
| id   | select_type        | table               | type   | possible_keys                                                                                                         | key                   | key_len | ref                                                                       | rows | Extra                       |
+------+--------------------+---------------------+--------+-----------------------------------------------------------------------------------------------------------------------+-----------------------+---------+---------------------------------------------------------------------------+------+-----------------------------+
|    1 | PRIMARY            | recentchanges       | ALL    | rc_timestamp,rc_name_type_patrolled_timestamp,rc_ns_actor,rc_actor,rc_namespace_title_timestamp,rc_new_name_timestamp | NULL                  | NULL    | NULL                                                                      | 2116 | Using where; Using filesort |
|    1 | PRIMARY            | watchlist           | eq_ref | wl_user,wl_user_notificationtimestamp,wl_namespace_title                                                              | wl_user               | 265     | const,testwiki.recentchanges.rc_namespace,testwiki.recentchanges.rc_title | 1    |                             |
|    1 | PRIMARY            | watchlist_expiry    | eq_ref | PRIMARY                                                                                                               | PRIMARY               | 4       | testwiki.watchlist.wl_id                                                  | 1    | Using where                 |
|    1 | PRIMARY            | recentchanges_actor | eq_ref | PRIMARY                                                                                                               | PRIMARY               | 8       | testwiki.recentchanges.rc_actor                                           | 1    |                             |
|    1 | PRIMARY            | page                | eq_ref | PRIMARY                                                                                                               | PRIMARY               | 4       | testwiki.recentchanges.rc_cur_id                                          | 1    |                             |
|    1 | PRIMARY            | ores_damaging_cls   | eq_ref | oresc_rev_model_class,oresc_model_class_prob                                                                          | oresc_rev_model_class | 7       | testwiki.recentchanges.rc_this_oldid,const,const                          | 1    |                             |
|    1 | PRIMARY            | ores_goodfaith_cls  | eq_ref | oresc_rev_model_class,oresc_model_class_prob                                                                          | oresc_rev_model_class | 7       | testwiki.recentchanges.rc_this_oldid,const,const                          | 1    |                             |
|    1 | PRIMARY            | comment_rc_comment  | eq_ref | PRIMARY                                                                                                               | PRIMARY               | 8       | testwiki.recentchanges.rc_comment_id                                      | 1    |                             |
|    2 | DEPENDENT SUBQUERY | change_tag          | ref    | ct_rc_tag_id,ct_tag_id_id                                                                                             | ct_rc_tag_id          | 5       | testwiki.recentchanges.rc_id                                              | 1    | Using index                 |
|    2 | DEPENDENT SUBQUERY | change_tag_def      | eq_ref | PRIMARY                                                                                                               | PRIMARY               | 4       | testwiki.change_tag.ct_tag_id                                             | 1    |                             |
+------+--------------------+---------------------+--------+-----------------------------------------------------------------------------------------------------------------------+-----------------------+---------+---------------------------------------------------------------------------+------+-----------------------------+
10 rows in set (0.002 sec)

mysql:research@dbstore1007.eqiad.wmnet [testwiki]> 

wikiadmin2023@db1189(testwiki)> explain SELECT  rc_id,rc_timestamp,rc_namespace,rc_title,rc_minor,rc_bot,rc_new,rc_cur_id,rc_this_oldid,rc_last_oldid,rc_type,rc_source,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid,rc_log_type,rc_log_action,rc_params,rc_actor,recentchanges_actor.actor_user AS `rc_user`,recentchanges_actor.actor_name AS `rc_user_text`,comment_rc_comment.comment_text AS `rc_comment_text`,comment_rc_comment.comment_data AS `rc_comment_data`,comment_rc_comment.comment_id AS `rc_comment_cid`,rc_title,rc_namespace,wl_user,wl_notificationtimestamp,we_expiry,page_latest,(SELECT  GROUP_CONCAT(ctd_name SEPARATOR ',')  FROM `change_tag` JOIN `change_tag_def` ON ((ct_tag_id=ctd_id))   WHERE ct_rc_id=rc_id  ) AS `ts_tags`,ores_damaging_cls.oresc_probability AS `ores_damaging_score`,ores_goodfaith_cls.oresc_probability AS `ores_goodfaith_score`  FROM `recentchanges` STRAIGHT_JOIN `actor` `recentchanges_actor` ON ((actor_id=rc_actor)) STRAIGHT_JOIN `comment` `comment_rc_comment` ON ((comment_rc_comment.comment_id = rc_comment_id)) LEFT JOIN `watchlist` ON (wl_user = 43912 AND (wl_title=rc_title) AND (wl_namespace=rc_namespace)) LEFT JOIN `watchlist_expiry` ON ((wl_id = we_item)) LEFT JOIN `page` ON ((rc_cur_id=page_id)) LEFT JOIN `ores_classification` `ores_damaging_cls` ON (ores_damaging_cls.oresc_model = 12 AND (ores_damaging_cls.oresc_rev=rc_this_oldid) AND ores_damaging_cls.oresc_class = 1) LEFT JOIN `ores_classification` `ores_goodfaith_cls` ON (ores_goodfaith_cls.oresc_model = 13 AND (ores_goodfaith_cls.oresc_rev=rc_this_oldid) AND ores_goodfaith_cls.oresc_class = 1)   WHERE rc_bot = 0 AND (rc_type != 6) AND (rc_source != 'wb') AND (rc_namespace NOT IN (1198,1199,866,867)) AND (rc_timestamp >= '20230807192534') AND rc_new IN (0,1)   ORDER BY rc_timestamp DESC LIMIT 250;
+------+--------------------+---------------------+--------+-----------------------------------------------------------------------------------------------------------------------+-----------------------+---------+---------------------------------------------------------------------------+--------+-------------------------------------------------+
| id   | select_type        | table               | type   | possible_keys                                                                                                         | key                   | key_len | ref                                                                       | rows   | Extra                                           |
+------+--------------------+---------------------+--------+-----------------------------------------------------------------------------------------------------------------------+-----------------------+---------+---------------------------------------------------------------------------+--------+-------------------------------------------------+
|    1 | PRIMARY            | recentchanges       | ALL    | rc_timestamp,rc_name_type_patrolled_timestamp,rc_ns_actor,rc_actor,rc_namespace_title_timestamp,rc_new_name_timestamp | NULL                  | NULL    | NULL                                                                      | 2119   | Using where; Using temporary; Using filesort    |
|    1 | PRIMARY            | watchlist           | eq_ref | wl_user,wl_user_notificationtimestamp,wl_namespace_title                                                              | wl_user               | 265     | const,testwiki.recentchanges.rc_namespace,testwiki.recentchanges.rc_title | 1      |                                                 |
|    1 | PRIMARY            | watchlist_expiry    | eq_ref | PRIMARY                                                                                                               | PRIMARY               | 4       | testwiki.watchlist.wl_id                                                  | 1      | Using where                                     |
|    1 | PRIMARY            | recentchanges_actor | eq_ref | PRIMARY                                                                                                               | PRIMARY               | 8       | testwiki.recentchanges.rc_actor                                           | 1      |                                                 |
|    1 | PRIMARY            | page                | eq_ref | PRIMARY                                                                                                               | PRIMARY               | 4       | testwiki.recentchanges.rc_cur_id                                          | 1      |                                                 |
|    1 | PRIMARY            | ores_damaging_cls   | eq_ref | oresc_rev_model_class,oresc_model_class_prob                                                                          | oresc_rev_model_class | 7       | testwiki.recentchanges.rc_this_oldid,const,const                          | 1      |                                                 |
|    1 | PRIMARY            | ores_goodfaith_cls  | eq_ref | oresc_rev_model_class,oresc_model_class_prob                                                                          | oresc_rev_model_class | 7       | testwiki.recentchanges.rc_this_oldid,const,const                          | 1      |                                                 |
|    1 | PRIMARY            | comment_rc_comment  | ALL    | PRIMARY                                                                                                               | NULL                  | NULL    | NULL                                                                      | 232393 | Using where; Using join buffer (flat, BNL join) |
|    2 | DEPENDENT SUBQUERY | change_tag          | ref    | ct_rc_tag_id,ct_tag_id_id                                                                                             | ct_rc_tag_id          | 5       | testwiki.recentchanges.rc_id                                              | 1      | Using index                                     |
|    2 | DEPENDENT SUBQUERY | change_tag_def      | eq_ref | PRIMARY                                                                                                               | PRIMARY               | 4       | testwiki.change_tag.ct_tag_id                                             | 1      |                                                 |
+------+--------------------+---------------------+--------+-----------------------------------------------------------------------------------------------------------------------+-----------------------+---------+---------------------------------------------------------------------------+--------+-------------------------------------------------+
10 rows in set (0.002 sec)

wikiadmin2023@db1189(testwiki)>

Obvious difference in those explains is that on dbstore1007, comment_rc_comment uses the primary key, while db1189 apparently decides to ignore that index for some reason. Adding FORCE INDEX FOR JOIN (PRIMARY) to the comment_rc_comment JOIN appears to work:

wikiadmin2023@db1189(testwiki)> SELECT  rc_id,rc_timestamp,rc_namespace,rc_title,rc_minor,rc_bot,rc_new,rc_cur_id,rc_this_oldid,rc_last_oldid,rc_type,rc_source,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid,rc_log_type,rc_log_action,rc_params,rc_actor,recentchanges_actor.actor_user AS `rc_user`,recentchanges_actor.actor_name AS `rc_user_text`,comment_rc_comment.comment_text AS `rc_comment_text`,comment_rc_comment.comment_data AS `rc_comment_data`,comment_rc_comment.comment_id AS `rc_comment_cid`,rc_title,rc_namespace,wl_user,wl_notificationtimestamp,we_expiry,page_latest,(SELECT  GROUP_CONCAT(ctd_name SEPARATOR ',')  FROM `change_tag` JOIN `change_tag_def` ON ((ct_tag_id=ctd_id))   WHERE ct_rc_id=rc_id  ) AS `ts_tags`,ores_damaging_cls.oresc_probability AS `ores_damaging_score`,ores_goodfaith_cls.oresc_probability AS `ores_goodfaith_score`  FROM `recentchanges` STRAIGHT_JOIN `actor` `recentchanges_actor` ON ((actor_id=rc_actor)) STRAIGHT_JOIN `comment` `comment_rc_comment` FORCE INDEX FOR JOIN (PRIMARY) ON ((comment_rc_comment.comment_id = rc_comment_id)) LEFT JOIN `watchlist` ON (wl_user = 43912 AND (wl_title=rc_title) AND (wl_namespace=rc_namespace)) LEFT JOIN `watchlist_expiry` ON ((wl_id = we_item)) LEFT JOIN `page` ON ((rc_cur_id=page_id)) LEFT JOIN `ores_classification` `ores_damaging_cls` ON (ores_damaging_cls.oresc_model = 12 AND (ores_damaging_cls.oresc_rev=rc_this_oldid) AND ores_damaging_cls.oresc_class = 1) LEFT JOIN `ores_classification` `ores_goodfaith_cls` ON (ores_goodfaith_cls.oresc_model = 13 AND (ores_goodfaith_cls.oresc_rev=rc_this_oldid) AND ores_goodfaith_cls.oresc_class = 1)   WHERE rc_bot = 0 AND (rc_type != 6) AND (rc_source != 'wb') AND (rc_namespace NOT IN (1198,1199,866,867)) AND (rc_timestamp >= '20230807192534') AND rc_new IN (0,1)   ORDER BY rc_timestamp DESC LIMIT 250;

[...]

250 rows in set (0.020 sec)

wikiadmin2023@db1189(testwiki)>

Should we add the index hint into MediaWiki? Or is there an obvious reason why dbstore1007 and db1189 behave differently? Is this issue only present on testwiki, or does it affect other wikis? If it's testwiki-specific, why?

MariaDB query planner can pick wrong indexes or join orders for various reasons. Bug in the newer versions or old versions, not updated stats (we would need to run "optimize table comment" and for rc on these hosts and see if it makes a difference) and possibly more.

Another mess with RC is that its queries join many tables together (sometimes joins are useful and reduce the load, but if you join ten tables to each other, finding the right order becomes really hard for query planner). I have been advocating to at least do some of the joins in a separate query.

Adding index hint generally would be really complicated as RC queries take different forms and require different join orders or indexes depending on the user input.

I try to optimize comment and rc in db1189 and see what happens.

Doing optimize seems to have fixed it? Can you double check?

Doing optimize seems to have fixed it? Can you double check?

Thanks for doing that! It is much better now, but it still timeouts every now and then. Seems like there are other unoptimized hosts in s3: the query still runs for a long time on db1166, db2105 and db2109.

yeah, I do it on those now too, just double checking db1189. Let me do on the rest.

I ran it on the three other given replicas (db1166, db2105 and db2109). It should be fixed now but for later, we really need to rethink how RC works (see also T307328: Scalability issues of recentchanges table)

Urbanecm_WMF assigned this task to Ladsgroup.

I think so! Thanks.

Unfortunately this can and will happen anytime as the stats change. The only solution indeed is to tackle the root of the problem at T307328 as Amir pointed out.