Page MenuHomePhabricator

SpecialRecentChanges::doMainQuery needs tuning
Open, MediumPublic

Description

The following query takes more than 2 minutes to run

SELECT /* SpecialRecentChanges::doMainQuery  */  /*! STRAIGHT_JOIN */ rc_id,rc_timestamp,rc_namespace,rc_title,rc_minor,rc_bot,rc_new,rc_cur_id,rc_this_oldid,rc_last_oldid,rc_type,rc_source,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid,rc_log_type,rc_log_action,rc_params,comment_rc_comment.comment_text AS `rc_comment_text`,comment_rc_comment.comment_data AS `rc_comment_data`,comment_rc_comment.comment_id AS `rc_comment_cid`,actor_rc_user.actor_user AS `rc_user`,actor_rc_user.actor_name AS `rc_user_text`,rc_actor,wl_user,wl_notificationtimestamp,page_latest,(SELECT  GROUP_CONCAT(ctd_name SEPARATOR ',')  FROM `change_tag` JOIN `change_tag_def` ON ((ct_tag_id=ctd_id))   WHERE ct_rc_id=rc_id  ) AS `ts_tags`,fp_stable,fp_pending_since,ores_damaging_cls.oresc_probability AS `ores_damaging_score`,ores_goodfaith_cls.oresc_probability AS `ores_goodfaith_score`  FROM `recentchanges` JOIN `comment` `comment_rc_comment` ON ((comment_rc_comment.comment_id = rc_comment_id)) JOIN `actor` `actor_rc_user` ON ((actor_rc_user.actor_id = rc_actor)) LEFT JOIN `watchlist` ON (wl_user = 36668941 AND (wl_title=rc_title) AND (wl_namespace=rc_namespace)) LEFT JOIN `page` ON ((rc_cur_id=page_id)) LEFT JOIN `flaggedpages` ON ((fp_page_id = rc_cur_id)) LEFT JOIN `ores_classification` `ores_damaging_cls` ON (ores_damaging_cls.oresc_model = 37 AND (ores_damaging_cls.oresc_rev=rc_this_oldid) AND ores_damaging_cls.oresc_class = 1) LEFT JOIN `ores_classification` `ores_goodfaith_cls` ON (ores_goodfaith_cls.oresc_model = 38 AND (ores_goodfaith_cls.oresc_rev=rc_this_oldid) AND ores_goodfaith_cls.oresc_class = 1)   WHERE ((rc_this_oldid = page_latest) OR rc_type = 3) AND (rc_type != 6) AND (rc_source != 'wb') AND ((ores_damaging_cls.oresc_probability BETWEEN 0.919 AND 1)) AND (rc_type NOT IN (3,5)) AND ((ores_goodfaith_cls.oresc_probability BETWEEN 0 AND 0.058)) AND (rc_type NOT IN (3,5)) AND (rc_timestamp >= '20200108105453') AND rc_new IN (0,1)   ORDER BY rc_timestamp DESC LIMIT 50;

16 rows in set (2 min 33.94 sec)
root@db1099.eqiad.wmnet[enwiki]> show explain for 1220761587;
+------+--------------------+--------------------+--------+--------------------------------------------------------+-----------------------+---------+-----------------------------------------------------------------------+---------+-----------------------------+
| id   | select_type        | table              | type   | possible_keys                                          | key                   | key_len | ref                                                                   | rows    | Extra                       |
+------+--------------------+--------------------+--------+--------------------------------------------------------+-----------------------+---------+-----------------------------------------------------------------------+---------+-----------------------------+
|    1 | PRIMARY            | recentchanges      | range  | rc_timestamp,new_name_timestamp,rc_actor,rc_this_oldid | rc_timestamp          | 16      | NULL                                                                  | 4299806 | Using where; Using filesort |
|    1 | PRIMARY            | comment_rc_comment | eq_ref | PRIMARY                                                | PRIMARY               | 8       | enwiki.recentchanges.rc_comment_id                                    |       1 |                             |
|    1 | PRIMARY            | actor_rc_user      | eq_ref | PRIMARY                                                | PRIMARY               | 8       | enwiki.recentchanges.rc_actor                                         |       1 |                             |
|    1 | PRIMARY            | watchlist          | eq_ref | wl_user,namespace_title,wl_user_notificationtimestamp  | wl_user               | 265     | const,enwiki.recentchanges.rc_namespace,enwiki.recentchanges.rc_title |       1 |                             |
|    1 | PRIMARY            | page               | eq_ref | PRIMARY                                                | PRIMARY               | 4       | enwiki.recentchanges.rc_cur_id                                        |       1 | Using where                 |
|    1 | PRIMARY            | flaggedpages       | eq_ref | PRIMARY                                                | PRIMARY               | 4       | enwiki.recentchanges.rc_cur_id                                        |       1 |                             |
|    1 | PRIMARY            | ores_damaging_cls  | eq_ref | oresc_rev_model_class                                  | oresc_rev_model_class | 7       | enwiki.recentchanges.rc_this_oldid,const,const                        |       1 | Using where                 |
|    1 | PRIMARY            | ores_goodfaith_cls | eq_ref | oresc_rev_model_class                                  | oresc_rev_model_class | 7       | enwiki.recentchanges.rc_this_oldid,const,const                        |       1 | Using where                 |
|    2 | DEPENDENT SUBQUERY | change_tag         | ref    | change_tag_rc_tag_id,change_tag_tag_id_id              | change_tag_rc_tag_id  | 5       | enwiki.recentchanges.rc_id                                            |       1 | Using index                 |
|    2 | DEPENDENT SUBQUERY | change_tag_def     | eq_ref | PRIMARY                                                | PRIMARY               | 4       | enwiki.change_tag.ct_tag_id                                           |       1 |                             |
+------+--------------------+--------------------+--------+--------------------------------------------------------+-----------------------+---------+-----------------------------------------------------------------------+---------+-----------------------------+

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Why is it deciding it needs to filesort? I see no reason it should need to with that plan, it should already be getting the rows in order.

Although I'm not sure that's really the problem. The main problem seems to be that the restrictive conditions ((ores_damaging_cls.oresc_probability BETWEEN 0.919 AND 1)) and ((ores_goodfaith_cls.oresc_probability BETWEEN 0 AND 0.058)) are making it have to scan a lot of rows to find the few that pass. If I relax those (e.g. 0.919→0.819 or 0.719 and 0.058→0.158 or 0.258) the query completes much faster, and SHOW EXPLAIN output doesn't seem to indicate the usual "fetch tons of rows, sort them, then throw most away" problem that usually makes filesorting bad.

I can't think of any way to speed this up without doing something like that, which would probably break what ORES is trying to do here.

Halfak subscribed.

Adding Growth-Team because this is related to their RecentChanges Filters.

tstarling renamed this task from SpecialRecentChanges::doMainQuery needs tunning to SpecialRecentChanges::doMainQuery needs tuning.Feb 18 2021, 9:36 PM