IndexPager::buildQueryInfo (contributions page unfiltered) query needs tuning
Open, Needs TriagePublic

Description

I think this is the top timing out query right now, on top of the usually bad combinations of recentchanges, at least for Wikidata- which is pathological because of the number of contributions a single page can have:

https://logstash.wikimedia.org/goto/ef85f8ef0882a29cf77489c530716f07

If this is due to a deployment, a bot changing query patterns or a data change, is yet to see. One example of the query that fails is:

SELECT  rev_id,rev_page,rev_text_id,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,COALESCE( comment_rev_comment.comment_text, rev_comment ) AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,rev_user,rev_user_text,NULL AS `rev_actor`,rev_content_format,rev_content_model,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,user_name,page_is_new,(SELECT  GROUP_CONCAT(ct_tag SEPARATOR ',')  FROM `change_tag`    WHERE ct_rev_id=rev_id  ) AS `ts_tags`,ores_damaging_cls.oresc_probability AS `ores_damaging_score`,0.412 AS `ores_damaging_threshold`  FROM `revision` LEFT JOIN `revision_comment_temp` `temp_rev_comment` ON ((temp_rev_comment.revcomment_rev = rev_id)) LEFT JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = temp_rev_comment.revcomment_comment_id)) INNER JOIN `page` ON ((page_id = rev_page)) LEFT JOIN `user` ON ((rev_user != 0) AND (user_id = rev_user)) LEFT JOIN `ores_classification` `ores_damaging_cls` ON (ores_damaging_cls.oresc_model = '9' AND (ores_damaging_cls.oresc_rev=rev_id) AND ores_damaging_cls.oresc_class = '1')   WHERE ((rev_user = '2752938')) AND (rev_parent_id = 0) AND (page_namespace = '0') AND ((rev_deleted & 4) = 0)  ORDER BY rev_timestamp DESC LIMIT 51

Running on the 'contributor' special partitioned replicas.

I don't think this is urgent because the number of errors is relatively low, although constant, but it is worth at least checking why it happens. I found T47619, probably not relevant because how old it is, except for the latest changes and decisions the query suffered.

jcrespo created this task.Mar 6 2018, 4:17 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMar 6 2018, 4:17 PM

Adding Anomie hoping he will not get mad at me 0:-) but he was involved with Springle last time.

jcrespo updated the task description. (Show Details)Mar 6 2018, 4:29 PM
Anomie added a comment.Mar 6 2018, 5:07 PM
wikiadmin@db1101(wikidatawiki)>explain SELECT  rev_id,rev_page,rev_text_id,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,COALESCE( comment_rev_comment.comment_text, rev_comment ) AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,rev_user,rev_user_text,NULL AS `rev_actor`,rev_content_format,rev_content_model,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,user_name,page_is_new,(SELECT  GROUP_CONCAT(ct_tag SEPARATOR ',')  FROM `change_tag`    WHERE ct_rev_id=rev_id  ) AS `ts_tags`,ores_damaging_cls.oresc_probability AS `ores_damaging_score`,0.412 AS `ores_damaging_threshold`  FROM `revision` LEFT JOIN `revision_comment_temp` `temp_rev_comment` ON ((temp_rev_comment.revcomment_rev = rev_id)) LEFT JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = temp_rev_comment.revcomment_comment_id)) INNER JOIN `page` ON ((page_id = rev_page)) LEFT JOIN `user` ON ((rev_user != 0) AND (user_id = rev_user)) LEFT JOIN `ores_classification` `ores_damaging_cls` ON (ores_damaging_cls.oresc_model = '9' AND (ores_damaging_cls.oresc_rev=rev_id) AND ores_damaging_cls.oresc_class = '1')   WHERE ((rev_user = 2752938)) AND (rev_parent_id = 0) AND (page_namespace = '0') AND ((rev_deleted & 4) = 0)  ORDER BY rev_timestamp DESC LIMIT 51;
+------+--------------------+---------------------+--------+---------------------------------------------------------------+-----------------------+---------+-----------------------------------------------------+----------+---------------------------------+
| id   | select_type        | table               | type   | possible_keys                                                 | key                   | key_len | ref                                                 | rows     | Extra                           |
+------+--------------------+---------------------+--------+---------------------------------------------------------------+-----------------------+---------+-----------------------------------------------------+----------+---------------------------------+
|    1 | PRIMARY            | page                | ref    | PRIMARY,name_title                                            | name_title            | 4       | const                                               | 23650821 | Using temporary; Using filesort |
|    1 | PRIMARY            | revision            | ref    | user_timestamp,rev_page_id,page_timestamp,page_user_timestamp | page_user_timestamp   | 8       | wikidatawiki.page.page_id,const                     |        1 | Using where                     |
|    1 | PRIMARY            | temp_rev_comment    | ref    | PRIMARY,revcomment_rev                                        | PRIMARY               | 4       | wikidatawiki.revision.rev_id                        |        1 | Using index                     |
|    1 | PRIMARY            | comment_rev_comment | eq_ref | PRIMARY                                                       | PRIMARY               | 8       | wikidatawiki.temp_rev_comment.revcomment_comment_id |        1 | Using where                     |
|    1 | PRIMARY            | user                | const  | PRIMARY                                                       | PRIMARY               | 4       | const                                               |        1 | Using where                     |
|    1 | PRIMARY            | ores_damaging_cls   | eq_ref | oresc_rev_model_class                                         | oresc_rev_model_class | 7       | wikidatawiki.revision.rev_id,const,const            |        1 |                                 |
|    2 | DEPENDENT SUBQUERY | change_tag          | ref    | change_tag_rev_tag                                            | change_tag_rev_tag    | 5       | wikidatawiki.revision.rev_id                        |        1 | Using where; Using index        |
+------+--------------------+---------------------+--------+---------------------------------------------------------------+-----------------------+---------+-----------------------------------------------------+----------+---------------------------------+
7 rows in set (0.00 sec)

It decides it wants to fetch all the pages in namespace 0, then use the page_user_timestamp to get all the revisions for those pages by the user in question. It's estimating 23 million rows it'll have to touch, out of 47 million total in namespace 0.

It could probably do better here, by using the user_timestamp index to fetch the revisions in order and filter them:

wikiadmin@db1101(wikidatawiki)>explain SELECT  rev_id,rev_page,rev_text_id,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,COALESCE( comment_rev_comment.comment_text, rev_comment ) AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,rev_user,rev_user_text,NULL AS `rev_actor`,rev_content_format,rev_content_model,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,user_name,page_is_new,(SELECT  GROUP_CONCAT(ct_tag SEPARATOR ',')  FROM `change_tag`    WHERE ct_rev_id=rev_id  ) AS `ts_tags`,ores_damaging_cls.oresc_probability AS `ores_damaging_score`,0.412 AS `ores_damaging_threshold`  FROM `revision` USE INDEX (user_timestamp) LEFT JOIN `revision_comment_temp` `temp_rev_comment` ON ((temp_rev_comment.revcomment_rev = rev_id)) LEFT JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = temp_rev_comment.revcomment_comment_id)) INNER JOIN `page` ON ((page_id = rev_page)) LEFT JOIN `user` ON ((rev_user != 0) AND (user_id = rev_user)) LEFT JOIN `ores_classification` `ores_damaging_cls` ON (ores_damaging_cls.oresc_model = '9' AND (ores_damaging_cls.oresc_rev=rev_id) AND ores_damaging_cls.oresc_class = '1')   WHERE ((rev_user = '2752938')) AND (rev_parent_id = 0) AND (page_namespace = '0') AND ((rev_deleted & 4) = 0)  ORDER BY rev_timestamp DESC LIMIT 51;
+------+--------------------+---------------------+--------+------------------------+-----------------------+---------+-----------------------------------------------------+----------+--------------------------+
| id   | select_type        | table               | type   | possible_keys          | key                   | key_len | ref                                                 | rows     | Extra                    |
+------+--------------------+---------------------+--------+------------------------+-----------------------+---------+-----------------------------------------------------+----------+--------------------------+
|    1 | PRIMARY            | revision            | ref    | user_timestamp         | user_timestamp        | 4       | const                                               | 32698864 | Using where              |
|    1 | PRIMARY            | user                | eq_ref | PRIMARY                | PRIMARY               | 4       | wikidatawiki.revision.rev_user                      |        1 | Using where              |
|    1 | PRIMARY            | temp_rev_comment    | ref    | PRIMARY,revcomment_rev | PRIMARY               | 4       | wikidatawiki.revision.rev_id                        |        1 | Using index              |
|    1 | PRIMARY            | comment_rev_comment | eq_ref | PRIMARY                | PRIMARY               | 8       | wikidatawiki.temp_rev_comment.revcomment_comment_id |        1 | Using where              |
|    1 | PRIMARY            | ores_damaging_cls   | eq_ref | oresc_rev_model_class  | oresc_rev_model_class | 7       | wikidatawiki.revision.rev_id,const,const            |        1 |                          |
|    1 | PRIMARY            | page                | eq_ref | PRIMARY,name_title     | PRIMARY               | 4       | wikidatawiki.revision.rev_page                      |        1 | Using where              |
|    2 | DEPENDENT SUBQUERY | change_tag          | ref    | change_tag_rev_tag     | change_tag_rev_tag    | 5       | wikidatawiki.revision.rev_id                        |        1 | Using where; Using index |
+------+--------------------+---------------------+--------+------------------------+-----------------------+---------+-----------------------------------------------------+----------+--------------------------+
7 rows in set (0.00 sec)

I suppose it's looking at the worst case: if nothing passes the filters then it thinks it'll have to go through 32 million rows. It turns out this bot actually has only 17 million revision total. On the other hand, it could go through many fewer rows since this query can stop early once the LIMIT is reached.

It turns out this only touches 360 rows for this case:

wikiadmin@db1101(wikidatawiki)>SELECT  rev_id,rev_page,rev_text_id,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,COALESCE( comment_rev_comment.comment_text, rev_comment ) AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,rev_user,rev_user_text,NULL AS `rev_actor`,rev_content_format,rev_content_model,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,user_name,page_is_new,(SELECT  GROUP_CONCAT(ct_tag SEPARATOR ',')  FROM `change_tag`    WHERE ct_rev_id=rev_id  ) AS `ts_tags`,ores_damaging_cls.oresc_probability AS `ores_damaging_score`,0.412 AS `ores_damaging_threshold`  FROM `revision` USE INDEX (user_timestamp) LEFT JOIN `revision_comment_temp` `temp_rev_comment` ON ((temp_rev_comment.revcomment_rev = rev_id)) LEFT JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = temp_rev_comment.revcomment_comment_id)) INNER JOIN `page` ON ((page_id = rev_page)) LEFT JOIN `user` ON ((rev_user != 0) AND (user_id = rev_user)) LEFT JOIN `ores_classification` `ores_damaging_cls` ON (ores_damaging_cls.oresc_model = '9' AND (ores_damaging_cls.oresc_rev=rev_id) AND ores_damaging_cls.oresc_class = '1')   WHERE ((rev_user = '2752938')) AND (rev_parent_id = 0) AND (page_namespace = '0') AND ((rev_deleted & 4) = 0)  ORDER BY rev_timestamp DESC LIMIT 51;

[... SNIP ...]

wikiadmin@db1101(wikidatawiki)>SHOW STATUS like 'Hand%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 3     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 207   |
| Handler_read_last          | 0     |
| Handler_read_next          | 101   |
| Handler_read_prev          | 52    |
| Handler_read_retry         | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
26 rows in set (0.08 sec)

T47619 is not really related. The problematic query there had (rev_user >18370655) rather than an equality, which as noted would make the query there filesort when used with the user_timestamp index.

What probably made the difference is that rMW27c61fb1e94d: Add `actor` table and code to start using it recently removed the index forcing when there's only a single user. We could add that back in easily enough for now, although this may happen again during the actor table migration when the query has to check for either the actor ID or the user ID/name.

Change 416718 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] Restore index forcing in ContribsPager

https://gerrit.wikimedia.org/r/416718

jcrespo added a comment.EditedMar 6 2018, 5:17 PM

I am all for deploying fast, seeing how queries react, reevaluate if needed, rather than being stationary and finding nitpicks: translation- implicit +1 to all your suggestions.

Change 416718 merged by jenkins-bot:
[mediawiki/core@master] Restore index forcing in ContribsPager

https://gerrit.wikimedia.org/r/416718