Page MenuHomePhabricator

Possible full scan query ApiQueryUserContribs::execute for revision_actor_temp table on commonswiki
Closed, ResolvedPublic

Description

I just noticed some reports on Tendril for the following query on Commons:

SELECT /* ApiQueryUserContribs::execute */ rev_id, rev_page, rev_timestamp, rev_minor_edit, rev_deleted, rev_len, rev_parent_id, rev_sha1, comment_rev_comment.comment_text AS `rev_comment_text`, comment_rev_comment.comment_data AS `rev_comment_data`, comment_rev_comment.comment_id AS `rev_comment_cid`, actor_rev_user.actor_user AS `rev_user`, actor_rev_user.actor_name AS `rev_user_text`, temp_rev_user.revactor_actor AS `rev_actor`, page_namespace, page_title, page_id, page_latest, page_is_redirect, page_len FROM `revision` JOIN `revision_comment_temp` `temp_rev_comment` ON ((temp_rev_comment.revcomment_rev = rev_id)) JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = temp_rev_comment.revcomment_comment_id)) JOIN `revision_actor_temp` `temp_rev_user` ON ((temp_rev_user.revactor_rev = rev_id)) JOIN `actor` `actor_rev_user` ON ((actor_rev_user.actor_id = temp_rev_user.revactor_actor)) JOIN `page` ON ((page_id = rev_page)) WHERE ((temp_rev_user.revactor_actor IN ('1224180', '14796') )) AND ((rev_deleted & 4) != 4) AND (revactor_timestamp<='20190422155900') AND (revactor_timestamp>='20190315160000') AND page_namespace = '6' AND (rev_parent_id = 0) ORDER BY rev_actor DESC, revactor_timestamp DESC, revactor_rev DESC LIMIT 501;

Which looks like it is doing a full scan:

root@db1081.eqiad.wmnet[(none)]> show explain for 1361484607;
+------+-------------+---------------------+--------+--------------------------------------------------------+-----------------+---------+----------------------------------------------------+-----------+--------------------------------------------------------------+
| id   | select_type | table               | type   | possible_keys                                          | key             | key_len | ref                                                | rows      | Extra                                                        |
+------+-------------+---------------------+--------+--------------------------------------------------------+-----------------+---------+----------------------------------------------------+-----------+--------------------------------------------------------------+
|    1 | SIMPLE      | actor_rev_user      | range  | PRIMARY                                                | PRIMARY         | 8       | NULL                                               |         2 | Using where; Using temporary; Using filesort                 |
|    1 | SIMPLE      | temp_rev_user       | index  | PRIMARY,revactor_rev,actor_timestamp                   | actor_timestamp | 22      | NULL                                               | 289808903 | Using where; Using index; Using join buffer (flat, BNL join) |
|    1 | SIMPLE      | temp_rev_comment    | eq_ref | PRIMARY,revcomment_rev                                 | revcomment_rev  | 4       | commonswiki.temp_rev_user.revactor_rev             |         1 | Using index                                                  |
|    1 | SIMPLE      | comment_rev_comment | eq_ref | PRIMARY                                                | PRIMARY         | 8       | commonswiki.temp_rev_comment.revcomment_comment_id |         1 |                                                              |
|    1 | SIMPLE      | revision            | eq_ref | PRIMARY,page_timestamp,rev_page_id,page_user_timestamp | PRIMARY         | 4       | commonswiki.temp_rev_user.revactor_rev             |         1 | Using where                                                  |
|    1 | SIMPLE      | page                | eq_ref | PRIMARY,name_title                                     | PRIMARY         | 4       | commonswiki.revision.rev_page                      |         1 | Using where                                                  |
+------+-------------+---------------------+--------+--------------------------------------------------------+-----------------+---------+----------------------------------------------------+-----------+--------------------------------------------------------------+
6 rows in set, 1 warning (0.00 sec)

Query time:

3 rows in set (1 min 28.46 sec)

Event Timeline

I have been talking to @Anomie in private and the conclusions is that this ticket is more related to T221458 T220999 than to T221380.
@Anomie has been taking a look and looks like it won't be as simple as adding an STRAIGHT JOIN, but we will need to kinda reorder the JOINs in some sort of way, similar to:

SELECT /* ApiQueryUserContribs::execute */ /*!STRAIGHT_JOIN*/ rev_id, rev_page, rev_timestamp, rev_minor_edit, rev_deleted, rev_len, rev_parent_id, rev_sha1, comment_rev_comment.comment_text AS `rev_comment_text`, comment_rev_comment.comment_data AS `rev_comment_data`, comment_rev_comment.comment_id AS `rev_comment_cid`, actor_rev_user.actor_user AS `rev_user`, actor_rev_user.actor_name AS `rev_user_text`, temp_rev_user.revactor_actor AS `rev_actor`, page_namespace, page_title, page_id, page_latest, page_is_redirect, page_len FROM `revision_actor_temp` `temp_rev_user` JOIN `revision` ON ((temp_rev_user.revactor_rev = rev_id)) JOIN `revision_comment_temp` `temp_rev_comment` ON ((temp_rev_comment.revcomment_rev = rev_id)) JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = temp_rev_comment.revcomment_comment_id)) JOIN `actor` `actor_rev_user` ON ((actor_rev_user.actor_id = temp_rev_user.revactor_actor)) JOIN `page` ON ((page_id = rev_page)) WHERE ((temp_rev_user.revactor_actor IN ('1224180', '14796') )) AND ((rev_deleted & 4) != 4) AND (revactor_timestamp<='20190422155900') AND (revactor_timestamp>='20190315160000') AND page_namespace = '6' AND (rev_parent_id = 0) ORDER BY rev_actor DESC, revactor_timestamp DESC, revactor_rev DESC LIMIT 501;

Which gives a good query plan and the query runs instantly:

3 rows in set (0.00 sec)
+------+-------------+---------------------+--------+--------------------------------------------------------+-----------------+---------+----------------------------------------------------+------+--------------------------+
| id   | select_type | table               | type   | possible_keys                                          | key             | key_len | ref                                                | rows | Extra                    |
+------+-------------+---------------------+--------+--------------------------------------------------------+-----------------+---------+----------------------------------------------------+------+--------------------------+
|    1 | SIMPLE      | temp_rev_user       | range  | PRIMARY,revactor_rev,actor_timestamp                   | actor_timestamp | 22      | NULL                                               |   10 | Using where; Using index |
|    1 | SIMPLE      | revision            | eq_ref | PRIMARY,page_timestamp,rev_page_id,page_user_timestamp | PRIMARY         | 4       | commonswiki.temp_rev_user.revactor_rev             |    1 | Using where              |
|    1 | SIMPLE      | temp_rev_comment    | ref    | PRIMARY,revcomment_rev                                 | PRIMARY         | 4       | commonswiki.temp_rev_user.revactor_rev             |    1 | Using index              |
|    1 | SIMPLE      | comment_rev_comment | eq_ref | PRIMARY                                                | PRIMARY         | 8       | commonswiki.temp_rev_comment.revcomment_comment_id |    1 |                          |
|    1 | SIMPLE      | actor_rev_user      | eq_ref | PRIMARY                                                | PRIMARY         | 8       | commonswiki.temp_rev_user.revactor_actor           |    1 |                          |
|    1 | SIMPLE      | page                | eq_ref | PRIMARY,name_title                                     | PRIMARY         | 4       | commonswiki.revision.rev_page                      |    1 | Using where              |
+------+-------------+---------------------+--------+--------------------------------------------------------+-----------------+---------+----------------------------------------------------+------+--------------------------+
6 rows in set (0.00 sec)

Change 505853 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] API: Add STRAIGHT_JOIN to ApiQueryUserContribs to avoid planner oddness

https://gerrit.wikimedia.org/r/505853

Do we have some process for sending these kinds of optimizer bugs upstream? We are using an old version of MariaDB so I imagine opening bugs is not useful, but then large-scale MariaDB users can't really be expected to repro production issues on master, so there is probably some meachnism for collecting feedback on the optimizer...

Unfortunately optimizer bugs are not easy to reproduce (it highly depends - I believe - on the size of the table). We have tried to report optimizer bugs before and we've got not much help on that from upstream. Someone could argue that the optimizer is doing what it is supposed to do (even if it doesn't make sense from our point of view given the distribution of our data).

My own experience reporting bugs to MariaDB isn't great, they are not very responsive or tend to be quite slow, even for well defined and easy to reproduce bugs
ie: https://jira.mariadb.org/browse/MDEV-11351?filter=-2
https://jira.mariadb.org/browse/MDEV-14732 (this one affects us too - https://jira.mariadb.org/browse/MDEV-16466)

Or even they close bugs even if you report that it is not yet fixed: https://jira.mariadb.org/browse/MDEV-11106?filter=-2

We are using the latest version of the 10.1 series which is still maintained and updated (https://mariadb.com/kb/en/library/changes-improvements-in-mariadb-101/), so we are up to date on that field. We have seen the optimizer behaving differently from one version to another and getting our own bugs fixed between versions, hopefully this will be one of these soon!

EvanProdromou subscribed.

I think we're done here and just waiting for the train, so I'm moving it to our "done with CPT" column.

Change 505853 merged by jenkins-bot:
[mediawiki/core@master] API: Add STRAIGHT_JOIN to ApiQueryUserContribs to avoid planner oddness

https://gerrit.wikimedia.org/r/505853

EvanProdromou claimed this task.

I think this has gone out, so I'm calling it resolved.