Page MenuHomePhabricator

Query plan changes on enwiki.revision queries with MCR change
Closed, ResolvedPublic

Description

Per T238966#6321601 we have now one host on enwiki serving production traffic that has MCR changes applied.
While we didn't see anything obvious when we altered s6, s2 and s5, I wanted to analyze slowly the changes before going for s4 (commonswiki).

On this host I have seen a query plan change on a host not running the MCR change (db1091) vs a host with the schema change applied (db1119)

MCR changes host

root@db1119.eqiad.wmnet[enwiki]> explain SELECT /* ApiQueryRevisions::run */ rev_id, rev_page, rev_timestamp, rev_minor_edit, rev_deleted, rev_len, rev_parent_id, rev_sha1, comment_rev_comment.comment_text AS `rev_comment_text`, comment_rev_comment.comment_data AS `rev_comment_data`, comment_rev_comment.comment_id AS `rev_comment_cid`, actor_rev_user.actor_user AS `rev_user`, actor_rev_user.actor_name AS `rev_user_text`, temp_rev_user.revactor_actor AS `rev_actor`, page_namespace, page_title, page_id, page_latest, page_is_redirect, page_len, user_name FROM `revision` IGNORE INDEX (rev_timestamp) JOIN `revision_comment_temp` `temp_rev_comment` ON ((temp_rev_comment.revcomment_rev = rev_id)) JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = temp_rev_comment.revcomment_comment_id)) JOIN `revision_actor_temp` `temp_rev_user` ON ((temp_rev_user.revactor_rev = rev_id)) JOIN `actor` `actor_rev_user` ON ((actor_rev_user.actor_id = temp_rev_user.revactor_actor)) JOIN `page` ON ((page_id = rev_page)) LEFT JOIN `user` ON ((actor_rev_user.actor_user != 0) AND (user_id = actor_rev_user.actor_user)) WHERE rev_page = 534366 ORDER BY rev_timestamp DESC, rev_id DESC LIMIT 11 /* c541772d186a1d71a295aea404f6a2db db1119 enwiki 2s */;
+------+-------------+---------------------+--------+-------------------------------------------------------------+--------------------------+---------+-----------------------------------------------+-------+----------------+
| id   | select_type | table               | type   | possible_keys                                               | key                      | key_len | ref                                           | rows  | Extra          |
+------+-------------+---------------------+--------+-------------------------------------------------------------+--------------------------+---------+-----------------------------------------------+-------+----------------+
|    1 | SIMPLE      | page                | const  | PRIMARY                                                     | PRIMARY                  | 4       | const                                         |     1 | Using filesort |
|    1 | SIMPLE      | revision            | ref    | PRIMARY,rev_page_id,page_timestamp,rev_page_actor_timestamp | rev_page_actor_timestamp | 4       | const                                         | 51898 | Using where    |
|    1 | SIMPLE      | temp_rev_user       | ref    | PRIMARY,revactor_rev,actor_timestamp                        | PRIMARY                  | 4       | enwiki.revision.rev_id                        |     1 | Using index    |
|    1 | SIMPLE      | actor_rev_user      | eq_ref | PRIMARY                                                     | PRIMARY                  | 8       | enwiki.temp_rev_user.revactor_actor           |     1 |                |
|    1 | SIMPLE      | user                | eq_ref | PRIMARY                                                     | PRIMARY                  | 4       | enwiki.actor_rev_user.actor_user              |     1 | Using where    |
|    1 | SIMPLE      | temp_rev_comment    | ref    | PRIMARY,revcomment_rev                                      | PRIMARY                  | 4       | enwiki.revision.rev_id                        |     1 | Using index    |
|    1 | SIMPLE      | comment_rev_comment | eq_ref | PRIMARY                                                     | PRIMARY                  | 8       | enwiki.temp_rev_comment.revcomment_comment_id |     1 |                |
+------+-------------+---------------------+--------+-------------------------------------------------------------+--------------------------+---------+-----------------------------------------------+-------+----------------+
7 rows in set (0.001 sec)

Non MCR host:

root@db1091.eqiad.wmnet[enwiki]> explain SELECT /* ApiQueryRevisions::run */ rev_id, rev_page, rev_timestamp, rev_minor_edit, rev_deleted, rev_len, rev_parent_id, rev_sha1, comment_rev_comment.comment_text AS `rev_comment_text`, comment_rev_comment.comment_data AS `rev_comment_data`, comment_rev_comment.comment_id AS `rev_comment_cid`, actor_rev_user.actor_user AS `rev_user`, actor_rev_user.actor_name AS `rev_user_text`, temp_rev_user.revactor_actor AS `rev_actor`, page_namespace, page_title, page_id, page_latest, page_is_redirect, page_len, user_name FROM `revision` IGNORE INDEX (rev_timestamp) JOIN `revision_comment_temp` `temp_rev_comment` ON ((temp_rev_comment.revcomment_rev = rev_id)) JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = temp_rev_comment.revcomment_comment_id)) JOIN `revision_actor_temp` `temp_rev_user` ON ((temp_rev_user.revactor_rev = rev_id)) JOIN `actor` `actor_rev_user` ON ((actor_rev_user.actor_id = temp_rev_user.revactor_actor)) JOIN `page` ON ((page_id = rev_page)) LEFT JOIN `user` ON ((actor_rev_user.actor_user != 0) AND (user_id = actor_rev_user.actor_user)) WHERE rev_page = 534366 ORDER BY rev_timestamp DESC, rev_id DESC LIMIT 11 /* c541772d186a1d71a295aea404f6a2db db1119 enwiki 2s */;
+------+-------------+---------------------+--------+--------------------------------------------------------+----------------+---------+-----------------------------------------------+-------+-------------+
| id   | select_type | table               | type   | possible_keys                                          | key            | key_len | ref                                           | rows  | Extra       |
+------+-------------+---------------------+--------+--------------------------------------------------------+----------------+---------+-----------------------------------------------+-------+-------------+
|    1 | SIMPLE      | page                | const  | PRIMARY                                                | PRIMARY        | 4       | const                                         | 1     |             |
|    1 | SIMPLE      | revision            | ref    | PRIMARY,page_timestamp,page_user_timestamp,rev_page_id | page_timestamp | 4       | const                                         | 50028 | Using where |
|    1 | SIMPLE      | temp_rev_user       | eq_ref | PRIMARY,revactor_rev,actor_timestamp                   | revactor_rev   | 4       | enwiki.revision.rev_id                        | 1     | Using index |
|    1 | SIMPLE      | actor_rev_user      | eq_ref | PRIMARY                                                | PRIMARY        | 8       | enwiki.temp_rev_user.revactor_actor           | 1     |             |
|    1 | SIMPLE      | user                | eq_ref | PRIMARY                                                | PRIMARY        | 4       | enwiki.actor_rev_user.actor_user              | 1     | Using where |
|    1 | SIMPLE      | temp_rev_comment    | ref    | PRIMARY,revcomment_rev                                 | PRIMARY        | 4       | enwiki.revision.rev_id                        | 1     | Using index |
|    1 | SIMPLE      | comment_rev_comment | eq_ref | PRIMARY                                                | PRIMARY        | 8       | enwiki.temp_rev_comment.revcomment_comment_id | 1     |             |
+------+-------------+---------------------+--------+--------------------------------------------------------+----------------+---------+-----------------------------------------------+-------+-------------+
7 rows in set (0.002 sec)

The query time on a non MCR host is around 0.002 (almost instant) and on the MCR host it is around 0.5 second, so that is some substantial increase for that given query.
If I force the query on the MCR host to use page_timestamp index like the non touched host does, the query goes back to original results (0.002), so almost instant. So using rev_page_actor_timestamp isn't ideal apparently.

Event Timeline

I can try to run an analyze table on revision to re-collect table stats and see what's the query plan after that.

Mentioned in SAL (#wikimedia-operations) [2020-07-21T10:01:59Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1119 T258480', diff saved to https://phabricator.wikimedia.org/P11983 and previous config saved to /var/cache/conftool/dbconfig/20200721-100159-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2020-07-21T10:02:22Z] <marostegui> Analyze revision table on db1119 T258480

Marostegui triaged this task as Medium priority.Jul 21 2020, 1:28 PM

The analyze table didn't change it (sort of expected), so while this is not super concerning, I would like Platform Engineering to take a look at this query and check if we can improve it (or maybe even hint the index).

daniel added a comment.EditedJul 21 2020, 7:26 PM

This sounds like it was caused by T253289: Remove USE INDEX usertext_timestamp and other references from code. Perhaps we shouldn't have removed the USE INDEX hints everywhere, just the outdated ones. Does that sound plausible to you @Marostegui ?

Not sure if that is related as the index that is not being chosen is page_timestamp which is still present on the table. And in fact, if forced to do so, the query goes back to the expected values.

@daniel how should we proceed here? I am "almost" finishing on s4 (commonswiki) and I will probably start with s8 (wikidatawiki) next week which is second biggest after enwiki and can potentially be affected by this small regression.

Proc added a subscriber: Proc.Aug 1 2020, 10:08 AM

@daniel, Can you take a look at this?

@daniel how should we proceed here? I am "almost" finishing on s4 (commonswiki) and I will probably start with s8 (wikidatawiki) next week which is second biggest after enwiki and can potentially be affected by this small regression.

@Marostegui Hey, I just got back from vacation.

I don't fully understand what's going on here and why, but here are a few observations that may be helpful for analysis:

  • The query in question selects the 11 newest revisions of a specific page, joining in additional information from other tables.
  • The correct index to use for the revision table is page_timestamp
  • Other requests to the same API module may filter by user. In that case, page_actor_timestamp from the revision_actor_temp table is the correct index to use, the revision table should use PRIMARY in that case.
  • The rev_page_actor_timestamp and rev_actor_timestamp indexes on the revision table must not be used at all until the rev_actor field has been populated (T215466).

So, it seems like we should do the following:

  • use IGNORE INDEX to suppress rev_page_actor_timestamp and rev_actor_timestamp, at least in ApiQueryRevisions, but ideally everywhere.
  • use FORCE INDEX page_timestamp in ApiQueryRevisions when not selecting by user.

Change 619780 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] ApiQueryRevisions: force page_timestamp index.

https://gerrit.wikimedia.org/r/619780

@daniel how should we proceed here? I am "almost" finishing on s4 (commonswiki) and I will probably start with s8 (wikidatawiki) next week which is second biggest after enwiki and can potentially be affected by this small regression.

@Marostegui Hey, I just got back from vacation.

Welcome back!

  • The correct index to use for the revision table is page_timestamp

Yeah, that is exactly what I saw here, as reported earlier, it goes as fast as it should when using that index, which is what happens on those hosts that haven't been touched yet.

  • Other requests to the same API module may filter by user. In that case, page_actor_timestamp from the revision_actor_temp table is the correct index to use, the revision table should use PRIMARY in that case.
  • The rev_page_actor_timestamp and rev_actor_timestamp indexes on the revision table must not be used at all until the rev_actor field has been populated (T215466).

So, it seems like we should do the following:

  • use IGNORE INDEX to suppress rev_page_actor_timestamp and rev_actor_timestamp, at least in ApiQueryRevisions, but ideally everywhere.

Maybe it is better not to introduce that wide change for now but instead wait to see if the changes introduced make some other queries misbehave. So far this is the only query I have seen that is acting weirdly.
Let's not introduce more variables for now, as so far everything looks good on that enwiki altered host.

  • use FORCE INDEX page_timestamp in ApiQueryRevisions when not selecting by user.

+1 that would fix this issue.

daniel claimed this task.Aug 18 2020, 9:19 PM
daniel raised the priority of this task from Medium to High.Aug 18 2020, 9:34 PM

Bumping to high, since it blocks DBAs from continuing schema updates.

  • use FORCE INDEX page_timestamp in ApiQueryRevisions when not selecting by user.

+1 that would fix this issue.

Patch is up.

Thanks - I have commented on it with just a minor thing. Let's see if the other reviewers are ok and we can merge this week.

Change 619780 merged by jenkins-bot:
[mediawiki/core@master] ApiQueryRevisions: force page_timestamp index.

https://gerrit.wikimedia.org/r/619780

daniel closed this task as Resolved.Aug 24 2020, 9:12 PM