Page MenuHomePhabricator

enwiki database on db1065 missing rev_timestamp index
Closed, DuplicatePublic

Description

I know this isn't the most beautiful query to start with, but it's only triggered by one-off maintenance script runs while reindexing content during downtime into elasticsearch. Because it's not particularly pretty we send it to the 'vslow' database, unfortunately due to a missing index this is not just a little slow, but a full table scan for each query.

Explain for query on db1089:

explain  SELECT   rev_timestamp,page_id,page_namespace,page_title,page_restrictions,page_is_redirect,page_is_new,page_random,page_touched,page_links_updated,page_latest,page_len,page_content_model  FROM `page`,`revision`    WHERE (rev_page = page_id) AND (rev_id = page_latest) AND (rev_timestamp >= '20170216110246') AND (rev_timestamp <= '20170216113454') AND (( rev_timestamp = '20170216110630' AND page_id > '2314424' ) OR ( rev_timestamp > '20170216110630' ))  ORDER BY rev_timestamp ASC,page_id ASC LIMIT 10;
+------+-------------+----------+--------+-----------------------------------------------------------------+---------------+---------+--------------------------+------+-----------------------------------------------------------+
| id   | select_type | table    | type   | possible_keys                                                   | key           | key_len | ref                      | rows | Extra                                                     |
+------+-------------+----------+--------+-----------------------------------------------------------------+---------------+---------+--------------------------+------+-----------------------------------------------------------+
|    1 | SIMPLE      | revision | range  | PRIMARY,rev_id,rev_timestamp,page_timestamp,page_user_timestamp | rev_timestamp | 20      | NULL                     | 2890 | Using where; Using index; Using temporary; Using filesort |
|    1 | SIMPLE      | page     | eq_ref | PRIMARY                                                         | PRIMARY       | 4       | enwiki.revision.rev_page |    1 | Using where                                               |
+------+-------------+----------+--------+-----------------------------------------------------------------+---------------+---------+--------------------------+------+-----------------------------------------------------------+

explain for same query running on db1065, the vslow database:

+------+-------------+----------+--------+--------------------------------------------------------+---------+---------+-------------------------+----------+---------------------------------+
| id   | select_type | table    | type   | possible_keys                                          | key     | key_len | ref                     | rows     | Extra                           |
+------+-------------+----------+--------+--------------------------------------------------------+---------+---------+-------------------------+----------+---------------------------------+
|    1 | SIMPLE      | page     | ALL    | PRIMARY                                                | NULL    | NULL    | NULL                    | 41833416 | Using temporary; Using filesort |
|    1 | SIMPLE      | revision | eq_ref | PRIMARY,rev_page_id,page_timestamp,page_user_timestamp | PRIMARY | 4       | enwiki.page.page_latest |        1 | Using where                     |
+------+-------------+----------+--------+--------------------------------------------------------+---------+---------+-------------------------+----------+---------------------------------+

Event Timeline

Hello,

We are aware of those differences, see: T132416.
Unfortunately, it is not an easy ALTER table, given the size of the DB it can take up to 4-5 days to get it altered. We have been doing some work to get it unified in some other important wikis, for example: T147305, T148967, T150644.
The idea with revision was to start altering codfw servers before the switchover, so we can get them done while they are standby hosts. Once we are on codfw, we can try to alter eqiad hosts and get them done (or at least as many as can while eqiad is on stand by): T132416#2915953

Not sure if you would like to leave this ticket open, we can mark it as duplicated of the parent one if you like as we are aware of this issue but it will take some time.