Some queries to new replica hosts are dramatically slower than labsdb; missing indexes?
Closed, Resolved · Public

Description

We moved XTools to enwiki.analytics.db.svc.eqiad.wmflabs, and it worked fine for some tools like ArticleInfo, but for others, such as the Edit Counter and Pages Created, it went crazy slow compared to enwiki.labsdb (~178 secs vs. ~3 secs for User:L235). My guess is the indexing is not the same. Sorry, I'm unable to debug right now... but here's the query for the Pages Created tool (User:L235):

(SELECT DISTINCT page_namespace AS namespace, 'rev' AS type, page_title AS page_title,
    page_len, page_is_redirect, rev_timestamp AS rev_timestamp,
    rev_user, rev_user_text AS username, rev_len, rev_id , pa_class, pa_importance, pa_page_revision
FROM `enwiki_p`.`page`
JOIN `enwiki_p`.`revision_userindex` ON page_id = rev_page
LEFT JOIN `enwiki_p`.`page_assessments` ON rev_page = pa_page_id
WHERE  rev_user = '11656865' AND rev_timestamp > 1  AND rev_parent_id = '0'  AND page_namespace = '0'   AND page_is_redirect = '0' 
GROUP BY rev_page
)

UNION

(SELECT a.ar_namespace AS namespace, 'arc' AS type, a.ar_title AS page_title,
    0 AS page_len, '0' AS page_is_redirect, MIN(a.ar_timestamp) AS rev_timestamp,
    a.ar_user AS rev_user, a.ar_user_text AS username, a.ar_len AS rev_len,
    a.ar_rev_id AS rev_id , NULL AS pa_class, NULL AS pa_importance, NULL AS pa_page_revision
FROM `enwiki_p`.`archive_userindex` a
JOIN
(
    SELECT b.ar_namespace, b.ar_title
    FROM `enwiki_p`.`archive_userindex` AS b
    LEFT JOIN `enwiki_p`.`logging_userindex` ON log_namespace = b.ar_namespace AND log_title = b.ar_title
        AND log_user = b.ar_user AND (log_action = 'move' OR log_action = 'move_redir')
    WHERE  ar_user = '11656865' AND ar_timestamp > 1  AND b.ar_parent_id = '0'  AND ar_namespace = '0'  AND log_action IS NULL
) AS c ON c.ar_namespace= a.ar_namespace AND c.ar_title = a.ar_title
GROUP BY a.ar_namespace, a.ar_title
HAVING  rev_user = '11656865' 
)

I have moved production XTools back to enwiki.labsdb, so if you check right now, that's why it isn't slow.

Event Timeline

bd808 renamed this task from "New replica hosts are dramatically slower than labsdb" to "Some queries to new replica hosts are dramatically slower than labsdb; missing indexes?". Sep 29 2017, 4:07 PM
bd808 edited projects, added Data-Services, DBA; removed cloud-services-team.
Marostegui triaged this task as Medium priority. Sep 29 2017, 4:37 PM
Marostegui moved this task from Backlog (help welcome) to In progress on the DBA board.
Marostegui subscribed.

So these are the two query plans:

mysql:root@localhost [enwiki]> select @@hostname;
+------------+
| @@hostname |
+------------+
| labsdb1001 |
+------------+
1 row in set (0.00 sec)

mysql:root@localhost [enwiki]> explain (SELECT DISTINCT page_namespace AS namespace, 'rev' AS type, page_title AS page_title,     page_len, page_is_redirect, rev_timestamp AS rev_timestamp,     rev_user, rev_user_text AS username, rev_len, rev_id , pa_class, pa_importance, pa_page_revision FROM `enwiki_p`.`page` JOIN `enwiki_p`.`revision_userindex` ON page_id = rev_page LEFT JOIN `enwiki_p`.`page_assessments` ON rev_page = pa_page_id WHERE  rev_user = '11656865' AND rev_timestamp > 1  AND rev_parent_id = '0'  AND page_namespace = '0'   AND page_is_redirect = '0'  GROUP BY rev_page )  UNION  (SELECT a.ar_namespace AS namespace, 'arc' AS type, a.ar_title AS page_title,     0 AS page_len, '0' AS page_is_redirect, MIN(a.ar_timestamp) AS rev_timestamp,     a.ar_user AS rev_user, a.ar_user_text AS username, a.ar_len AS rev_len,     a.ar_rev_id AS rev_id , NULL AS pa_class, NULL AS pa_page_id, NULL AS pa_page_revision FROM `enwiki_p`.`archive_userindex` a JOIN (     SELECT b.ar_namespace, b.ar_title     FROM `enwiki_p`.`archive_userindex` AS b     LEFT JOIN `enwiki_p`.`logging_userindex` ON log_namespace = b.ar_namespace AND log_title = b.ar_title         AND log_user = b.ar_user AND (log_action = 'move' OR log_action = 'move_redir')     WHERE  ar_user = '11656865' AND ar_timestamp > 1  AND b.ar_parent_id = '0'  AND ar_namespace = '0'  AND log_action IS NULL ) AS c ON c.ar_namespace= a.ar_namespace AND c.ar_title = a.ar_title GROUP BY a.ar_namespace, a.ar_title HAVING  rev_user = '11656865'  );
+------+--------------+------------------+--------+-----------------------------------------------------+----------------------+---------+-----------------------------------------------------+-------+----------------------------------------------+
| id   | select_type  | table            | type   | possible_keys                                       | key                  | key_len | ref                                                 | rows  | Extra                                        |
+------+--------------+------------------+--------+-----------------------------------------------------+----------------------+---------+-----------------------------------------------------+-------+----------------------------------------------+
|    1 | PRIMARY      | revision         | ref    | PRIMARY,rev_timestamp,page_timestamp,user_timestamp | user_timestamp       | 4       | const                                               | 17842 | Using where; Using temporary; Using filesort |
|    1 | PRIMARY      | page_assessments | ref    | PRIMARY                                             | PRIMARY              | 4       | enwiki.revision.rev_page                            |     1 |                                              |
|    1 | PRIMARY      | page             | eq_ref | PRIMARY,name_title,page_redirect_namespace_len      | PRIMARY              | 4       | enwiki.revision.rev_page                            |     1 | Using where                                  |
|    2 | UNION        | archive          | ref    | name_title_timestamp,user_timestamp                 | user_timestamp       | 4       | const                                               |  1915 | Using where; Using temporary; Using filesort |
|    2 | UNION        | logging          | ref    | type_time,user_time                                 | user_time            | 4       | enwiki.archive.ar_user                              |     1 | Using where                                  |
|    2 | UNION        | archive          | ref    | name_title_timestamp                                | name_title_timestamp | 261     | enwiki.archive.ar_namespace,enwiki.archive.ar_title |     2 | Using where                                  |
| NULL | UNION RESULT | <union1,2>       | ALL    | NULL                                                | NULL                 | NULL    | NULL                                                |  NULL |                                              |
+------+--------------+------------------+--------+-----------------------------------------------------+----------------------+---------+-----------------------------------------------------+-------+----------------------------------------------+
7 rows in set (0.01 sec)
mysql:root@localhost [enwiki]> select @@hostname;
+------------+
| @@hostname |
+------------+
| labsdb1011 |
+------------+
1 row in set (0.00 sec)

mysql:root@localhost [enwiki]> explain (SELECT DISTINCT page_namespace AS namespace, 'rev' AS type, page_title AS page_title,     page_len, page_is_redirect, rev_timestamp AS rev_timestamp,     rev_user, rev_user_text AS username, rev_len, rev_id , pa_class, pa_importance, pa_page_revision FROM `enwiki_p`.`page` JOIN `enwiki_p`.`revision_userindex` ON page_id = rev_page LEFT JOIN `enwiki_p`.`page_assessments` ON rev_page = pa_page_id WHERE  rev_user = '11656865' AND rev_timestamp > 1  AND rev_parent_id = '0'  AND page_namespace = '0'   AND page_is_redirect = '0'  GROUP BY rev_page )  UNION  (SELECT a.ar_namespace AS namespace, 'arc' AS type, a.ar_title AS page_title,     0 AS page_len, '0' AS page_is_redirect, MIN(a.ar_timestamp) AS rev_timestamp,     a.ar_user AS rev_user, a.ar_user_text AS username, a.ar_len AS rev_len,     a.ar_rev_id AS rev_id , NULL AS pa_class, NULL AS pa_page_id, NULL AS pa_page_revision FROM `enwiki_p`.`archive_userindex` a JOIN (     SELECT b.ar_namespace, b.ar_title     FROM `enwiki_p`.`archive_userindex` AS b     LEFT JOIN `enwiki_p`.`logging_userindex` ON log_namespace = b.ar_namespace AND log_title = b.ar_title         AND log_user = b.ar_user AND (log_action = 'move' OR log_action = 'move_redir')     WHERE  ar_user = '11656865' AND ar_timestamp > 1  AND b.ar_parent_id = '0'  AND ar_namespace = '0'  AND log_action IS NULL ) AS c ON c.ar_namespace= a.ar_namespace AND c.ar_title = a.ar_title GROUP BY a.ar_namespace, a.ar_title HAVING  rev_user = '11656865'  );
+------+--------------+------------------+--------+-----------------------------------------------------------------------------+----------------------+---------+-----------------------------------------------------+----------+---------------------------------------------------------------------+
| id   | select_type  | table            | type   | possible_keys                                                               | key                  | key_len | ref                                                 | rows     | Extra                                                               |
+------+--------------+------------------+--------+-----------------------------------------------------------------------------+----------------------+---------+-----------------------------------------------------+----------+---------------------------------------------------------------------+
|    1 | PRIMARY      | revision         | ref    | rev_timestamp,page_timestamp,user_timestamp,page_user_timestamp,rev_page_id | user_timestamp       | 4       | const                                               |    27144 | Using index condition; Using where; Using temporary; Using filesort |
|    1 | PRIMARY      | page_assessments | ref    | PRIMARY                                                                     | PRIMARY              | 4       | enwiki.revision.rev_page                            |        1 |                                                                     |
|    1 | PRIMARY      | page             | eq_ref | PRIMARY,name_title,page_redirect_namespace_len                              | PRIMARY              | 4       | enwiki.revision.rev_page                            |        1 | Using where                                                         |
|    2 | UNION        | archive          | ref    | name_title_timestamp                                                        | name_title_timestamp | 4       | const                                               | 25100957 | Using index condition; Using where; Using temporary; Using filesort |
|    2 | UNION        | logging          | ref    | type_time,user_time,log_user_type_time                                      | user_time            | 4       | enwiki.archive.ar_user                              |        1 | Using where                                                         |
|    2 | UNION        | archive          | ref    | name_title_timestamp                                                        | name_title_timestamp | 261     | enwiki.archive.ar_namespace,enwiki.archive.ar_title |        2 | Using where                                                         |
| NULL | UNION RESULT | <union1,2>       | ALL    | NULL                                                                        | NULL                 | NULL    | NULL                                                |     NULL |                                                                     |
+------+--------------+------------------+--------+-----------------------------------------------------------------------------+----------------------+---------+-----------------------------------------------------+----------+---------------------------------------------------------------------+
7 rows in set (0.00 sec)

There is a missing index on the archive table. Here are the table definitions:

mysql:root@localhost [enwiki]> select @@hostname; show create table archive\G
+------------+
| @@hostname |
+------------+
| labsdb1001 |
+------------+
1 row in set (0.00 sec)

*************************** 1. row ***************************
       Table: archive
Create Table: CREATE TABLE `archive` (
  `ar_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `ar_namespace` int(11) NOT NULL DEFAULT '0',
  `ar_title` varbinary(255) NOT NULL DEFAULT '',
  `ar_text` mediumblob NOT NULL,
  `ar_comment` tinyblob NOT NULL,
  `ar_user` int(5) unsigned NOT NULL DEFAULT '0',
  `ar_user_text` varbinary(255) NOT NULL DEFAULT '',
  `ar_timestamp` varbinary(14) NOT NULL DEFAULT '',
  `ar_minor_edit` tinyint(1) NOT NULL DEFAULT '0',
  `ar_flags` tinyblob NOT NULL,
  `ar_rev_id` int(8) unsigned DEFAULT NULL,
  `ar_text_id` int(8) unsigned DEFAULT NULL,
  `ar_deleted` tinyint(1) unsigned NOT NULL DEFAULT '0',
  `ar_len` int(8) unsigned DEFAULT NULL,
  `ar_page_id` int(10) unsigned DEFAULT NULL,
  `ar_parent_id` int(10) unsigned DEFAULT NULL,
  `ar_sha1` varbinary(32) NOT NULL DEFAULT '',
  `ar_content_model` varbinary(32) DEFAULT NULL,
  `ar_content_format` varbinary(64) DEFAULT NULL,
  PRIMARY KEY (`ar_id`),
  KEY `name_title_timestamp` (`ar_namespace`,`ar_title`,`ar_timestamp`),
  KEY `usertext_timestamp` (`ar_user_text`,`ar_timestamp`),
  KEY `ar_revid` (`ar_rev_id`),
  KEY `user_timestamp` (`ar_user`,`ar_timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=87026405 DEFAULT CHARSET=binary ROW_FORMAT=COMPRESSED `compression`='tokudb_zlib'
mysql:root@localhost [enwiki]> select @@hostname; show create table archive\G
+------------+
| @@hostname |
+------------+
| labsdb1011 |
+------------+
1 row in set (0.00 sec)

*************************** 1. row ***************************
       Table: archive
Create Table: CREATE TABLE `archive` (
  `ar_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `ar_namespace` int(11) NOT NULL DEFAULT '0',
  `ar_title` varbinary(255) NOT NULL DEFAULT '',
  `ar_text` mediumblob NOT NULL,
  `ar_comment` tinyblob NOT NULL,
  `ar_user` int(5) unsigned NOT NULL DEFAULT '0',
  `ar_user_text` varbinary(255) NOT NULL DEFAULT '',
  `ar_timestamp` varbinary(14) NOT NULL DEFAULT '',
  `ar_minor_edit` tinyint(1) NOT NULL DEFAULT '0',
  `ar_flags` tinyblob NOT NULL,
  `ar_rev_id` int(8) unsigned DEFAULT NULL,
  `ar_text_id` int(8) unsigned DEFAULT NULL,
  `ar_deleted` tinyint(1) unsigned NOT NULL DEFAULT '0',
  `ar_len` int(8) unsigned DEFAULT NULL,
  `ar_page_id` int(10) unsigned DEFAULT NULL,
  `ar_parent_id` int(10) unsigned DEFAULT NULL,
  `ar_sha1` varbinary(32) NOT NULL DEFAULT '',
  `ar_content_model` varbinary(32) DEFAULT NULL,
  `ar_content_format` varbinary(64) DEFAULT NULL,
  PRIMARY KEY (`ar_id`),
  KEY `name_title_timestamp` (`ar_namespace`,`ar_title`,`ar_timestamp`),
  KEY `usertext_timestamp` (`ar_user_text`,`ar_timestamp`),
  KEY `ar_revid` (`ar_rev_id`)
) ENGINE=InnoDB AUTO_INCREMENT=87026405 DEFAULT CHARSET=binary ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8
1 row in set (0.00 sec)

The new labs replica doesn't have the user_timestamp index, which is being used for this query.
I will probably add it on Monday, as I don't want to leave an ALTER TABLE running there on a Friday evening :-)
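
For anyone double-checking, a quick way to confirm which host has the index (a sketch only; run from a mysql client on each replica, using the same schema as in the transcripts above):

-- Show whether the archive table carries the user_timestamp index on this host.
SELECT @@hostname;
SHOW INDEX FROM enwiki.archive WHERE Key_name = 'user_timestamp';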

I just realised that index is NOT in tables.sql, so it must have been added on the old labs server for some reason; that is why it is not on the new labs server. I am not sure it should be added there, as these hosts are filtered copies of production and thus the schemas are meant to be the same.
So I will NOT add the index, as that would mean drifting the schemas away from production.

Explain via https://quarry.wmflabs.org/query/21882

+------+--------------+------------------+--------+-----------------------------------------------------------------------------+----------------------+---------+-----------------------------------------------------+----------+---------------------------------------------------------------------+
| id   | select_type  | table            | type   | possible_keys                                                               | key                  | key_len | ref                                                 | rows     | Extra                                                               |
+------+--------------+------------------+--------+-----------------------------------------------------------------------------+----------------------+---------+-----------------------------------------------------+----------+---------------------------------------------------------------------+
|    1 | PRIMARY      | revision         | ref    | rev_timestamp,page_timestamp,user_timestamp,page_user_timestamp,rev_page_id | user_timestamp       | 4       | const                                               |    26100 | Using index condition; Using where; Using temporary; Using filesort |
|    1 | PRIMARY      | page_assessments | ref    | PRIMARY                                                                     | PRIMARY              | 4       | enwiki.revision.rev_page                            |        1 |                                                                     |
|    1 | PRIMARY      | page             | eq_ref | PRIMARY,name_title,page_redirect_namespace_len                              | PRIMARY              | 4       | enwiki.revision.rev_page                            |        1 | Using where                                                         |
|    2 | UNION        | archive          | ref    | name_title_timestamp                                                        | name_title_timestamp | 4       | const                                               | 25098625 | Using index condition; Using where; Using temporary; Using filesort |
|    2 | UNION        | logging          | ref    | type_time,user_time,log_user_type_time                                      | user_time            | 4       | enwiki.archive.ar_user                              |        1 | Using where                                                         |
|    2 | UNION        | archive          | ref    | name_title_timestamp                                                        | name_title_timestamp | 261     | enwiki.archive.ar_namespace,enwiki.archive.ar_title |        2 | Using where                                                         |
| NULL | UNION RESULT | <union1,2>       | ALL    | NULL                                                                        | NULL                 | NULL    | NULL                                                |     NULL |                                                                     |
+------+--------------+------------------+--------+-----------------------------------------------------------------------------+----------------------+---------+-----------------------------------------------------+----------+---------------------------------------------------------------------+

> I just realised that index is NOT in tables.sql, so it must have been added on the old labs server for some reason; that is why it is not on the new labs server. I am not sure it should be added there, as these hosts are filtered copies of production and thus the schemas are meant to be the same.
> So I will NOT add the index, as that would mean drifting the schemas away from production.

If we dig through old bugs here in Phabricator, I'm pretty sure we will find a large number of indices that have been added to the Wiki Replica servers to improve performance of various analytics queries. I'm not sure that 1:1 parity with production indexes is a good thing to shoot for in this case. I think we should have all prod indices, but that it is also reasonable to have additional indices that make non-MediaWiki queries more performant.

> I think we should have all prod indices, but that it is also reasonable to have additional indices that make non-MediaWiki queries more performant.

Yes, pwetty pwease :)

> but that it is also reasonable to have additional indices that make non-MediaWiki queries more performant.

I don't disagree; we can have both extra indexes and tables, but a process should be set up (puppet, etc.) to make those changes repeatable. E.g. T59617

>> but that it is also reasonable to have additional indices that make non-MediaWiki queries more performant.
>
> I don't disagree; we can have both extra indexes and tables, but a process should be set up (puppet, etc.) to make those changes repeatable. E.g. T59617

Hit the nail on the head!
If we don't have something in place, every time we clone or rebuild a labs host, we will hit the same problem we just did :-)

To be clear, I am not opposed to making the query fast, per se :-)

So, thinking about it, should we:

  1. Create the index to unblock @MusikAnimal
  2. Think of a way of documenting, puppetizing, or similar so this doesn't hit us again?

I'd be OK with stalling the index fix for a few days if we can get something properly designed and built relatively quickly. I would be glad to help work on scripting to automate repeatable schema changes, along the lines of the https://gerrit.wikimedia.org/r/#/c/375349 script proposed for T59617 and other custom stuff that we already do, if hands on keyboards is what is needed.

Maybe it's time to have a meeting or detailed discussion on one of these bugs about what the "right" solution looks like?

> I'd be OK with stalling the index fix for a few days if we can get something properly designed and built relatively quickly. I would be glad to help work on scripting to automate repeatable schema changes, along the lines of the https://gerrit.wikimedia.org/r/#/c/375349 script proposed for T59617 and other custom stuff that we already do, if hands on keyboards is what is needed.

That's my concern: I am not sure we (DBAs) have the bandwidth to do that now, and I wouldn't want to block tool migrations to the new replicas.
If you guys can get something up quickly, that'd be awesome though :-)

From my point of view, ideally we should have some sort of cronjob or similar that could compare the replicas' schemas with tables.sql and generate a SQL patch every day, or every week (it doesn't really matter at this point).
So whenever we clone a new replica (if it is not from another replica, but from production) we could simply apply the patch containing the list of ALTER TABLEs and voilà!
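
As a rough illustration of the comparison step (a sketch only, not an existing script; the schema name and the diffing against tables.sql are assumptions), the replica side of such a comparison could be pulled from information_schema:

-- Enumerate the indexes actually present on a replica, one row per index,
-- so the result can be diffed against what tables.sql declares.
SELECT TABLE_NAME, INDEX_NAME,
       GROUP_CONCAT(COLUMN_NAME ORDER BY SEQ_IN_INDEX) AS index_columns
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = 'enwiki'
GROUP BY TABLE_NAME, INDEX_NAME
ORDER BY TABLE_NAME, INDEX_NAME;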

> From my point of view, ideally we should have some sort of cronjob or similar that could compare the replicas' schemas with tables.sql and generate a SQL patch every day, or every week (it doesn't really matter at this point).
> So whenever we clone a new replica (if it is not from another replica, but from production) we could simply apply the patch containing the list of ALTER TABLEs and voilà!

Setting up something to reverse engineer the schema changes seems possible. A tool like https://github.com/mmatuson/SchemaSync might not be too difficult to deploy somewhere (other than deb packaging horrors).

It also seems like it would be easy and straightforward to work in the other direction and have a script that manages a collection of ALTER patches that are purposefully added. The pattern of having an "updates" table where the application of patch files is logged is pretty well established in the world of schema migrations. We could either do this per-wikidb or at a global level for the wiki replica server. In this model, patching would be a process of adding a new migration patch and then running a tool that ensures that either a single wikidb or all of them are up to date with the collection of patches that are expected to be applied.
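
A minimal sketch of that "updates" pattern (the table and patch names here are hypothetical, not an existing WMF schema):

-- Bookkeeping table recording which ALTER patches have already been applied.
CREATE TABLE IF NOT EXISTS replica_schema_patches (
  patch_name VARBINARY(255) NOT NULL,
  applied_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (patch_name)
);

-- A patch runner would apply a given file only if its name is absent here,
-- then record it, e.g.:
INSERT INTO replica_schema_patches (patch_name)
VALUES ('20171006-archive-add-user_timestamp.sql');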

Change 382170 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Add file to control additional wikireplicas-only indexes

https://gerrit.wikimedia.org/r/382170

Nice start!

This should fix things:

root@labsdb1010[enwiki]> ALTER TABLE archive ADD KEY `user_timestamp` (`ar_user`,`ar_timestamp`);
Query OK, 0 rows affected, 4 warnings (6 min 45.34 sec)
Records: 0  Duplicates: 0  Warnings: 4

mysql> (SELECT ...
2 rows in set (0.61 sec)

I was going to propose not using the index and rewriting the query instead, but given the size of the archive table the index could actually be interesting for production too, all the more so due to the specifics of the view creation/filtering; so I think it is a legitimate request.
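
For context, a rewrite along those lines (purely illustrative, not the actual proposal; the username literal is an assumption) could lean on the usertext_timestamp index, which exists on both the old and new hosts, by filtering on ar_user_text instead of ar_user:

-- Hypothetical variant of the inner subquery that avoids the missing
-- user_timestamp index by filtering on ar_user_text, which is covered
-- by the existing usertext_timestamp (ar_user_text, ar_timestamp) key.
SELECT b.ar_namespace, b.ar_title
FROM enwiki_p.archive_userindex AS b
WHERE b.ar_user_text = 'L235'   -- assumed username for user ID 11656865
  AND b.ar_timestamp > 1
  AND b.ar_parent_id = '0'
  AND b.ar_namespace = '0';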

Change 382170 merged by Jcrespo:
[operations/puppet@production] mariadb: Add file to control additional wikireplicas-only indexes

https://gerrit.wikimedia.org/r/382170

jcrespo claimed this task.

@MusikAnimal Your query now takes 3 seconds cold and 0.13 seconds hot on the new servers; please migrate back to the new servers.

Lightning fast! :D Many thanks for the prompt assistance