Page MenuHomePhabricator

SQL dewiki_p categorylinks cl_from index missing
Closed, ResolvedPublic

Description

sql dewiki_p
Please fix the cl_from index in table categorylinks. I have an increasing bot outage due to this issue.

Concerning the following mailing list traffic: https://lists.wikimedia.org/pipermail/mediawiki-api/2017-June/004014.html

This is the wrong mailing list, but it appears that there is something
wrong with the cl_from index on the dewiki.labsdb tool labs replicas.
For some reason this index is not being used.

However this issue is not present on the other replicas, even when
querying dewiki data.

For example, using enwiki.labsdb:

$ sql enwiki
MariaDB [enwiki_p]> select page_id, cl_to from dewiki_p.page left join
dewiki_p.categorylinks on cl_from = page_id where !page_namespace and
!page_is_redirect LIMIT 30;

Works fine.

So this is definitely a bug in tool labs. You should report it to the
cloud services team so that they can fix it.

--
Brian

On Wed, Jun 28, 2017 at 7:46 AM, Martin Domdey <animalia@gmx.net> wrote:
> Ah, before June 22nd the query has been done in less than 4 minutes [sic!].
> Now it lasts more than 1 hour!
> --
> Diese Nachricht wurde von meinem Android Mobiltelefon mit GMX Mail gesendet.
> Am 28.06.2017, 09:41, Martin Domdey <animalia@gmx.net> schrieb:
>>
>> Hi!
>>
>> MariaDB [dewiki_p]> select page_id, cl_to
>> from page left join categorylinks on cl_from = page_id
>> where !page_namespace and !page_is_redirect
>> order by page_title;
>>
>> Before the query has been done, I get a database close but I cannot filter
>> the query more.
>>
>> What can I do?
>>
>> Greetings,
>> Martin

Event Timeline

Bawolff subscribed.

For reference, the thread is https://lists.wikimedia.org/pipermail/mediawiki-api/2017-June/004014.html

tl;dr: When I do SELECT * from categorylinks where cl_from = 123456; this does a full table scan on dewiki.labsdb which is slow. on enwiki.labsdb, even when querying dewiki_p view, it uses cl_from index and is fast.

chasemp lowered the priority of this task from High to Medium.Jun 28 2017, 1:34 PM

I spoke with the DBA crew a bit today and this seems to be a real issue but not high priority at the moment

jcrespo subscribed.

Without looking too much in depth, there seems to be some unexpected query plan- there is not much we can do about it- except maybe run analyze on the table. My normal recommendation would be to transform that !namespace into several queries with the desired actual used namespaces. That may help the query do a better plan. I am open to suggestions on other ways- sadly, sanitization of private data methodology limits the way to fine-tune queries (index hints don't work :-/).

The issue happens because the query is so heavy that it gets killed- we cannot support hour-long queries in the current infrastructure.

Without looking too much in depth, there seems to be some unexpected query plan- there is not much we can do about it- except maybe run analyze on the table. My normal recommendation would be to transform that !namespace into several queries with the desired actual used namespaces. That may help the query do a better plan. I am open to suggestions on other ways- sadly, sanitization of private data methodology limits the way to fine-tune queries (index hints don't work :-/).

The issue happens because the query is so heavy that it gets killed- we cannot support hour-long queries in the current infrastructure.

While the original reporters query is non-ideal, that really doesn't seem to be the issue. The issue seems entirely to be around the join to categorylinks. Even a direct lookup of SELECT cl_to FROM categorylinks where cl_from = <some number>; seems broken, and you can't get a simpler query than that.

jcrespo raised the priority of this task from Medium to High.Jun 28 2017, 3:06 PM
jcrespo moved this task from Backlog (help welcome) to Pending comment on the DBA board.

Ok, sorry, I understand now (the initial email with the complex join confused me)- I can see now that the host is missing an indexes there- maybe a mistake on maintenance or something else:

labsdb1001:

CREATE TABLE `categorylinks` (
  `cl_from` int(8) unsigned NOT NULL DEFAULT '0',
  `cl_to` varbinary(255) NOT NULL DEFAULT '',
  `cl_sortkey` varbinary(230) NOT NULL DEFAULT '',
  `cl_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `cl_sortkey_prefix` varbinary(255) NOT NULL DEFAULT '',
  `cl_collation` varbinary(32) NOT NULL DEFAULT '',
  `cl_type` enum('page','subcat','file') NOT NULL DEFAULT 'page',
  PRIMARY KEY (`cl_from`,`cl_to`),
  KEY `cl_timestamp` (`cl_to`,`cl_timestamp`),
  KEY `cl_sortkey` (`cl_to`,`cl_type`,`cl_sortkey`,`cl_from`),
  KEY `cl_collation_ext` (`cl_collation`,`cl_to`,`cl_type`,`cl_from`)
) ENGINE=TokuDB DEFAULT CHARSET=binary `compression`='tokudb_zlib'

labsdb1003- default dewiki host:

CREATE TABLE `categorylinks` (
  `cl_from` int(8) unsigned NOT NULL DEFAULT '0',
  `cl_to` varbinary(255) NOT NULL DEFAULT '',
  `cl_sortkey` varbinary(230) NOT NULL DEFAULT '',
  `cl_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `cl_sortkey_prefix` varbinary(255) NOT NULL DEFAULT '',
  `cl_collation` varbinary(32) NOT NULL DEFAULT '',
  `cl_type` enum('page','subcat','file') NOT NULL DEFAULT 'page',
  KEY `cl_timestamp` (`cl_to`,`cl_timestamp`),
  KEY `cl_sortkey` (`cl_to`,`cl_type`,`cl_sortkey`,`cl_from`),
  KEY `cl_collation_ext` (`cl_collation`,`cl_to`,`cl_type`,`cl_from`)
) ENGINE=TokuDB DEFAULT CHARSET=binary `compression`='tokudb_zlib'

This is clearly a mistake. Please trasmit to the user that this will get fixed, but that for now (it may take some hours), he/she can use the s1 host or the new hosts in testing (labsdb-web.eqiad.wmnet).

Probably related to T166207 (alters sometimes timeout due to excesive load on labsdb* hosts).

jcrespo claimed this task.

This took less time than expected, but should now be fixed:

MariaDB [dewiki]> EXPLAIN SELECT * from categorylinks where cl_from = 123456;
+------+-------------+---------------+------+---------------+---------+---------+-------+------+-------+
| id   | select_type | table         | type | possible_keys | key     | key_len | ref   | rows | Extra |
+------+-------------+---------------+------+---------------+---------+---------+-------+------+-------+
|    1 | SIMPLE      | categorylinks | ref  | PRIMARY       | PRIMARY | 4       | const |    1 |       |
+------+-------------+---------------+------+---------------+---------+---------+-------+------+-------+
1 row in set (0.00 sec)

MariaDB [dewiki]> use dewiki_p
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MariaDB [dewiki_p]> SELECT * from categorylinks where cl_from = 123456;
Empty set (0.00 sec)