
Run a Tool Labs query without timing out
Closed, Invalid, Public

Description

I am trying to run this query but it is getting killed within 30 minutes. https://quarry.wmflabs.org/query/10293

I ran this query via the tool "Sonitool" on Tool Labs. Is there any way to ensure queries from this tool do not get timed out? A lot of the queries I will be trying out are likely to be pretty extensive, so optimising them further might not be feasible.

The query is for phabricator.wikimedia.org/T135644

Event Timeline

This could possibly be related to the fact that we only have two labsdb boxes right now, and both are fairly creaky and ailing and old...

Just in case it helps, here's the output from a sample run of the query:

SELECT page_id, page_title, first_rev_id
    -> FROM (SELECT rev_page, MIN(rev_id) AS first_rev_id
    ->   FROM revision
    ->   ORDER BY RAND()
    ->   LIMIT 1000) as sample
    -> INNER JOIN page ON page_id = rev_page
    -> WHERE page_namespace in (0,118);

Empty set (37 min 27.68 sec)

jcrespo subscribed.

Do not ever do ORDER BY RAND() on revision - you are literally reading all revisions, ordering all of them and then discarding the millions that are not in the first 1000.

http://www.titov.net/2005/09/21/do-not-use-order-by-rand-or-how-to-get-random-rows-from-table/
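For illustration only, here is one shape such an alternative can take on the replicas: sample random pages via the indexed page_random column instead of sorting revision with ORDER BY RAND(). This is a minimal sketch assuming the stock MediaWiki schema (page.page_random and its index) is exposed on enwiki_p; it is not a verified replacement for the original query.

SET @r := RAND();  -- pick one random threshold up front

SELECT page_id, page_title
FROM page
WHERE page_namespace IN (0, 118)
  AND page_random >= @r      -- jump to a random point in the page_random index
ORDER BY page_random
LIMIT 1000;

This is the same trick Special:Random relies on; the trade-off is that if the random threshold falls near the end of the index, fewer than 1000 rows may come back.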

Still getting an empty set. Not sure if the problem is with query termination or the way the query was framed :/

MariaDB [enwiki_p]> USE enwiki_p;
Database changed
MariaDB [enwiki_p]> SELECT page_id, page_title, first_rev_id
    -> FROM (SELECT rev_page, MIN(rev_id) AS first_rev_id
    ->   FROM revision
    ->   LIMIT 1000) as sample
    -> INNER JOIN page ON page_id = rev_page
    -> WHERE page_namespace in (0,118);

Empty set (16 min 47.81 sec)

Empty set (16 min 47.81 sec) on the mysql command line client means that no rows match the given filters.

Your query is strange. This is the query plan: you are executing a 1000-row subquery 32 million times. You should fix that:

+------+-------------+------------+--------+--------------------+---------+---------+-----------------+-----------+-------------+
| id   | select_type | table      | type   | possible_keys      | key     | key_len | ref             | rows      | Extra       |
+------+-------------+------------+--------+--------------------+---------+---------+-----------------+-----------+-------------+
|    1 | PRIMARY     | <derived2> | ALL    | NULL               | NULL    | NULL    | NULL            |      1000 | Using where |
|    1 | PRIMARY     | page       | eq_ref | PRIMARY,name_title | PRIMARY | 4       | sample.rev_page |         1 | Using where |
|    2 | DERIVED     | revision   | index  | NULL               | rev_id  | 4       | NULL            | 321749704 | Using index |
+------+-------------+------------+--------+--------------------+---------+---------+-----------------+-----------+-------------+
3 rows in set (0.02 sec)
chasemp triaged this task as Medium priority.
chasemp added a subscriber: yuvipanda.

I was wrong, you are executing a 32-million-row query 1000 times (but the conclusions still stand).
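To make the intent concrete, a restructured version could sample the pages first and only then look up each page's first revision, so the full revision table is never scanned into a derived table. This is only a sketch, assuming the usual MediaWiki indexes (page_random on page, and a per-page index on revision such as (rev_page, rev_timestamp)) are present on the replica; it has not been run against labsdb.

SET @r := RAND();

SELECT p.page_id, p.page_title,
       (SELECT MIN(r.rev_id)
          FROM revision r
         WHERE r.rev_page = p.page_id) AS first_rev_id
  FROM page p
 WHERE p.page_namespace IN (0, 118)
   AND p.page_random >= @r
 ORDER BY p.page_random
 LIMIT 1000;

The correlated subquery then runs once per sampled page (1000 times) against an index on rev_page, instead of forcing a 321-million-row scan into a derived table.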

scfc subscribed.

AFAIU T135644, the purpose of the queries has been achieved by some other means.