Page MenuHomePhabricator

Investigate query planning in MariaDB 10
Closed, DeclinedPublic

Description

Followup from comments on Gerrit change 179146.

In MariaDB 5.5, the following query will filesort and be slow:

SELECT /*! STRAIGHT_JOIN */ page_namespace,page_title,page_id FROM `page`,`langlinks` WHERE page_namespace = '0' AND (page_id=ll_from) GROUP BY page_namespace,page_title,page_id ORDER BY page_title LIMIT 11;

While this query does not filesort, and is reasonably fast:

SELECT /*! STRAIGHT_JOIN */ page_namespace,page_title,page_id FROM `page`,`langlinks` WHERE page_namespace = '0' AND (page_id=ll_from) GROUP BY page_title,page_id ORDER BY page_title LIMIT 11;

However, in MariaDB 10 the second query also filesorts, which may kill performance for a lot of API queries. We should figure out if that's true, and what might need to be done to our queries to maintain performance.

Event Timeline

Anomie created this task.Dec 19 2014, 3:02 PM
Anomie raised the priority of this task from to Needs Triage.
Anomie updated the task description. (Show Details)
Anomie added a project: Wikimedia-Rdbms.
Anomie changed Security from none to None.
Anomie added subscribers: Anomie, Springle.

The idea is to select by, group by, and sort by the "name_title" index on the page table so no filesorting is necessary. Variations that may accomplish this include:

Remove the MySQL workaround for constant-in-where fields in ORDER BY:

SELECT /*! STRAIGHT_JOIN */ page_namespace,page_title,page_id FROM `page`,`langlinks` WHERE page_namespace = '0' AND (page_id=ll_from) GROUP BY page_namespace,page_title,page_id ORDER BY page_namespace,page_title LIMIT 11;
SELECT /*! STRAIGHT_JOIN */ page_namespace,page_title,page_id FROM `page`,`langlinks` WHERE page_namespace = '0' AND (page_id=ll_from) GROUP BY page_title,page_id ORDER BY page_namespace,page_title LIMIT 11;

Group by only the primary key, leaving out functionally-dependent fields.

SELECT /*! STRAIGHT_JOIN */ page_namespace,page_title,page_id FROM `page`,`langlinks` WHERE page_namespace = '0' AND (page_id=ll_from) GROUP BY page_id ORDER BY page_title LIMIT 11;
SELECT /*! STRAIGHT_JOIN */ page_namespace,page_title,page_id FROM `page`,`langlinks` WHERE page_namespace = '0' AND (page_id=ll_from) GROUP BY page_id ORDER BY page_namespace,page_title LIMIT 11;

Group by only the fields explicitly declared in the targeted index, not including the implicitly-appended primary key (which technically satisfies the "functionally dependent" requirement since it's a unique index, but do all relevant DBs realize that?):

SELECT /*! STRAIGHT_JOIN */ page_namespace,page_title,page_id FROM `page`,`langlinks` WHERE page_namespace = '0' AND (page_id=ll_from) GROUP BY page_title ORDER BY page_title LIMIT 11;
SELECT /*! STRAIGHT_JOIN */ page_namespace,page_title,page_id FROM `page`,`langlinks` WHERE page_namespace = '0' AND (page_id=ll_from) GROUP BY page_namespace,page_title ORDER BY page_namespace,page_title LIMIT 11;
Krinkle moved this task from Untriaged to Usage problem on the Wikimedia-Rdbms board.
Krinkle removed a project: Wikimedia-Rdbms.
Krinkle added a subscriber: Krinkle.

(not currently an issue with the RDBMS library or a schema schema. Mass-triaging, feel free to revert if i seems I got it wrong.)

Not sure what's the actionable here for the DBAs.
This looks like another case of the optimizer not doing what expected. I have tested the original two queries on the new 10.3 and they also filesort. (and considering this is a 5 years old ticket and nothing has changed from 5.5 to 10.3.... I guess we cannot have much hopes on MariaDB's optimizer doing the right thing).
We can always report it as a bug, but I don't think it will make much difference.
Thoughts?

WDoranWMF triaged this task as Low priority.Jul 2 2019, 7:15 PM
WDoranWMF edited projects, added Core Platform Team Legacy; removed Core Platform Team.
Anomie closed this task as Declined.Jul 8 2019, 2:51 PM

which may kill performance for a lot of API queries. We should figure out if that's true

I guess it wasn't true, since there were no bugs filed about it.

Restricted Application removed a subscriber: Liuxinyu970226. · View Herald TranscriptJul 8 2019, 2:51 PM
Anomie added a comment.Jul 9 2019, 6:40 PM

I found that the fifth possibility in T85000#936374 seems to do the right thing on 10.1.39.

wikiadmin@10.64.0.92(enwiki)> explain SELECT /*! STRAIGHT_JOIN */ page_namespace,page_title,page_id FROM `page`,`langlinks` WHERE page_namespace = '0' AND (page_id=ll_from) GROUP BY page_title ORDER BY page_title LIMIT 11;
+------+-------------+-----------+------+--------------------+------------+---------+---------------------+----------+--------------------------+
| id   | select_type | table     | type | possible_keys      | key        | key_len | ref                 | rows     | Extra                    |
+------+-------------+-----------+------+--------------------+------------+---------+---------------------+----------+--------------------------+
|    1 | SIMPLE      | page      | ref  | PRIMARY,name_title | name_title | 4       | const               | 23486175 | Using where; Using index |
|    1 | SIMPLE      | langlinks | ref  | PRIMARY            | PRIMARY    | 4       | enwiki.page.page_id |        3 | Using index              |
+------+-------------+-----------+------+--------------------+------------+---------+---------------------+----------+--------------------------+

So if we need to do something about the query at some point and 10.3 is the same as 10.1 here, I guess that's the thing to do.