https://en.wiktionary.org/w/api.php?action=query&prop=transcludedin&titles=Module%3Alanguages%2Fdata2&tilimit=500 takes at least 30 seconds to complete, which is not acceptable. We need to investigate why that is.
One debug run shows that most of the time is spent waiting on MySQL to actually return the content:
$ curl -v -H 'X-Wikimedia-Debug: backend=mw1099.eqiad.wmnet; profile' -H 'Accept: application/json' 'https://en.wiktionary.org/w/api.php?action=query&prop=transcludedin&titles=Module%3Alanguages%2Fdata2&tilimit=500&format=json&formatversion=2'
The SQL query it executes is simple enough:
SELECT tl_from,tl_namespace AS `bl_namespace`,tl_title AS `bl_title`,page_id,page_title,page_namespace,page_is_redirect FROM `templatelinks` FORCE INDEX (tl_namespace),`page` WHERE (tl_from = page_id) AND ((tl_namespace = '828' AND tl_title = 'languages/data2')) ORDER BY tl_from LIMIT 501;
The intention here is for it to pull the relevant entries in order from the templatelinks table, then join with page to get a few fields needed for the output.
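For context, the tl_namespace index presumably covers (tl_namespace, tl_title, tl_from); that is consistent with the key_len values of 261 and 265 in the plans below, but it is an assumption rather than a check of the live schema. A minimal sketch of the assumed index and why the intended plan is cheap:

-- Assumed definition (not copied from the production schema):
CREATE UNIQUE INDEX tl_namespace ON templatelinks (tl_namespace, tl_title, tl_from);

-- With the first two columns pinned by the WHERE clause, the index returns tl_from
-- values already in order, so the LIMIT 501 can stop the scan after 501 entries,
-- and each of those rows needs only one eq_ref lookup on page.PRIMARY.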
Normally, this works fine:
mysql:wikiadmin@db1054 [enwiktionary]> explain SELECT tl_from,tl_namespace AS `bl_namespace`,tl_title AS `bl_title`,page_id,page_title,page_namespace,page_is_redirect FROM `templatelinks` FORCE INDEX (tl_namespace),`page` WHERE (tl_from = page_id) AND ((tl_namespace = '828' AND tl_title = 'languages')) ORDER BY tl_from LIMIT 501;
+------+-------------+---------------+--------+---------------+--------------+---------+------------------------------------+---------+--------------------------+
| id   | select_type | table         | type   | possible_keys | key          | key_len | ref                                | rows    | Extra                    |
+------+-------------+---------------+--------+---------------+--------------+---------+------------------------------------+---------+--------------------------+
|    1 | SIMPLE      | templatelinks | ref    | tl_namespace  | tl_namespace | 261     | const,const                        | 8534620 | Using where; Using index |
|    1 | SIMPLE      | page          | eq_ref | PRIMARY       | PRIMARY      | 4       | enwiktionary.templatelinks.tl_from |       1 |                          |
+------+-------------+---------------+--------+---------------+--------------+---------+------------------------------------+---------+--------------------------+
2 rows in set (0.00 sec)

mysql:wikiadmin@db1054 [enwiktionary]> SELECT tl_from,tl_namespace AS `bl_namespace`,tl_title AS `bl_title`,page_id,page_title,page_namespace,page_is_redirect FROM `templatelinks` FORCE INDEX (tl_namespace),`page` WHERE (tl_from = page_id) AND ((tl_namespace = '828' AND tl_title = 'languages')) ORDER BY tl_from LIMIT 501;
[...]
501 rows in set (0.00 sec)

mysql:wikiadmin@db1054 [enwiktionary]> SHOW SESSION STATUS like 'Hand%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 502   |
| Handler_read_last          | 0     |
| Handler_read_next          | 500   |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 130   |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 128   |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
25 rows in set (0.00 sec)
But for some reason for this title it's using a much worse plan:
mysql:wikiadmin@db1054 [enwiktionary]> explain SELECT tl_from,tl_namespace AS `bl_namespace`,tl_title AS `bl_title`,page_id,page_title,page_namespace,page_is_redirect FROM `templatelinks` FORCE INDEX (tl_namespace),`page` WHERE (tl_from = page_id) AND ((tl_namespace = '828' AND tl_title = 'languages/data2')) ORDER BY tl_from LIMIT 501;
+------+-------------+---------------+------+---------------+--------------+---------+---------------------------------------+---------+---------------------------------+
| id   | select_type | table         | type | possible_keys | key          | key_len | ref                                   | rows    | Extra                           |
+------+-------------+---------------+------+---------------+--------------+---------+---------------------------------------+---------+---------------------------------+
|    1 | SIMPLE      | page          | ALL  | PRIMARY       | NULL         | NULL    | NULL                                  | 5490747 | Using temporary; Using filesort |
|    1 | SIMPLE      | templatelinks | ref  | tl_namespace  | tl_namespace | 265     | const,const,enwiktionary.page.page_id |       1 | Using where; Using index        |
+------+-------------+---------------+------+---------------+--------------+---------+---------------------------------------+---------+---------------------------------+
2 rows in set (0.00 sec)

mysql:wikiadmin@db1054 [enwiktionary]> SELECT tl_from,tl_namespace AS `bl_namespace`,tl_title AS `bl_title`,page_id,page_title,page_namespace,page_is_redirect FROM `templatelinks` FORCE INDEX (tl_namespace),`page` WHERE (tl_from = page_id) AND ((tl_namespace = '828' AND tl_title = 'languages/data2')) ORDER BY tl_from LIMIT 501;
[...]
501 rows in set (50.69 sec)

mysql:wikiadmin@db1054 [enwiktionary]> SHOW SESSION STATUS like 'Hand%';
+----------------------------+----------+
| Variable_name              | Value    |
+----------------------------+----------+
| Handler_commit             | 1        |
| Handler_delete             | 0        |
| Handler_discover           | 0        |
| Handler_external_lock      | 0        |
| Handler_icp_attempts       | 0        |
| Handler_icp_match          | 0        |
| Handler_mrr_init           | 0        |
| Handler_mrr_key_refills    | 0        |
| Handler_mrr_rowid_refills  | 0        |
| Handler_prepare            | 0        |
| Handler_read_first         | 0        |
| Handler_read_key           | 5375735  |
| Handler_read_last          | 0        |
| Handler_read_next          | 4864453  |
| Handler_read_prev          | 0        |
| Handler_read_rnd           | 0        |
| Handler_read_rnd_deleted   | 0        |
| Handler_read_rnd_next      | 10365694 |
| Handler_rollback           | 0        |
| Handler_savepoint          | 0        |
| Handler_savepoint_rollback | 0        |
| Handler_tmp_update         | 0        |
| Handler_tmp_write          | 4989955  |
| Handler_update             | 0        |
| Handler_write              | 0        |
+----------------------------+----------+
25 rows in set (0.00 sec)
Yes, for some reason it's scanning the whole page table, then using templatelinks as a filter, then filesorting so it can pull off the 501 needed rows.
We can force it to use the right plan with a STRAIGHT_JOIN, and that makes it return with the expected rapidity:
mysql:wikiadmin@db1054 [enwiktionary]> explain SELECT /*! STRAIGHT_JOIN */ tl_from,tl_namespace AS `bl_namespace`,tl_title AS `bl_title`,page_id,page_title,page_namespace,page_is_redirect FROM `templatelinks` FORCE INDEX (tl_namespace),`page` WHERE (tl_from = page_id) AND ((tl_namespace = '828' AND tl_title = 'languages/data2')) ORDER BY tl_from LIMIT 501;
+------+-------------+---------------+--------+---------------+--------------+---------+------------------------------------+---------+--------------------------+
| id   | select_type | table         | type   | possible_keys | key          | key_len | ref                                | rows    | Extra                    |
+------+-------------+---------------+--------+---------------+--------------+---------+------------------------------------+---------+--------------------------+
|    1 | SIMPLE      | templatelinks | ref    | tl_namespace  | tl_namespace | 261     | const,const                        | 9677820 | Using where; Using index |
|    1 | SIMPLE      | page          | eq_ref | PRIMARY       | PRIMARY      | 4       | enwiktionary.templatelinks.tl_from |       1 |                          |
+------+-------------+---------------+--------+---------------+--------------+---------+------------------------------------+---------+--------------------------+
2 rows in set (0.00 sec)

mysql:wikiadmin@db1054 [enwiktionary]> SELECT /*! STRAIGHT_JOIN */ tl_from,tl_namespace AS `bl_namespace`,tl_title AS `bl_title`,page_id,page_title,page_namespace,page_is_redirect FROM `templatelinks` FORCE INDEX (tl_namespace),`page` WHERE (tl_from = page_id) AND ((tl_namespace = '828' AND tl_title = 'languages/data2')) ORDER BY tl_from LIMIT 501;
[...]
501 rows in set (0.00 sec)

mysql:wikiadmin@db1054 [enwiktionary]> SHOW SESSION STATUS like 'Hand%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 502   |
| Handler_read_last          | 0     |
| Handler_read_next          | 500   |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 130   |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 128   |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
25 rows in set (0.00 sec)
My only guess as to why it's choosing such a bad plan is that it for some reason thinks the good plan will have to scan 9677820 rows when really it needs only 501 to fill the limit. I have no idea whether there's a better fix than STRAIGHT_JOIN to force it; that's a question for our DBAs.
Just to add some more info: in this templatelinks table there are 4.8M rows matching those conditions:
MariaDB PRODUCTION s2 localhost enwiktionary > explain select count(*) from templatelinks where tl_namespace = 828 AND tl_title = 'languages/data2';
+------+-------------+---------------+------+---------------+--------------+---------+-------------+---------+--------------------------+
| id   | select_type | table         | type | possible_keys | key          | key_len | ref         | rows    | Extra                    |
+------+-------------+---------------+------+---------------+--------------+---------+-------------+---------+--------------------------+
|    1 | SIMPLE      | templatelinks | ref  | tl_namespace  | tl_namespace | 261     | const,const | 9585198 | Using where; Using index |
+------+-------------+---------------+------+---------------+--------------+---------+-------------+---------+--------------------------+

MariaDB PRODUCTION s2 localhost enwiktionary > select count(*) from templatelinks where tl_namespace = 828 AND tl_title = 'languages/data2';
+----------+
| count(*) |
+----------+
|  4864454 |
+----------+
And in the page table there are 5.3M rows. I guess this is the reason MySQL decides to ignore the index.
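For illustration, the figures the optimizer works from are estimates rather than exact counts, and they can be inspected directly; the statements below are standard MySQL/MariaDB, though the numbers in this task were not gathered this way:

-- Per-index cardinality estimates for templatelinks (the ~9.6M "rows" figure in the
-- EXPLAIN output comes from estimates of this kind, not from an exact count):
SHOW INDEX FROM templatelinks;

-- Estimated row count for page (the ~5.3M figure); the Rows column is also an estimate:
SHOW TABLE STATUS LIKE 'page';

Since ~9.6M estimated index entries in templatelinks looks worse than a ~5.3M-row scan of page, the optimizer presumably prefers to drive the join from page, ignoring that the LIMIT 501 would cut the index scan short.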
My 2 cents
The problem with the straight join (or hints in general) is that you fix things now and break them elsewhere: on other wikis, for another set of conditions, or with a different row distribution a year from now. Let's try different approaches first, such as InnoDB statistics or MariaDB histograms for finer query plans.
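A sketch of what the InnoDB-statistics side of that could look like; the variable and its default are standard MySQL/MariaDB, but these are not the exact commands anyone ran here:

-- Sample more pages per index when InnoDB recomputes its persistent statistics
-- (the default is 20), then rebuild the statistics for both tables:
SET GLOBAL innodb_stats_persistent_sample_pages = 64;
ANALYZE TABLE templatelinks;
ANALYZE TABLE page;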
I think this is a known bug that Anomie and I have encountered in the past. Again, give me one day to figure out whether there is an alternative that does not involve an index hint.
@jcrespo, out of curiosity and for my own learning: is this related to the fact that templatelinks is partitioned? And to the partition key, tl_namespace, being fixed to a constant here?
It does not appear to be related, since I see the same behavior on e.g. db1074 which has no partitioning.
I regenerated the table statistics without success. Next I will try to tune the query planner by forcing the use of histograms.
MariaDB db1024.eqiad.wmnet enwiktionary > ANALYZE TABLE page;
+-------------------+---------+----------+-----------------------------------------+
| Table             | Op      | Msg_type | Msg_text                                |
+-------------------+---------+----------+-----------------------------------------+
| enwiktionary.page | analyze | status   | Engine-independent statistics collected |
| enwiktionary.page | analyze | status   | OK                                      |
+-------------------+---------+----------+-----------------------------------------+
2 rows in set (4 min 22.95 sec)

MariaDB db1024.eqiad.wmnet enwiktionary > ANALYZE TABLE templatelinks;
+----------------------------+---------+----------+-----------------------------------------+
| Table                      | Op      | Msg_type | Msg_text                                |
+----------------------------+---------+----------+-----------------------------------------+
| enwiktionary.templatelinks | analyze | status   | Engine-independent statistics collected |
| enwiktionary.templatelinks | analyze | status   | OK                                      |
+----------------------------+---------+----------+-----------------------------------------+
2 rows in set (42 min 3.26 sec)
I tried it; the only thing I took away from this is that MariaDB (at least the version we use) is dumb, and I am 99% convinced this issue would be fixed on MySQL 5.7. Forcing or ignoring histograms, or changing optimizer_switch parameters, did not work.
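For the record, the histogram and optimizer_switch attempts would presumably have been along these lines; this is a sketch of standard MariaDB 10.0 statements, not the exact commands that were run:

-- Collect engine-independent statistics with histograms and tell the planner to use them:
SET SESSION histogram_size = 255;
SET SESSION histogram_type = 'DOUBLE_PREC_HB';
SET SESSION use_stat_tables = 'preferably';
SET SESSION optimizer_use_condition_selectivity = 4;  -- 4 = also use histograms for selectivity
ANALYZE TABLE templatelinks PERSISTENT FOR COLUMNS (tl_namespace, tl_title) INDEXES ();

-- Individual optimizer_switch flags can be toggled the same way, e.g.:
SET SESSION optimizer_switch = 'extended_keys=on';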
The reason: if you run EXPLAIN EXTENDED, the filtered column does not move away from 100%:
MariaDB db1024.eqiad.wmnet enwiktionary > EXPLAIN EXTENDED SELECT tl_from,tl_namespace AS `bl_namespace`,tl_title AS `bl_title`,page_id,page_title,page_namespace,page_is_redirect FROM `templatelinks` FORCE INDEX (tl_namespace),`page` WHERE (tl_from = page_id) AND ((tl_namespace = '828' AND tl_title = 'languages/data2')) ORDER BY tl_from LIMIT 501\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: page
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5250137
     filtered: 100.00
        Extra: Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: templatelinks
         type: ref
possible_keys: tl_namespace
          key: tl_namespace
      key_len: 265
          ref: const,const,enwiktionary.page.page_id
         rows: 1
     filtered: 100.00
        Extra: Using where; Using index
2 rows in set, 1 warning (0.02 sec)
Let's go with the straight_join solution, and I will set up a place to document this hack until we upgrade to a later version.
This is a one-line patch; I can do it if you have faith in me (I personally don't). I would like to focus on the documentation part, though.
One more optimizer hint will not hurt:
Since ApiQueryBacklinks is using a straight join, it might be safe enough to do it for templatelinks as well?
Thanks for looking into alternative solutions, @jcrespo. I'll write the patch for straight_join.
I am sorry, Anomie, for this and all other issues. We are already looking at newer versions of the server that will fix this and other issues (like the handling of large IN value lists).
Change 310552 had a related patch set uploaded (by Anomie):
API: Force straight join for prop=linkshere|transcludedin|fileusage
Change 310552 merged by jenkins-bot:
API: Force straight join for prop=linkshere|transcludedin|fileusage
I don't have any problem with you SWATting it, although I note that at the moment wmf.19 has been rolled back due to T145819.
Change 311405 had a related patch set uploaded (by Mobrovac):
API: Force straight join for prop=linkshere|transcludedin|fileusage
Change 311406 had a related patch set uploaded (by Mobrovac):
API: Force straight join for prop=linkshere|transcludedin|fileusage
Change 311406 merged by jenkins-bot:
API: Force straight join for prop=linkshere|transcludedin|fileusage
Change 311405 merged by jenkins-bot:
API: Force straight join for prop=linkshere|transcludedin|fileusage
Mentioned in SAL (#wikimedia-operations) [2016-09-19T13:18:03Z] <hashar@tin> Synchronized php-1.28.0-wmf.18/includes/api/ApiQueryBacklinksprop.php: API: Force straight join for prop=linkshere|transcludedin|fileusage T145079 (duration: 00m 50s)
Mentioned in SAL (#wikimedia-operations) [2016-09-19T13:18:57Z] <hashar@tin> Synchronized php-1.28.0-wmf.19/includes/api/ApiQueryBacklinksprop.php: API: Force straight join for prop=linkshere|transcludedin|fileusage T145079 (duration: 00m 47s)