Page MenuHomePhabricator

ApiQueryExtLinksUsage::run needs tuning
Closed, ResolvedPublic

Description

I just found this query running for 1h on db1081 (commons API and main traffic eqiad host)

root@db1081.eqiad.wmnet[commonswiki]> explain SELECT /* ApiQueryExtLinksUsage::run */ el_index_60,el_id,page_id,page_namespace,page_title,el_to FROM `page`,`externallinks` WHERE (page_id=el_from) AND ((el_to NOT LIKE '//%' ESCAPE '`' ) OR (el_index_60 LIKE 'http://%' ESCAPE '`' )) ORDER BY el_index_60,el_id LIMIT 501;
+------+-------------+---------------+-------+--------------------------------------+------------+---------+--------------------------+----------+----------------------------------------------+
| id   | select_type | table         | type  | possible_keys                        | key        | key_len | ref                      | rows     | Extra                                        |
+------+-------------+---------------+-------+--------------------------------------+------------+---------+--------------------------+----------+----------------------------------------------+
|    1 | SIMPLE      | page          | index | PRIMARY                              | name_title | 261     | NULL                     | 81247627 | Using index; Using temporary; Using filesort |
|    1 | SIMPLE      | externallinks | ref   | el_from,el_index_60,el_from_index_60 | el_from    | 4       | commonswiki.page.page_id |        2 | Using where                                  |
+------+-------------+---------------+-------+--------------------------------------+------------+---------+--------------------------+----------+----------------------------------------------+
2 rows in set (0.01 sec)

It definitely needs some checking.

We also need to check why it's not been killed by the query killer:

| 2902860356 | wikiuser        | 10.64.16.197:51836 | commonswiki        | Query   |    6515 | Copying to tmp table             | SELECT /* ApiQueryExtLinksUsage::run  */  el_index_60,el_id,page_id,page_namespace,page_title,el_to  |    0.000 |
| 2902876401 | wikiuser        | 10.64.0.72:42644   | commonswiki        | Query   |    6505 | Copying to tmp table             | SELECT /* ApiQueryExtLinksUsage::run  */  el_index_60,el_id,page_id,page_namespace,page_title,el_to  |    0.000 |

So others have been killed though:

| 171966557 | 2020-02-04 15:53:33 | wmf_slave_wikiuser_slow (>60) | kill 2910296954; SELECT /* ApiQueryExtLinksUsage::run  */  el_index_60,el_id,page_id,page_namespace,page_title,el_to  |
| 171966557 | 2020-02-04 15:53:33 | wmf_slave_wikiuser_slow (>60) | kill 2910296954; SELECT /* ApiQueryExtLinksUsage::run  */  el_index_60,el_id,page_id,page_namespace,page_title,el_to  |
| 171966557 | 2020-02-04 15:56:03 | wmf_slave_wikiuser_slow (>60) | kill 2910452612; SELECT /* ApiQueryExtLinksUsage::run  */  el_index_60,el_id,page_id,page_namespace,page_title,el_to  |
| 171966557 | 2020-02-04 15:56:03 | wmf_slave_wikiuser_slow (>60) | kill 2910452612; SELECT /* ApiQueryExtLinksUsage::run  */  el_index_60,el_id,page_id,page_namespace,page_title,el_to  |
| 171966557 | 2020-02-04 15:58:03 | wmf_slave_wikiuser_slow (>60) | kill 2910615324; SELECT /* ApiQueryExtLinksUsage::run  */  el_index_60,el_id,page_id,page_namespace,page_title,el_to  |
| 171966557 | 2020-02-04 15:58:03 | wmf_slave_wikiuser_slow (>60) | kill 2910615324; SELECT /* ApiQueryExtLinksUsage::run  */  el_index_60,el_id,page_id,page_namespace,page_title,el_to  |
| 171966557 | 2020-02-04 15:58:33 | wmf_slave_wikiuser_slow (>60) | kill 2910642554; SELECT /* ApiQueryExtLinksUsage::run  */  el_index_60,el_id,page_id,page_namespace,page_title,el_to  |
| 171966557 | 2020-02-04 15:58:33 | wmf_slave_wikiuser_slow (>60) | kill 2910642554; SELECT /* ApiQueryExtLinksUsage::run  */  el_index_60,el_id,page_id,page_namespace,page_title,el_to  |
| 171966557 | 2020-02-04 16:01:35 | wmf_slave_wikiuser_sleep      | kill 2910864851

Event Timeline

Looks like yet another stupid plan when there's a much better one available:

wikiadmin@10.64.0.93(commonswiki)> explain SELECT /*!STRAIGHT_JOIN*/ /* ApiQueryExtLinksUsage::run */ el_index_60,el_id,page_id,page_namespace,page_title,el_to FROM `externallinks` JOIN `page` ON(page_id=el_from) WHERE ((el_to NOT LIKE '//%' ESCAPE '`' ) OR (el_index_60 LIKE 'http://%' ESCAPE '`' )) ORDER BY el_index_60,el_id LIMIT 501;
+------+-------------+---------------+--------+--------------------------------------+-------------+---------+-----------------------------------+------+-------------+
| id   | select_type | table         | type   | possible_keys                        | key         | key_len | ref                               | rows | Extra       |
+------+-------------+---------------+--------+--------------------------------------+-------------+---------+-----------------------------------+------+-------------+
|    1 | SIMPLE      | externallinks | index  | el_from,el_index_60,el_from_index_60 | el_index_60 | 66      | NULL                              |  501 | Using where |
|    1 | SIMPLE      | page          | eq_ref | PRIMARY                              | PRIMARY     | 4       | commonswiki.externallinks.el_from |    1 |             |
+------+-------------+---------------+--------+--------------------------------------+-------------+---------+-----------------------------------+------+-------------+

We can get it to choose that plan with a STRAIGHT_JOIN as shown, or we could force el_index_60. Either are extremely easy to do at the MW level, do you have a preference?

Thanks Brad - let's go for the STRAIGHT JOIN I would see.
I prefer to avoid forces or ignores in code if we can

Change 570141 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] ApiQueryExtLinksUsage: Avoid bad query plan

https://gerrit.wikimedia.org/r/570141

The query killer seems to have been killing the query fine again during the night, so maybe it was a punctual issue.
I have also confirmed it gets killed by doing a manual run:

wikiuser@db1081.eqiad.wmnet[commonswiki]> SELECT /* ApiQueryExtLinksUsage::run */ el_index_60,el_id,page_id,page_namespace,page_title,el_to FROM `page`,`externallinks` WHERE (page_id=el_from) AND ((el_to NOT LIKE '//%' ESCAPE '`' ) OR (el_index_60 LIKE 'http://%' ESCAPE '`' )) ORDER BY el_index_60,el_id LIMIT 501;
ERROR 2013 (HY000): Lost connection to MySQL server during query

Change 570141 merged by jenkins-bot:
[mediawiki/core@master] ApiQueryExtLinksUsage: Avoid bad query plan

https://gerrit.wikimedia.org/r/570141

Patch is merged (should go out next week with wmf.19), and T244254#5851375 seems to indicate the query killer is working. So I'm going to close the task.

Thank you for fixing it so fast :)

@AMooney, I understand that this ticket is resolved. Should it be in the Done or Waiting for Deployment column instead of the Waiting for Review one?