Page MenuHomePhabricator

Geodata running long running queries on Commons
Closed, ResolvedPublicPRODUCTION ERROR

Description

It seems to be https://tools.wmflabs.org/locator-tool/ but that is not to blame. A single application should not overload the databases- last time that happened, on wikidata, it brought down all wikis.

{

"_index": "logstash-2018.06.06",
"_type": "mediawiki",
"_id": "AWPUJyIoVAdyMbafYwpw",
"_version": 1,
"_score": null,
"_source": {
  "server": "commons.wikimedia.org",
  "db_server": "10.64.0.93",
  "wiki": "commonswiki",
  "channel": "DBQuery",
  "type": "mediawiki",
  "error": "Read timeout is reached (10.64.0.93)",
  "http_method": "POST",
  "@version": 1,
  "host": "mw1224",
  "shard": "s4",
  "sql1line": "SELECT  gt_id,gt_page_id,gt_lat,gt_lon,gt_primary,gt_globe,gt_type,gt_name  FROM `geo_tags`    WHERE gt_page_id IN ('47481531','47481533','47481535','47481536','47481539','47481541','47481549','47481552','47481553','47481557','47481559','47481560','47481563','47395115','47395117','47395120','47395122','47395123','47395124','47395125','47395127','47395133','47395134','47395137','47395138','47395139','47395140','47395141','47395142','47395145','47395146','47395147','47395149','47395154','47395156','47395157','47395158','47395160','47395162','47395163','47481566','47481570','47481572','47481575','47481577','47395116','47481578','47481581','47481583','47481586')  AND (gt_page_id > 47481563 OR ( gt_page_id = 47481563 AND gt_id >= 185145944 ))  ORDER BY gt_page_id,gt_id LIMIT 11  ",
  "fname": "GeoData\\ApiQueryCoordinates::execute",
  "errno": 2062,
  "unique_id": "WxeXGApAMDsAAJcQ6fwAAACK",
  "method": "Wikimedia\\Rdbms\\Database::makeQueryException",
  "level": "ERROR",
  "ip": "10.64.0.104",
  "mwversion": "1.32.0-wmf.6",
  "message": "GeoData\\ApiQueryCoordinates::execute\t10.64.0.93\t2062\tRead timeout is reached (10.64.0.93)\tSELECT  gt_id,gt_page_id,gt_lat,gt_lon,gt_primary,gt_globe,gt_type,gt_name  FROM `geo_tags`    WHERE gt_page_id IN ('47481531','47481533','47481535','47481536','47481539','47481541','47481549','47481552','47481553','47481557','47481559','47481560','47481563','47395115','47395117','47395120','47395122','47395123','47395124','47395125','47395127','47395133','47395134','47395137','47395138','47395139','47395140','47395141','47395142','47395145','47395146','47395147','47395149','47395154','47395156','47395157','47395158','47395160','47395162','47395163','47481566','47481570','47481572','47481575','47481577','47395116','47481578','47481581','47481583','47481586')  AND (gt_page_id > 47481563 OR ( gt_page_id = 47481563 AND gt_id >= 185145944 ))  ORDER BY gt_page_id,gt_id LIMIT 11  ",
  "normalized_message": "{fname}\t{db_server}\t{errno}\t{error}\t{sql1line}",
  "url": "/w/api.php?action=query&format=json&origin=*",
  "tags": [
    "syslog",
    "es",
    "es"
  ],
  "reqId": "WxeXGApAMDsAAJcQ6fwAAACK",
  "referrer": "https://tools.wmflabs.org/locator-tool/",
  "@timestamp": "2018-06-06T08:12:04.000Z",
  "db_name": "commonswiki",
  "db_user": "wikiuser"
},
"fields": {
  "@timestamp": [
    1528272724000
  ]
},
"sort": [
  1528272724000
]

}

Event Timeline

jcrespo triaged this task as High priority.Jun 6 2018, 8:26 AM

High because it threatens the stability of the site.

Would it be sufficient to only allow iteration via titles or iteration, but not both, or is there some other particular change you are looking to see here?

It looks like even without the oddly combined filtering, the by page_id condition is still pretty bad:

wikiadmin@db1084(commonswiki)>explain SELECT  gt_id,gt_page_id,gt_lat,gt_lon,gt_primary,gt_globe,gt_type,gt_name  FROM `geo_tags`    WHERE gt_page_id IN ('47481531','47481533','47481535','47481536','47481539','47481541','47481549','47481552','47481553','47481557','47481559','47481560','47481563','47395115','47395117','47395120','47395122','47395123','47395124','47395125','47395127','47395133','47395134','47395137','47395138','47395139','47395140','47395141','47395142','47395145','47395146','47395147','47395149','47395154','47395156','47395157','47395158','47395160','47395162','47395163','47481566','47481570','47481572','47481575','47481577','47395116','47481578','47481581','47481583','47481586')  ORDER BY gt_page_id,gt_id LIMIT 11;+------+-------------+----------+-------+-------------------------------+---------------+---------+------+---------+-------------+
| id   | select_type | table    | type  | possible_keys                 | key           | key_len | ref  | rows    | Extra       |
+------+-------------+----------+-------+-------------------------------+---------------+---------+------+---------+-------------+
|    1 | SIMPLE      | geo_tags | index | gt_page_primary,gt_page_id_id | gt_page_id_id | 8       | NULL | 2903422 | Using where |
+------+-------------+----------+-------+-------------------------------+---------------+---------+------+---------+-------------+

Similar queries, constructed in api modules via $this->addWhereFld( 'some_field', array_keys( $titles ) ), which all result in basically select ... from table where some_field IN (20+ page ids), don't seem to cause issues though so what's different here?

It looks like if i convert the expression from in to a boolean OR the explain looks more like i would expect with a series of point lookups:

wikiadmin@db1084(commonswiki)> explain SELECT  gt_id,gt_page_id,gt_lat,gt_lon,gt_primary,gt_globe,gt_type,gt_name  FROM `geo_tags`    WHERE (gt_page_id = 47481531 OR gt_page_id = 47481533 OR gt_page_id = 47481535 OR gt_page_id = 47481536 OR gt_page_id = 47481539 OR gt_page_id = 47481541 OR gt_page_id = 47481549 OR gt_page_id = 47481552 OR gt_page_id = 47481553 OR gt_page_id = 47481557 OR gt_page_id = 47481559 OR gt_page_id = 47481560 OR gt_page_id = 47481563 OR gt_page_id = 47395115 OR gt_page_id = 47395117 OR gt_page_id = 47395120 OR gt_page_id = 47395122 OR gt_page_id = 47395123 OR gt_page_id = 47395124 OR gt_page_id = 47395125 OR gt_page_id = 47395127 OR gt_page_id = 47395133 OR gt_page_id = 47395134 OR gt_page_id = 47395137 OR gt_page_id = 47395138 OR gt_page_id = 47395139 OR gt_page_id = 47395140 OR gt_page_id = 47395141 OR gt_page_id = 47395142 OR gt_page_id = 47395145 OR gt_page_id = 47395146 OR gt_page_id = 47395147 OR gt_page_id = 47395149 OR gt_page_id = 47395154 OR gt_page_id = 47395156 OR gt_page_id = 47395157 OR gt_page_id = 47395158 OR gt_page_id = 47395160 OR gt_page_id = 47395162 OR gt_page_id = 47395163 OR gt_page_id = 47481566 OR gt_page_id = 47481570 OR gt_page_id = 47481572 OR gt_page_id = 47481575 OR gt_page_id = 47481577 OR gt_page_id = 47395116 OR gt_page_id = 47481578 OR gt_page_id = 47481581 OR gt_page_id = 47481583 OR gt_page_id = 47481586)  AND (gt_page_id > 47481563 OR ( gt_page_id = 47481563 AND gt_id >= 185145944 ))  ORDER BY gt_page_id,gt_id LIMIT 11
+------+-------------+----------+-------+---------------------------------------+---------------+---------+------+------+-------------+
| id   | select_type | table    | type  | possible_keys                         | key           | key_len | ref  | rows | Extra       |
+------+-------------+----------+-------+---------------------------------------+---------------+---------+------+------+-------------+
|    1 | SIMPLE      | geo_tags | range | PRIMARY,gt_page_primary,gt_page_id_id | gt_page_id_id | 8       | NULL |   10 | Using where |
+------+-------------+----------+-------+---------------------------------------+---------------+---------+------+------+-------------+

I imagine this query plan could vary depending on the server, and various stats? Running an explain of the initial query (with the large IN condition) and the boolean query on analytics-store result in the same query plan:

mysql:research@analytics-store.eqiad.wmnet [commonswiki]> explain SELECT  gt_id,gt_page_id,gt_lat,gt_lon,gt_primary,gt_globe,gt_type,gt_name  FROM `geo_tags`    WHERE gt_page_id IN ('4748153
1','47481533','47481535','47481536','47481539','47481541','47481549','47481552','47481553','47481557','47481559','47481560','47481563','47395115','47395117','47395120','47395122','47395123',
'47395124','47395125','47395127','47395133','47395134','47395137','47395138','47395139','47395140','47395141','47395142','47395145','47395146','47395147','47395149','47395154','47395156','47
395157','47395158','47395160','47395162','47395163','47481566','47481570','47481572','47481575','47481577','47395116','47481578','47481581','47481583','47481586')  AND (gt_page_id > 47481563
 OR ( gt_page_id = 47481563 AND gt_id >= 185145944 ))  ORDER BY gt_page_id,gt_id LIMIT 11;                                                                                                    
+------+-------------+----------+-------+---------------------------------------+-----------------+---------+------+------+----------------------------------------------------+
| id   | select_type | table    | type  | possible_keys                         | key             | key_len | ref  | rows | Extra                                              |
+------+-------------+----------+-------+---------------------------------------+-----------------+---------+------+------+----------------------------------------------------+
|    1 | SIMPLE      | geo_tags | range | PRIMARY,gt_page_primary,gt_page_id_id | gt_page_primary | 4       | NULL |   10 | Using index condition; Using where; Using filesort |
+------+-------------+----------+-------+---------------------------------------+-----------------+---------+------+------+----------------------------------------------------+

mysql:research@analytics-store.eqiad.wmnet [commonswiki]> explain SELECT  gt_id,gt_page_id,gt_lat,gt_lon,gt_primary,gt_globe,gt_type,gt_name  FROM `geo_tags`    WHERE (gt_page_id = 47481531 OR gt_page_id = 47481533 OR gt_page_id = 47481535 OR gt_page_id = 47481536 OR gt_page_id = 47481539 OR gt_page_id = 47481541 OR gt_page_id = 47481549 OR gt_page_id = 47481552 OR gt_page_id = 47481553 OR gt_page_id = 47481557 OR gt_page_id = 47481559 OR gt_page_id = 47481560 OR gt_page_id = 47481563 OR gt_page_id = 47395115 OR gt_page_id = 47395117 OR gt_page_id = 47395120 OR gt_page_id = 47395122 OR gt_page_id = 47395123 OR gt_page_id = 47395124 OR gt_page_id = 47395125 OR gt_page_id = 47395127 OR gt_page_id = 47395133 OR gt_page_id = 47395134 OR gt_page_id = 47395137 OR gt_page_id = 47395138 OR gt_page_id = 47395139 OR gt_page_id = 47395140 OR gt_page_id = 47395141 OR gt_page_id = 47395142 OR gt_page_id = 47395145 OR gt_page_id = 47395146 OR gt_page_id = 47395147 OR gt_page_id = 47395149 OR gt_page_id = 47395154 OR gt_page_id = 47395156 OR gt_page_id = 47395157 OR gt_page_id = 47395158 OR gt_page_id = 47395160 OR gt_page_id = 47395162 OR gt_page_id = 47395163 OR gt_page_id = 47481566 OR gt_page_id = 47481570 OR gt_page_id = 47481572 OR gt_page_id = 47481575 OR gt_page_id = 47481577 OR gt_page_id = 47395116 OR gt_page_id = 47481578 OR gt_page_id = 47481581 OR gt_page_id = 47481583 OR gt_page_id = 47481586)  AND (gt_page_id > 47481563 OR ( gt_page_id = 47481563 AND gt_id >= 185145944 ))  ORDER BY gt_page_id,gt_id LIMIT 11
    -> ;
+------+-------------+----------+-------+---------------------------------------+-----------------+---------+------+------+----------------------------------------------------+
| id   | select_type | table    | type  | possible_keys                         | key             | key_len | ref  | rows | Extra                                              |
+------+-------------+----------+-------+---------------------------------------+-----------------+---------+------+------+----------------------------------------------------+
|    1 | SIMPLE      | geo_tags | range | PRIMARY,gt_page_primary,gt_page_id_id | gt_page_primary | 4       | NULL |   10 | Using index condition; Using where; Using filesort |
+------+-------------+----------+-------+---------------------------------------+-----------------+---------+------+------+----------------------------------------------------+

Checking a few different slaves, the estimated rows visited are:

databaseserver versionIN querybool query
analytics-store10.0.221010
db108110.1.33649535910
db108410.1.33290342210
db109110.1.345010
db109710.1.31611137010
db110310.1.31611935010

Version numbers of the servers seems to suggest there might be a regression after 10.0.22 that was partially fixed in 10.1.34. This isn't particularly actionable though.

How can we ensure the query is executed as a series of point lookups and not a table scan? Should mediawiki's query builder be adjusted to always issue boolean OR conditions instead of IN conditoins? Should only this single use case be changed?

is there some other particular change you are looking to see here?

I don't have any specific needs, except for it to be either fast or abort early. :-)

there might be a regression after 10.0.22 that was partially fixed in 10.1.34

I am not sure the correlation is real- it could be that the newest tables have been analyzed recently- For example analytics-store is an outlier because it doesn't use InnoDB but TokuDB, so it cannot be used as an example for actual query performance, just for data checks. I can upgrade one host and try again, though.

Normally the procedure we do is to try "fixing" it either on infra or by changing slightly the query, and if nothing works, add a FORCE INDEX() to assure a specific plan, assuming an optimal one is possible.

Let me upgrade one of the 10.1.31 to test your hypothesis and we can reevaluate with that data. If you could check in parallel if a FORCE / IGNORE is possible, that would be great.

Thank you very much for the quick response.

While I perform the upgrade- an example of changing the query whould be converting the OR/IN for a SELECT ... UNION, but this is just an example, any option you can think of, if the upgrade doesn't work, will be welcome. In general, for preference, infrastructure change > query change > index hints.

I think there is a mistake on your table, db1091 is running 10.0.34, not 10.1.34 (which is not released as of this date). I will run analyze on one other host instead.

Based on testing on codfw hosts, I believe this to be a regression from 10.0 to 10.1, which make more sense to me than a minor version regression. After analyze, I get this plan, so still not good:

[query execution]
10 rows in set (7.59 sec)

root@db2091.codfw.wmnet[commonswiki]> EXPLAIN SELECT  gt_id,gt_page_id,gt_lat,gt_lon,gt_primary,gt_globe,gt_type,gt_name  FROM `geo_tags`    WHERE gt_page_id IN ('47481531','47481533','47481535','47481536','47481539','47481541','47481549','47481552','47481553','47481557','47481559','47481560','47481563','47395115','47395117','47395120','47395122','47395123','47395124','47395125','47395127','47395133','47395134','47395137','47395138','47395139','47395140','47395141','47395142','47395145','47395146','47395147','47395149','47395154','47395156','47395157','47395158','47395160','47395162','47395163','47481566','47481570','47481572','47481575','47481577','47395116','47481578','47481581','47481583','47481586')  AND (gt_page_id > 47481563 OR ( gt_page_id = 47481563 AND gt_id >= 185145944 ))  ORDER BY gt_page_id,gt_id LIMIT 11;
+------+-------------+----------+-------+---------------------------------------+---------------+---------+------+---------+---------
| id   | select_type | table    | type  | possible_keys                         | key           | key_len | ref  | rows    | Extra   
+------+-------------+----------+-------+---------------------------------------+---------------+---------+------+---------+---------
|    1 | SIMPLE      | geo_tags | range | PRIMARY,gt_page_primary,gt_page_id_id | gt_page_id_id | 8       | NULL | 6086390 | Using wh
+------+-------------+----------+-------+---------------------------------------+---------------+---------+------+---------+---------
1 row in set (0.03 sec)

So to clarify, this is an infrastructure issue (mysql regression), but I am not sure we can work around it without code changes- 10.1 is the target version as of now. Could we apply the changes you suggested or, if there is no other option, an IGNORE INDEX?

[query execution]
10 rows in set (0.03 sec)

root@db2091.codfw.wmnet[commonswiki]> EXPLAIN SELECT  gt_id,gt_page_id,gt_lat,gt_lon,gt_primary,gt_globe,gt_type,gt_name  FROM `geo_tags` IGNORE INDEX(gt_page_id_id)   WHERE gt_page_id IN ('47481531','47481533','47481535','47481536','47481539','47481541','47481549','47481552','47481553','47481557','47481559','47481560','47481563','47395115','47395117','47395120','47395122','47395123','47395124','47395125','47395127','47395133','47395134','47395137','47395138','47395139','47395140','47395141','47395142','47395145','47395146','47395147','47395149','47395154','47395156','47395157','47395158','47395160','47395162','47395163','47481566','47481570','47481572','47481575','47481577','47395116','47481578','47481581','47481583','47481586')  AND (gt_page_id > 47481563 OR ( gt_page_id = 47481563 AND gt_id >= 185145944 ))  ORDER BY gt_page_id,gt_id LIMIT 11;
+------+-------------+----------+-------+-------------------------+-----------------+---------+------+------+------------------------
| id   | select_type | table    | type  | possible_keys           | key             | key_len | ref  | rows | Extra                  
+------+-------------+----------+-------+-------------------------+-----------------+---------+------+------+------------------------
|    1 | SIMPLE      | geo_tags | range | PRIMARY,gt_page_primary | gt_page_primary | 4       | NULL |   10 | Using index condition; 
+------+-------------+----------+-------+-------------------------+-----------------+---------+------+------+------------------------
1 row in set (0.03 sec)

Last question- does the query really make sense? It seems overly complicated for no reason with conditions that cannot be met -gt_page_id IN ('47481531') AND (gt_page_id > 47481563 OR ( gt_page_id = 47481563... , and the fact that uses limit to certain number of results makes it nondeterministic in execution time- many results, faster; few results, slower.

Change 439779 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/GeoData@master] Adjust ApiQueryCoordinates sql query performance

https://gerrit.wikimedia.org/r/439779

Based on testing on codfw hosts, I believe this to be a regression from 10.0 to 10.1, which make more sense to me than a minor version regression. After analyze, I get this plan, so still not good:

[query execution]
10 rows in set (7.59 sec)

root@db2091.codfw.wmnet[commonswiki]> EXPLAIN SELECT  gt_id,gt_page_id,gt_lat,gt_lon,gt_primary,gt_globe,gt_type,gt_name  FROM `geo_tags`    WHERE gt_page_id IN ('47481531','47481533','47481535','47481536','47481539','47481541','47481549','47481552','47481553','47481557','47481559','47481560','47481563','47395115','47395117','47395120','47395122','47395123','47395124','47395125','47395127','47395133','47395134','47395137','47395138','47395139','47395140','47395141','47395142','47395145','47395146','47395147','47395149','47395154','47395156','47395157','47395158','47395160','47395162','47395163','47481566','47481570','47481572','47481575','47481577','47395116','47481578','47481581','47481583','47481586')  AND (gt_page_id > 47481563 OR ( gt_page_id = 47481563 AND gt_id >= 185145944 ))  ORDER BY gt_page_id,gt_id LIMIT 11;
+------+-------------+----------+-------+---------------------------------------+---------------+---------+------+---------+---------
| id   | select_type | table    | type  | possible_keys                         | key           | key_len | ref  | rows    | Extra   
+------+-------------+----------+-------+---------------------------------------+---------------+---------+------+---------+---------
|    1 | SIMPLE      | geo_tags | range | PRIMARY,gt_page_primary,gt_page_id_id | gt_page_id_id | 8       | NULL | 6086390 | Using wh
+------+-------------+----------+-------+---------------------------------------+---------------+---------+------+---------+---------
1 row in set (0.03 sec)

So to clarify, this is an infrastructure issue (mysql regression), but I am not sure we can work around it without code changes- 10.1 is the target version as of now. Could we apply the changes you suggested or, if there is no other option, an IGNORE INDEX?

Based on my testing, it looks like switching to the boolean OR should solve this. I've uploaded a patch to gerrit that will make this change only in geodata. This isn't the first time i've seen problems with the IN condition though, and probably wont be the last. Should we be pushing for a lower level change of mediawiki's query building to always generate OR conditions instead of IN?

[query execution]
10 rows in set (0.03 sec)

root@db2091.codfw.wmnet[commonswiki]> EXPLAIN SELECT  gt_id,gt_page_id,gt_lat,gt_lon,gt_primary,gt_globe,gt_type,gt_name  FROM `geo_tags` IGNORE INDEX(gt_page_id_id)   WHERE gt_page_id IN ('47481531','47481533','47481535','47481536','47481539','47481541','47481549','47481552','47481553','47481557','47481559','47481560','47481563','47395115','47395117','47395120','47395122','47395123','47395124','47395125','47395127','47395133','47395134','47395137','47395138','47395139','47395140','47395141','47395142','47395145','47395146','47395147','47395149','47395154','47395156','47395157','47395158','47395160','47395162','47395163','47481566','47481570','47481572','47481575','47481577','47395116','47481578','47481581','47481583','47481586')  AND (gt_page_id > 47481563 OR ( gt_page_id = 47481563 AND gt_id >= 185145944 ))  ORDER BY gt_page_id,gt_id LIMIT 11;
+------+-------------+----------+-------+-------------------------+-----------------+---------+------+------+------------------------
| id   | select_type | table    | type  | possible_keys           | key             | key_len | ref  | rows | Extra                  
+------+-------------+----------+-------+-------------------------+-----------------+---------+------+------+------------------------
|    1 | SIMPLE      | geo_tags | range | PRIMARY,gt_page_primary | gt_page_primary | 4       | NULL |   10 | Using index condition; 
+------+-------------+----------+-------+-------------------------+-----------------+---------+------+------+------------------------
1 row in set (0.03 sec)

Last question- does the query really make sense? It seems overly complicated for no reason with conditions that cannot be met -gt_page_id IN ('47481531') AND (gt_page_id > 47481563 OR ( gt_page_id = 47481563... , and the fact that uses limit to certain number of results makes it nondeterministic in execution time- many results, faster; few results, slower.

This could certainly be refactored to be more direct, but is essentially a consequence of two generic features combining. One generic feature feeds this code a set of titles to operate on from some other source. The other generic feature is paginating over results. When the two are used in combination this code could probably be made smart enough to pre-filter rather than pushing it all into the query and letting mysql figure it out.

Should we be pushing for a lower level change of mediawiki's query building to always generate OR conditions instead of IN?

There are some bugs with IN on older mariadb releases. I would not touch anything at mediawiki layer yet because we need to evaluate MySQL/MariaDB latest releases to see if performance improves without any changes.

However, if you could rewrite the specific query to make it faster now (only this case), that would help a lot, thanks!

Looking at the query in question, gt_page_id_id is the best index for it to use. I have no idea how it's planning on using it to get the broken plan we're seeing here.

To confuse the issue even more, if we force it to use the index it's already using then it suddenly starts do be non-stupid:

wikiadmin@db1091(commonswiki)>explain SELECT  gt_id,gt_page_id,gt_lat,gt_lon,gt_primary,gt_globe,gt_type,gt_name  FROM `geo_tags` use index(gt_page_id_id)   WHERE gt_page_id IN ('47481531','47481533','47481535','47481536','47481539','47481541','47481549','47481552','47481553','47481557','47481559','47481560','47481563','47395115','47395117','47395120','47395122','47395123','47395124','47395125','47395127','47395133','47395134','47395137','47395138','47395139','47395140','47395141','47395142','47395145','47395146','47395147','47395149','47395154','47395156','47395157','47395158','47395160','47395162','47395163','47481566','47481570','47481572','47481575','47481577','47395116','47481578','47481581','47481583','47481586') AND (gt_page_id > 47481563 OR ( gt_page_id = 47481563 AND gt_id >= 185145944 )) ORDER BY gt_page_id,gt_id limit 11;
+------+-------------+----------+-------+---------------+---------------+---------+------+------+------------------------------------+
| id   | select_type | table    | type  | possible_keys | key           | key_len | ref  | rows | Extra                              |
+------+-------------+----------+-------+---------------+---------------+---------+------+------+------------------------------------+
|    1 | SIMPLE      | geo_tags | range | gt_page_id_id | gt_page_id_id | 8       | NULL |   10 | Using index condition; Using where |
+------+-------------+----------+-------+---------------+---------------+---------+------+------+------------------------------------+

Or even more confusing, if we ignore one specific index that it's not using anyway that seems to work too:

wikiadmin@db1091(commonswiki)>explain SELECT  gt_id,gt_page_id,gt_lat,gt_lon,gt_primary,gt_globe,gt_type,gt_name  FROM `geo_tags` ignore index(gt_page_primary)   WHERE gt_page_id IN ('47481531','47481533','47481535','47481536','47481539','47481541','47481549','47481552','47481553','47481557','47481559','47481560','47481563','47395115','47395117','47395120','47395122','47395123','47395124','47395125','47395127','47395133','47395134','47395137','47395138','47395139','47395140','47395141','47395142','47395145','47395146','47395147','47395149','47395154','47395156','47395157','47395158','47395160','47395162','47395163','47481566','47481570','47481572','47481575','47481577','47395116','47481578','47481581','47481583','47481586') AND (gt_page_id > 47481563 OR ( gt_page_id = 47481563 AND gt_id >= 185145944 )) ORDER BY gt_page_id,gt_id limit 11;
+------+-------------+----------+-------+-----------------------+---------------+---------+------+------+------------------------------------+
| id   | select_type | table    | type  | possible_keys         | key           | key_len | ref  | rows | Extra                              |
+------+-------------+----------+-------+-----------------------+---------------+---------+------+------+------------------------------------+
|    1 | SIMPLE      | geo_tags | range | PRIMARY,gt_page_id_id | gt_page_id_id | 8       | NULL |   10 | Using index condition; Using where |
+------+-------------+----------+-------+-----------------------+---------------+---------+------+------+------------------------------------+

@EBernhardson Sorry to intervene, but wondering if this is still blocked, and if so, what is it blocked on?

Last i looked into there there was still indecision about what to actually do. @jcrespo asked to modified the query, or add a force index as a last result. @Anomie is asking in the gerrit patch for force index, but more specifically in T197486 he investigated what looks like the same underlying issue. At the time i moved it into blocked that discussion was still ongoing about proper solutions.

I don't have a particular preference for which of the two solutions goes into the patch. Someone tell me what they will +2 and i'll modify it.

Tagging Platform Engineering to make sure it's on the radar, given this seems to be blocked on agreement for a solution before @EBernhardson can proceed.

Looks like that patch is no longer needed, as the upgrade to MariaDB 10.1.37 seems to have fixed it.

wikiadmin@10.64.48.150(commonswiki)> explain SELECT  gt_id,gt_page_id,gt_lat,gt_lon,gt_primary,gt_globe,gt_type,gt_name  FROM `geo_tags`  WHERE gt_page_id IN ('47481531','47481533','47481535','47481536','47481539','47481541','47481549','47481552','47481553','47481557','47481559','47481560','47481563','47395115','47395117','47395120','47395122','47395123','47395124','47395125','47395127','47395133','47395134','47395137','47395138','47395139','47395140','47395141','47395142','47395145','47395146','47395147','47395149','47395154','47395156','47395157','47395158','47395160','47395162','47395163','47481566','47481570','47481572','47481575','47481577','47395116','47481578','47481581','47481583','47481586') AND (gt_page_id > 47481563 OR ( gt_page_id = 47481563 AND gt_id >= 185145944 )) ORDER BY gt_page_id,gt_id limit 11;
+------+-------------+----------+-------+---------------------------------------+---------------+---------+------+------+-------------+
| id   | select_type | table    | type  | possible_keys                         | key           | key_len | ref  | rows | Extra       |
+------+-------------+----------+-------+---------------------------------------+---------------+---------+------+------+-------------+
|    1 | SIMPLE      | geo_tags | range | PRIMARY,gt_page_primary,gt_page_id_id | gt_page_id_id | 8       | NULL |   10 | Using where |
+------+-------------+----------+-------+---------------------------------------+---------------+---------+------+------+-------------+

See T197486#4749182 for details.

Good. Sadly, MariaDB support is not great and identifying several bug reports and close them after being fixed because they were reported independently, so it is difficult to track those. However, we have seen lately a trend of better query plans on the later minor versions of 10.1 (based on logging monitoring). For the most part, db-related errors lately seem to be fixable on mw rather than bad query plans. Better HW and optimized tables may have helped, too.

Not seen on Logstash for 30 days.

Query
_type: "mediawiki"
channel.raw: "error, exception, fatal, DBQuery"
--
+message:(ApiQueryCoordinates OR GeoData OR "geo_tags")

Change 439779 abandoned by EBernhardson:
Adjust ApiQueryCoordinates sql query performance

Reason:
mariadb upgrade seems to have fixed query planning.

https://gerrit.wikimedia.org/r/439779

mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:09 PM