Page MenuHomePhabricator

AFComputedVariable::compute query timeouts
Closed, ResolvedPublic

Description

Lots of these in the logs at https://logstash.wikimedia.org/#/dashboard/elasticsearch/wfLogDBError:

AFComputedVariable::compute	10.64.32.21	2013	Lost connection to MySQL server during query (10.64.32.21)	SELECT rev_user_text FROM (SELECT  rev_user_text,rev_timestamp  FROM `revision`   WHERE rev_page = '8972734' AND (rev_timestamp < '20151025233104')  ORDER BY rev_timestamp DESC LIMIT 100  ) AS tmp GROUP BY rev_user_text ORDER BY MAX(rev_timestamp) DESC LIMIT 10

Seems to affect the enwiki API DBs (not sure if that is page/traffic pattern or something with those DBs). The query plan on those looks similar to other servers *except* for the index "key length" used and ref vs range scans. The ones with less errors say 4 (vs 20 for the API ones).

> aaron@terbium:~$ mwscript sql.php --wiki=enwiki --slave=db1065
> EXPLAIN SELECT rev_user_text FROM (SELECT  rev_user_text,rev_timestamp  FROM `revision`   WHERE rev_page = '36395484' AND (rev_timestamp < '20151025072035')  ORDER BY rev_timestamp DESC LIMIT 100  ) AS tmp GROUP BY rev_user_text ORDER BY MAX(rev_timestamp) DESC LIMIT 10;
stdClass Object
(
    [id] => 1
    [select_type] => PRIMARY
    [table] => <derived2>
    [type] => ALL
    [possible_keys] => 
    [key] => 
    [key_len] => 
    [ref] => 
    [rows] => 100
    [Extra] => Using temporary; Using filesort
)
stdClass Object
(
    [id] => 2
    [select_type] => DERIVED
    [table] => revision
    [type] => range
    [possible_keys] => rev_page_id,rev_timestamp,page_timestamp
    [key] => page_timestamp
    [key_len] => 20
    [ref] => 
    [rows] => 173152
    [Extra] => Using where
)
> aaron@terbium:~$ mwscript sql.php --wiki=enwiki --slave=db1073
> EXPLAIN SELECT rev_user_text FROM (SELECT  rev_user_text,rev_timestamp  FROM `revision`   WHERE rev_page = '36395484' AND (rev_timestamp < '20151025072035')  ORDER BY rev_timestamp DESC LIMIT 100  ) AS tmp GROUP BY rev_user_text ORDER BY MAX(rev_timestamp) DESC LIMIT 10;
stdClass Object
(
    [id] => 1
    [select_type] => PRIMARY
    [table] => <derived2>
    [type] => ALL
    [possible_keys] => 
    [key] => 
    [key_len] => 
    [ref] => 
    [rows] => 100
    [Extra] => Using temporary; Using filesort
)
stdClass Object
(
    [id] => 2
    [select_type] => DERIVED
    [table] => revision
    [type] => ref
    [possible_keys] => rev_page_id,rev_timestamp,page_timestamp
    [key] => page_timestamp
    [key_len] => 4
    [ref] => const
    [rows] => 189680
    [Extra] => Using where
)

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
aaron claimed this task.Oct 26 2015, 6:46 PM

Change 248989 had a related patch set uploaded (by Aaron Schulz):
Avoid pointless range scan for 'load-recent-authors'

https://gerrit.wikimedia.org/r/248989

Change 249032 had a related patch set uploaded (by Ori.livneh):
Avoid pointless range scan for 'load-recent-authors'

https://gerrit.wikimedia.org/r/249032

Change 248989 merged by jenkins-bot:
Avoid pointless range scan for 'load-recent-authors'

https://gerrit.wikimedia.org/r/248989

Change 249032 merged by jenkins-bot:
Avoid pointless range scan for 'load-recent-authors'

https://gerrit.wikimedia.org/r/249032

jcrespo moved this task from Triage to Backlog on the DBA board.Oct 27 2015, 9:07 PM

I've just activated profiling on db1065, it will take some time to have proper results, but the query has already been detected:

MariaDB PRODUCTION s1 localhost sys > SELECT * FROM sys.statement_analysis WHERE digest = 'd993cce8555e342b2fcbf039c54d6ec3'\G
*************************** 1. row ***************************
            query: SELECT rev_user_text FROM ( SE ...  rev_timestamp ) DESC LIMIT ? 
               db: enwiki
        full_scan: *
       exec_count: 714
        err_count: 0
       warn_count: 0
    total_latency: 12.62 s
      max_latency: 313.92 ms
      avg_latency: 17.68 ms
     lock_latency: 48.14 ms
        rows_sent: 2163
    rows_sent_avg: 3
    rows_examined: 0
rows_examined_avg: 0
    rows_affected: 0
rows_affected_avg: 0
       tmp_tables: 1428
  tmp_disk_tables: 0
      rows_sorted: 3208
sort_merge_passes: 0
           digest: d993cce8555e342b2fcbf039c54d6ec3
       first_seen: 2015-10-29 10:41:48
        last_seen: 2015-10-29 11:20:51
1 row in set (0.01 sec)

So there are several problems here:

  • A temporary table is created
  • A filesort process is needed
  • 160K+ rows are read (it could be more for other queries, and those could be on disk). Even if it is using an index, that is a lot of rows.
MariaDB PRODUCTION s1 localhost sys > use enwiki; FLUSH STATUS; SELECT rev_user_text FROM (SELECT rev_user_text,rev_timestamp FROM `revision` WHERE rev_page = '11238105' ORDER BY rev_timestamp DESC LIMIT 100 ) AS tmp GROUP BY rev_user_text ORDER BY MAX(rev_timestamp) DESC LIMIT 10; SHOW STATUS like 'Hand%';
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
Query OK, 0 rows affected (0.00 sec)

+---------------+
| rev_user_text |
+---------------+
| DeltaQuadBot  |
| Ymblanter     |
| Diannaa       |
| The Anome     |
+---------------+
4 rows in set (2.38 sec)

+----------------------------+--------+
| Variable_name              | Value  |
+----------------------------+--------+
| Handler_commit             | 1      |
| Handler_delete             | 0      |
| Handler_discover           | 0      |
| Handler_external_lock      | 0      |
| Handler_icp_attempts       | 0      |
| Handler_icp_match          | 0      |
| Handler_mrr_init           | 0      |
| Handler_mrr_key_refills    | 0      |
| Handler_mrr_rowid_refills  | 0      |
| Handler_prepare            | 0      |
| Handler_read_first         | 0      |
| Handler_read_key           | 101    |
| Handler_read_last          | 0      |
| Handler_read_next          | 163186 |
| Handler_read_prev          | 0      |
| Handler_read_rnd           | 4      |
| Handler_read_rnd_deleted   | 0      |
| Handler_read_rnd_next      | 106    |
| Handler_rollback           | 0      |
| Handler_savepoint          | 0      |
| Handler_savepoint_rollback | 0      |
| Handler_tmp_update         | 96     |
| Handler_tmp_write          | 104    |
| Handler_update             | 0      |
| Handler_write              | 0      |
+----------------------------+--------+
25 rows in set (0.00 sec)

MariaDB PRODUCTION s1 localhost enwiki > SHOW STATUS like 'sort%';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| Sort_merge_passes         | 0     |
| Sort_priority_queue_sorts | 1     |
| Sort_range                | 1     |
| Sort_rows                 | 104   |
| Sort_scan                 | 1     |
+---------------------------+-------+
5 rows in set (0.00 sec)

MariaDB PRODUCTION s1 localhost enwiki > SHOW STATUS like '%tmp%';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 0     |
| Created_tmp_files       | 27    |
| Created_tmp_tables      | 2     |
| Handler_tmp_update      | 96    |
| Handler_tmp_write       | 104   |
| Rows_tmp_read           | 216   |
+-------------------------+-------+
6 rows in set (0.00 sec)

I have to tweak the profiling to catch the 4 minute queries:

max_latency: 4.71 m

The reason why they are not 4 minutes while executed ad-hoc is that they have already been cached on memory, and is performance increases a lot in that case.

Aside from rewriting the query, maybe the partitioning on the recentchanges slaves may help? Please compare the query stats on those servers (db1051 and db1055). Those are special slaves with slightly different schema to help certain slow queries:

+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 123   |
| Handler_read_last          | 0     |
| Handler_read_next          | 0     |
| Handler_read_prev          | 99    |
| Handler_read_rnd           | 4     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 106   |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 96    |
| Handler_tmp_write          | 104   |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
25 rows in set (0.00 sec)
jcrespo moved this task from Backlog to In progress on the DBA board.Oct 29 2015, 12:21 PM

To make it explicit, my current recommendation is to perform a small test that executes this query on a 'recentchanges' group slave and confirm that this effectively fixes the timeouts and the slow writes.

If the test is positive, maybe we can create a special group for this query or deploy partitioning to more slaves.

jcrespo moved this task from In progress to Backlog on the DBA board.Nov 5 2015, 5:16 PM
aaron added a comment.Dec 18 2015, 9:16 PM

Last day of errors (58) is only db1065 and db1066 (evenly).

We could experiment with a query group, though this seems like the kind of boring query that should be able to run everywhere quickly.

Change 260634 had a related patch set uploaded (by Aaron Schulz):
Simplify "load-recent-authors" query

https://gerrit.wikimedia.org/r/260634

Change 260634 merged by jenkins-bot:
Simplify "load-recent-authors" query

https://gerrit.wikimedia.org/r/260634

aaron lowered the priority of this task from High to Normal.Jan 15 2016, 10:46 PM

Change 266644 had a related patch set uploaded (by Aaron Schulz):
Add ParserOutputStashForEdit hook for extension cache warming

https://gerrit.wikimedia.org/r/266644

Change 266646 had a related patch set uploaded (by Aaron Schulz):
Compute last-recent-authors result during edit stashing

https://gerrit.wikimedia.org/r/266646

Change 267137 had a related patch set uploaded (by Ori.livneh):
Compute last-recent-authors result during edit stashing

https://gerrit.wikimedia.org/r/267137

Change 267139 had a related patch set uploaded (by Ori.livneh):
Compute last-recent-authors result during edit stashing

https://gerrit.wikimedia.org/r/267139

Change 266646 merged by jenkins-bot:
Compute last-recent-authors result during edit stashing

https://gerrit.wikimedia.org/r/266646

Change 267137 merged by jenkins-bot:
Compute last-recent-authors result during edit stashing

https://gerrit.wikimedia.org/r/267137

Change 267177 had a related patch set uploaded (by Ori.livneh):
Add ParserOutputStashForEdit hook for extension cache warming

https://gerrit.wikimedia.org/r/267177

Change 266644 merged by jenkins-bot:
Add ParserOutputStashForEdit hook for extension cache warming

https://gerrit.wikimedia.org/r/266644

Change 267177 merged by Ori.livneh:
Add ParserOutputStashForEdit hook for extension cache warming

https://gerrit.wikimedia.org/r/267177

Change 267139 abandoned by Krinkle:
Compute last-recent-authors result during edit stashing

https://gerrit.wikimedia.org/r/267139

aaron removed aaron as the assignee of this task.Mar 17 2016, 10:57 PM

Only seeing 13 in the last day, so this is less common.

Krinkle added a subscriber: Krinkle.Dec 3 2016, 7:01 AM

https://logstash.wikimedia.org/goto/430429f471afeb24a32f58c659f57cf1 (mediawiki-errors AFComputedVariable)

Between 50-100 per day. @aaron So, the underlying problem is that the query is too slow? Or is it caused by something else?

https://logstash.wikimedia.org/goto/9b00a481c72faee815a075b28043af4b.
Still upto 100 exceptions per day from this code path for database connection lost during query.

AFComputedVariable::{closure}	errno=2013	Lost connection to MySQL server during query
SELECT rev_user_text FROM revision WHERE rev_page = %  ORDER BY rev_timestamp DESC LIMIT 100

@aaron So, the underlying problem is that the query is too slow? Or is it caused by something else?

jcrespo raised the priority of this task from Normal to High.Apr 19 2017, 7:04 PM
jcrespo added a subscriber: Marostegui.

I think this got worse on failover- the query planer gets confused between page_user_timestamp and page_timestamp, they are too similar indexes, but the wrong one causes a fileshort (and even worse performance when a page has many revisions).

Normally running ANALYZE fixes it, but we are low on resources on codfw, and it takes right now 3 days to run it (being a blocking statement). We should either put an IGNORE statement there to force the plan or send it to an rc slave, as I suggested earlier. We can do proper fixes later, but this needs a workaround quickly: https://logstash.wikimedia.org/goto/59d22f141508612b4e4bb32857406e0a

This is interesting:

root@db2070.codfw.wmnet[enwiki]> EXPLAIN SELECT /* AFComputedVariable::{closure}  */  rev_user_text  FROM `revision` WHERE rev_page = '34681'  ORDER BY rev_timestamp DESC LIMIT 100\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: revision
         type: ref
possible_keys: page_timestamp,page_user_timestamp,rev_page_id
          key: page_timestamp
      key_len: 4
          ref: const
         rows: 5088
        Extra: Using where
1 row in set (0.04 sec)

root@db2070.codfw.wmnet[enwiki]> EXPLAIN SELECT /* AFComputedVariable::{closure}  */  rev_user_text  FROM `revision` WHERE rev_page = '1772647'  ORDER BY rev_timestamp DESC LIMIT 100\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: revision
         type: ref
possible_keys: page_timestamp,page_user_timestamp,rev_page_id
          key: page_user_timestamp
      key_len: 4
          ref: const
         rows: 985
        Extra: Using where; Using filesort
1 row in set (0.04 sec)

root@db2070.codfw.wmnet[enwiki]> pager cat > /dev/null; SELECT /* AFComputedVariable::{closure}  */  rev_user_text  FROM `revision` WHERE rev_page = '34681'  ORDER BY rev_timestamp DESC LIMIT 100;
PAGER set to 'cat > /dev/null'
100 rows in set (0.05 sec)

root@db2070.codfw.wmnet[enwiki]> SELECT /* AFComputedVariable::{closure}  */  rev_user_text  FROM `revision` WHERE rev_page = '1772647'  ORDER BY rev_timestamp DESC LIMIT 100;
100 rows in set (0.10 sec)
jcrespo added a comment.EditedApr 21 2017, 8:00 AM

So, I say we force the index or send it to the special slaves- this is not a good or long term solution, but something we can do now to minimize issue while we investigate if it is a problem with the mariadb version or our config, or we need better calculated index statistics.

Sorry, I need help, I do not know where this query come from on the code (I cannot find it).

Change 349388 had a related patch set uploaded (by Jcrespo):
[operations/mediawiki-config@master] Depool db2055 for maintenance

https://gerrit.wikimedia.org/r/349388

Change 349388 merged by jenkins-bot:
[operations/mediawiki-config@master] MariaDB: Depool db2062 database server for maintenance

https://gerrit.wikimedia.org/r/349388

Mentioned in SAL (#wikimedia-operations) [2017-04-21T09:10:20Z] <jynus> stopping and upgrading/reconfiguring db2062 (depooled) T116557

This is madness:

root@db2062[enwiki]> EXPLAIN SELECT /* AFComputedVariable::{closure}  */  rev_user_text  FROM `revision` WHERE rev_page = '34681'  ORDER BY rev_timestamp DESC LIMIT 100\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: revision
         type: ref
possible_keys: page_timestamp,page_user_timestamp,rev_page_id
          key: page_timestamp
      key_len: 4
          ref: const
         rows: 5097
        Extra: Using where
1 row in set (0.09 sec)

root@db2062[enwiki]> EXPLAIN SELECT /* AFComputedVariable::{closure}  */  rev_user_text  FROM `revision` WHERE rev_page = '1772647'  ORDER BY rev_timestamp DESC LIMIT 100\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: revision
         type: ref
possible_keys: page_timestamp,page_user_timestamp,rev_page_id
          key: page_timestamp
      key_len: 4
          ref: const
         rows: 989
        Extra: Using where
1 row in set (0.06 sec)
root@db2062.codfw.wmnet[enwiki]> EXPLAIN SELECT /* AFComputedVariable::{closure}  */  rev_user_text  FROM `revision`    WHERE rev_page = '1743794'   ORDER BY rev_timestamp DESC LIMIT 100\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: revision
         type: ref
possible_keys: page_timestamp,page_user_timestamp,rev_page_id
          key: page_user_timestamp
      key_len: 4
          ref: const
         rows: 128600
        Extra: Using where; Using filesort
1 row in set (0.03 sec)

Mentioned in SAL (#wikimedia-operations) [2017-04-21T12:07:57Z] <marostegui> Analyze revision, logging and page table on s1 db1080 - T116557

Mentioned in SAL (#wikimedia-operations) [2017-04-21T14:32:40Z] <marostegui> Analyze revision, logging and page table on s1 db1067 - https://phabricator.wikimedia.org/T116557

root@db2062[enwiki]> ANALYZE TABLE revision;                                                             
+-----------------+---------+----------+-----------------------------------------+                       
| Table           | Op      | Msg_type | Msg_text                                |                       
+-----------------+---------+----------+-----------------------------------------+                       
| enwiki.revision | analyze | status   | Engine-independent statistics collected |                       
| enwiki.revision | analyze | status   | OK                                      |                       
+-----------------+---------+----------+-----------------------------------------+                       
2 rows in set (1 day 30 min 44.40 sec)

However:

root@db2062[enwiki]> EXPLAIN SELECT /* AFComputedVariable::{closure}  */  rev_user_text  FROM `revision`    WHERE rev_page = '1743794'   ORDER BY rev_timestamp DESC LIMIT 100\G
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    133531
Current database: enwiki

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: revision
         type: ref
possible_keys: page_timestamp,page_user_timestamp,rev_page_id
          key: page_user_timestamp
      key_len: 4
          ref: const
         rows: 128600
        Extra: Using where; Using filesort
1 row in set (0.05 sec)

So this now 100% repeatable -probably an optimizer bug, and probably https://jira.mariadb.org/browse/MDEV-6402 or https://jira.mariadb.org/browse/MDEV-6657 (Sadly, only fixed on 10.1). I am going to confirm it.

Mentioned in SAL (#wikimedia-operations) [2017-04-23T17:49:47Z] <jynus> disabling puppet on db2062 and upgrading MariaDB package to 10.1 T116557

Nope, it doesn't work either:

root@db2062[enwiki]> SELECT version();
+-----------------+
| version()       |
+-----------------+
| 10.1.22-MariaDB |
+-----------------+
1 row in set (0.00 sec)

root@db2062[enwiki]> EXPLAIN SELECT /* AFComputedVariable::{closure}  */  rev_user_text  FROM `revision`    WHERE rev_page = '1743794'   ORDER BY rev_timestamp DESC LIMIT 100\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: revision
         type: ref
possible_keys: page_timestamp,page_user_timestamp,rev_page_id
          key: page_user_timestamp
      key_len: 4
          ref: const
         rows: 128600
        Extra: Using where; Using filesort
1 row in set (0.13 sec)
jcrespo added a subscriber: Anomie.Apr 23 2017, 6:03 PM

CC @Anomie because I think he was hoping 10.1 would solve this or very similar issues, but it doesn't seem to be the case. :-/

root@db2062[enwiki]> ANALYZE TABLE revision;                                                             
+-----------------+---------+----------+-----------------------------------------+                       
| Table           | Op      | Msg_type | Msg_text                                |                       
+-----------------+---------+----------+-----------------------------------------+                       
| enwiki.revision | analyze | status   | Engine-independent statistics collected |                       
| enwiki.revision | analyze | status   | OK                                      |                       
+-----------------+---------+----------+-----------------------------------------+                       
2 rows in set (1 day 30 min 44.40 sec)

However:

root@db2062[enwiki]> EXPLAIN SELECT /* AFComputedVariable::{closure}  */  rev_user_text  FROM `revision`    WHERE rev_page = '1743794'   ORDER BY rev_timestamp DESC LIMIT 100\G
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    133531
Current database: enwiki
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: revision
         type: ref
possible_keys: page_timestamp,page_user_timestamp,rev_page_id
          key: page_user_timestamp
      key_len: 4
          ref: const
         rows: 128600
        Extra: Using where; Using filesort
1 row in set (0.05 sec)

So this now 100% repeatable -probably an optimizer bug, and probably https://jira.mariadb.org/browse/MDEV-6402 or https://jira.mariadb.org/browse/MDEV-6657 (Sadly, only fixed on 10.1). I am going to confirm it.

And for some reason db1080 and db1067 do work after the analyze, they do not filesort anymore:

root@db1080[enwiki]> EXPLAIN SELECT /* AFComputedVariable::{closure}  */  rev_user_text  FROM `revision`    WHERE rev_page = '17437194'   ORDER BY rev_timestamp DESC LIMIT 100\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: revision
         type: ref
possible_keys: page_timestamp,page_user_timestamp,rev_page_id
          key: page_timestamp
      key_len: 4
          ref: const
         rows: 2
        Extra: Using where
1 row in set (0.00 sec)

root@db1067[enwiki]> EXPLAIN SELECT /* AFComputedVariable::{closure}  */  rev_user_text  FROM `revision`    WHERE rev_page = '17437194'   ORDER BY rev_timestamp DESC LIMIT 100\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: revision
         type: ref
possible_keys: rev_page_id,page_timestamp,page_user_timestamp
          key: page_timestamp
      key_len: 4
          ref: const
         rows: 2
        Extra: Using where
1 row in set (0.01 sec)

I don't see that:

root@db1080[enwiki]> EXPLAIN SELECT /* AFComputedVariable::{closure}  */  rev_user_text  FROM `revision`    WHERE rev_page = '1743794'   ORDER BY rev_timestamp DESC LIMIT 100\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: revision
         type: ref
possible_keys: page_timestamp,page_user_timestamp,rev_page_id
          key: page_user_timestamp
      key_len: 4
          ref: const
         rows: 137288
        Extra: Using where; Using filesort

That is _so strange_ same query, just some time later and now it filesorts?!!!

It is not the same query:

rev_page = '1743794'

vs

rev_page= '17437194'
root@db2062[enwiki]> SELECT count(*) FROM revision WHERE rev_page = '1743794';
+----------+
| count(*) |
+----------+
|    77861 |
+----------+
1 row in set (0.03 sec)

root@db2062[enwiki]> SELECT count(*) FROM revision WHERE rev_page= '17437194';
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.02 sec)

It makes sense, using only the first part of the index is considered more advantageous even if a filesort has to be done:

+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 1     |
| Handler_read_last          | 0     |
| Handler_read_next          | 77861 |
| Handler_read_prev          | 0     |
| Handler_read_retry         | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
26 rows in set (0.00 sec)

+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 1     |
| Handler_read_last          | 0     |
| Handler_read_next          | 0     |
| Handler_read_prev          | 99    |
| Handler_read_retry         | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
26 rows in set (0.00 sec)

The LIMIT starts doing that for LIMIT 23 -> LIMIT 24.

CC @Anomie because I think he was hoping 10.1 would solve this or very similar issues, but it doesn't seem to be the case. :-/

:(

Let's add a force- If someone tells me where that code is, I can add it.

Let's add a force- If someone tells me where that code is, I can add it.

Looks like it's here.

Thank you- seeing the code, I think there may be a change for a more elegant solution later- as the query is not really what it is wanted, it was already a compromise. I will just add the FORCE now and we can open later a separate ticket for alternative solutions.

:(

Don't worry, we will do more tests with 10.2 and Oracle's MySQL.

Change 350006 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Add FORCE INDEX to AFComputedVariable::{closure}

https://gerrit.wikimedia.org/r/350006

The fix was merged and is going out in this weeks branch. Should start applying to Wikipedia sometime tomorrow.

Krinkle closed this task as Resolved.Apr 28 2017, 3:08 AM
Krinkle claimed this task.

Looking at the last 7 days and 24 hours in log stash, this error is now gone as of 21:00 UTC today - when the patch rolled out to all wikis as part of 1.29.0-wmf.21.

https://logstash.wikimedia.org/goto/98a6a55853f5ce1976d239515a11f222

Krinkle moved this task from Inbox to Doing on the Performance-Team board.Apr 28 2017, 3:08 AM

Change 353015 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] Enble puppet on db2062- a one-time experiment for mariadb 10.1

https://gerrit.wikimedia.org/r/353015

Change 353015 merged by Jcrespo:
[operations/puppet@production] Enble puppet on db2062- a one-time experiment for mariadb 10.1

https://gerrit.wikimedia.org/r/353015

Change 354092 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] MariaDB: Repool db2062 after maintenanace

https://gerrit.wikimedia.org/r/354092

Change 354092 merged by jenkins-bot:
[operations/mediawiki-config@master] MariaDB: Repool db2062 after maintenanace

https://gerrit.wikimedia.org/r/354092

Mentioned in SAL (#wikimedia-operations) [2017-05-18T06:05:43Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Repool db2062 - T116557 (duration: 00m 39s)