Page MenuHomePhabricator

Beta Cluster Special:Contributions lags by a long time and notes slow Flow queries
Closed, ResolvedPublic

Description

On http://en.wikipedia.beta.wmflabs.org/wiki/Special:Contributions do a search for "Selenium_user"

As of today the page takes minutes to load, and then displays "Due to high database server lag, changes newer than 81 seconds may not appear in this list."

The Flow team would like to be able to check Special:Contributions for recent changes made by browser tests but this is not currently possible given the time needed to load the page and the time needed for the lag to catch up.

Details

Related Gerrit Patches:

Event Timeline

Cmcmahon raised the priority of this task from to Needs Triage.
Cmcmahon updated the task description. (Show Details)
Cmcmahon changed Security from none to None.
hashar added subscribers: Springle, hashar.

Looking at https://logstash-beta.wmflabs.org/ hhvm reports a bunch of slow queries such as. The Selenium user having user_id 820.

SlowTimer [59999ms] at runtime/ext_mysql: slow query:
SELECT /* Flow\\Formatter\\ContributionsQuery::queryRevisions <hashar IP> */  *
FROM `flow_revision`,`flow_workflow`,`flow_tree_node`
WHERE
rev_user_id = '820'
AND rev_user_ip IS NULL
AND rev_user_wiki = 'enwiki'
AND (rev_id < '#005*\x8B*\xC0\\0\\0\\0\\0\\0\\0')
AND workflow_wiki = 'enwiki'
AND (tree_descendant_id = rev_type_id)
AND rev_type = 'post-summary'
ORDER BY rev_id DESC LIMIT 51
+------+-------------+----------------+------+--------------------------------------------------+-----------------------+---------+----------------------------------+-------+---------------------------------------------------------------------+
| id   | select_type | table          | type | possible_keys                                    | key                   | key_len | ref                              | rows  | Extra                                                               |
+------+-------------+----------------+------+--------------------------------------------------+-----------------------+---------+----------------------------------+-------+---------------------------------------------------------------------+
|    1 | SIMPLE      | flow_revision  | ref  | PRIMARY,flow_revision_user,flow_revision_type_id | flow_revision_type_id | 18      | const                            |  1709 | Using index condition; Using where; Using temporary; Using filesort |
|    1 | SIMPLE      | flow_tree_node | ref  | flow_tree_constraint                             | flow_tree_constraint  | 11      | enwiki.flow_revision.rev_type_id |     1 | Using index                                                         |
|    1 | SIMPLE      | flow_workflow  | ALL  | flow_workflow_lookup                             | NULL                  | NULL    | NULL                             | 40366 | Using where; Using join buffer (flat, BNL join)                     |
+------+-------------+----------------+------+--------------------------------------------------+-----------------------+---------+----------------------------------+-------+---------------------------------------------------------------------+

That one is slow at hell, uses a filesort etc..

Please sync with the Flow team and the DBA to figure out what is wrong in the database queries.

Beta cluster uses update.php to update the database schema, so maybe some index are missing.

greg triaged this task as Medium priority.Jan 6 2015, 12:00 AM
greg moved this task from To Triage to Backlog on the Beta-Cluster-Infrastructure board.
Spage renamed this task from Beta labs Special:Contributions lags by a long time to Beta labs Special:Contributions lags by a long time and notes slow Flow queries.Jan 8 2015, 9:46 PM
Spage added a subscriber: Performance Issue.
greg renamed this task from Beta labs Special:Contributions lags by a long time and notes slow Flow queries to Beta Cluster Special:Contributions lags by a long time and notes slow Flow queries.Oct 4 2015, 7:13 PM
Restricted Application added a subscriber: Luke081515. · View Herald TranscriptOct 4 2015, 7:13 PM

Updated explain result of that query:

+------+-------------+----------------+-------------+--------------------------------------------------+----------------------------+---------+----------------------------------+------+--------------------------------------------------------------------------+
| id   | select_type | table          | type        | possible_keys                                    | key                        | key_len | ref                              | rows | Extra                                                                    |
+------+-------------+----------------+-------------+--------------------------------------------------+----------------------------+---------+----------------------------------+------+--------------------------------------------------------------------------+
|    1 | SIMPLE      | flow_revision  | index_merge | PRIMARY,flow_revision_user,flow_revision_type_id | flow_revision_user,PRIMARY | 116,11  | NULL                             |    1 | Using intersect(flow_revision_user,PRIMARY); Using where; Using filesort |
|    1 | SIMPLE      | flow_tree_node | ref         | flow_tree_constraint                             | flow_tree_constraint       | 11      | flowdb.flow_revision.rev_type_id |    1 | Using index                                                              |
|    1 | SIMPLE      | flow_workflow  | ref         | flow_workflow_lookup                             | flow_workflow_lookup       | 66      | const                            |  365 | Using index condition                                                    |
+------+-------------+----------------+-------------+--------------------------------------------------+----------------------------+---------+----------------------------------+------+--------------------------------------------------------------------------+
3 rows in set (0.02 sec)

Another instance of this query took down Beta Cluster again today. Fortunately, it was executed on db2 (the replica) this time so the damage was minimal.

Here's a sanitized version of it that @ori posted in IRC, in case it's helpful. (Sorry for the mangled utf8 sequence; I'm not sure if that's mysql's fault or my IRC client's.)

SELECT /* Flow\Formatter\ContributionsQuery::queryRevisions X.X.X.X */  *  FROM `flow_revision`,`flow_workflow`,`flow_tree_node`   WHERE rev_user_id = '820' AND rev_user_ip IS NULL AND rev_user_wiki = 'enwiki' AND (rev_id < 'G��p\0\0\0\0\0\0') AND workflow_wiki = 'enwiki' AND (tree_descendant_id = rev_type_id) AND rev_type = 'post-summary'  ORDER BY rev_id DESC LIMIT 51
Catrope claimed this task.Nov 2 2015, 6:40 PM
Catrope raised the priority of this task from Medium to High.

I analyzed all possible query types for ContributionsQuery over at T116447#1755973 . It looks like indexing for the joins and the joined tables is fine, so I'll focus on flow_revision. The flow_revision part of the most basic type of query looks like:

SELECT * FROM flow_revision
WHERE rev_user_id = 123
AND rev_user_ip IS NULL
AND rev_user_wiki = 'enwiki'
AND rev_type = 'post-summary'
ORDER BY rev_id DESC
LIMIT 51;

Variants include:

  • For anonymous users, rev_user_id = 123 AND rev_user_ip IS NULL is replaced with rev_user_id IS NULL AND rev_user_ip = '1.2.3.4'
  • For the "newbies" query, there's a terrible condition that looks like rev_user_id > 123 AND rev_user_id NOT IN (456, 789)
  • For continuations, we add AND rev_id < 'ABC' (or rev_id > 'ABC' if we are using ASC instead of DESC)
  • In one case, we don't have a condition on rev_type at all, even though the joins we perform end up dropping all rows that don't have rev_type='post'; so we might as well add AND rev_type='post' in that case

The anon variant doesn't change anything, because we have rev_user_id=const AND rev_user_ip=const in both cases.
The newbies variant looks scary, but it doesn't look much worse than the equivalent query in core, and it doesn't appear to be the variant that's causing problems here, so I'm ignoring it for now.

Somewhat surprisingly, the continuation variant is the problem. We have an index on (rev_user_id, rev_user_ip, rev_user_wiki), and AFAIK MySQL adds the primary key ((rev_id)) at the end of every index. This satisfies WHERE rev_user_id=X AND rev_user_ip=Y AND rev_user_wiki=Z ORDER BY rev_id just fine, but if you add AND rev_id < 'X' it wants to do an index merge (on the user+wiki index and the primary key) and a filesort. Adding FORCE INDEX (flow_revision_user) appears to fix that, so I'll put in a patch that does that. This behavior is strange, though, and possibly a bug in the optimizer.

Longer term, though, I think we should probably add rev_type to the index (and always assert it, per the above), and look into improving the newbie contributions queries, both in core and in Flow.

Another instance of this query took down Beta Cluster again today. Fortunately, it was executed on db2 (the replica) this time so the damage was minimal.
Here's a sanitized version of it that @ori posted in IRC, in case it's helpful. (Sorry for the mangled utf8 sequence; I'm not sure if that's mysql's fault or my IRC client's.)

SELECT /* Flow\Formatter\ContributionsQuery::queryRevisions X.X.X.X */  *  FROM `flow_revision`,`flow_workflow`,`flow_tree_node`   WHERE rev_user_id = '820' AND rev_user_ip IS NULL AND rev_user_wiki = 'enwiki' AND (rev_id < 'G��p\0\0\0\0\0\0') AND workflow_wiki = 'enwiki' AND (tree_descendant_id = rev_type_id) AND rev_type = 'post-summary'  ORDER BY rev_id DESC LIMIT 51

That query was missing a condition, which I dismissed as a mistake on your part, until I noticed that we have a very silly bug that causes one of the conditions to be dropped in the continuation case. Patch incoming.

This comment was removed by Catrope.

Change 252612 had a related patch set uploaded (by Catrope):
Don't use array plus to merge SQL condition arrays

https://gerrit.wikimedia.org/r/252612

Change 252614 had a related patch set uploaded (by Catrope):
Add FORCE INDEX to contributions queries to work around optimizer bug

https://gerrit.wikimedia.org/r/252614

Change 252612 merged by jenkins-bot:
Don't use array plus to merge SQL condition arrays

https://gerrit.wikimedia.org/r/252612

This behavior is strange

No it is not, you cannot have a range and order with the same index, you have to choose which index to use. That is a property of BTREE indexes. By adding the join you have technically 2 ranges there.

Both these 2 lines are bad pieces of code and should be eliminated ASAP:

array( 'flow_revision', 'flow_workflow', 'flow_tree_node' ),
array( '*' ),

(and direct cause of the previous bug). How to do a proper JOIN: https://www.mediawiki.org/wiki/Manual:Database_access#Wrapper_function:_select.28.29

I am not ok with the FORCE unless it has been proven as necessary (we may need to slightly modify the indexes). They historically and currently are causing issues in production, mainly because it solves this particular combination of parameters now, but it breaks other condition or brakes it after some time passes and the cardinality changes.

Let's reevaluate with the new query.

I agree it should be a proper JOIN, and I'll work on that. However, I don't think that's related, because if I reduce this to just the query on flow_revision (which is the common part that the three queries share), I get this behavior:

mysql:research@x1-analytics-slave [flowdb]> explain select * from flow_revision where rev_user_id=123 and rev_user_ip is null and rev_user_wiki='enwiki' order by rev_id desc limit 51;
+------+-------------+---------------+------+--------------------+--------------------+---------+-------------------+------+-------------+
| id   | select_type | table         | type | possible_keys      | key                | key_len | ref               | rows | Extra       |
+------+-------------+---------------+------+--------------------+--------------------+---------+-------------------+------+-------------+
|    1 | SIMPLE      | flow_revision | ref  | flow_revision_user | flow_revision_user | 116     | const,const,const |    1 | Using where |
+------+-------------+---------------+------+--------------------+--------------------+---------+-------------------+------+-------------+
1 row in set (0.00 sec)

mysql:research@x1-analytics-slave [flowdb]> explain select * from flow_revision where rev_user_id=123 and rev_user_ip is null and rev_user_wiki='enwiki' and rev_id < 'W' order by rev_id desc limit 51;
+------+-------------+---------------+-------------+----------------------------+----------------------------+---------+------+------+--------------------------------------------------------------------------+
| id   | select_type | table         | type        | possible_keys              | key                        | key_len | ref  | rows | Extra                                                                    |
+------+-------------+---------------+-------------+----------------------------+----------------------------+---------+------+------+--------------------------------------------------------------------------+
|    1 | SIMPLE      | flow_revision | index_merge | PRIMARY,flow_revision_user | flow_revision_user,PRIMARY | 116,11  | NULL |    1 | Using intersect(flow_revision_user,PRIMARY); Using where; Using filesort |
+------+-------------+---------------+-------------+----------------------------+----------------------------+---------+------+------+--------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql:research@x1-analytics-slave [flowdb]> explain select * from flow_revision force index(flow_revision_user) where rev_user_id=123 and rev_user_ip is null and rev_user_wiki='enwiki' and rev_id < 'W' order by rev_id desc limit 51;
+------+-------------+---------------+------+--------------------+--------------------+---------+-------------------+------+-------------+
| id   | select_type | table         | type | possible_keys      | key                | key_len | ref               | rows | Extra       |
+------+-------------+---------------+------+--------------------+--------------------+---------+-------------------+------+-------------+
|    1 | SIMPLE      | flow_revision | ref  | flow_revision_user | flow_revision_user | 116     | const,const,const |    1 | Using where |
+------+-------------+---------------+------+--------------------+--------------------+---------+-------------------+------+-------------+
1 row in set (0.00 sec)

This behavior leads me to believe that we should use FORCE INDEX for all three query types, including the ones that already have a JOIN.

I tested it with the full queries just to check. Note that the join variant and the non-join variant of the query you mentioned have the same explain output.

Variant 1:

mysql:research@x1-analytics-slave [flowdb]> explain select * from flow_revision join flow_tree_revision on tree_rev_id=rev_id join flow_tree_node on tree_descendant_id = tree_rev_descendant_id join flow_workflow on workflow_id=tree_ancestor_id where rev_user_wiki='enwiki' and workflow_wiki='enwiki' and workflow_namespace=1 and rev_user_id=123 and rev_user_ip is null order by rev_id desc limit 51;
+------+-------------+--------------------+--------+----------------------------------------+----------------------+---------+--------------------------------------------------+------+-------------+
| id   | select_type | table              | type   | possible_keys                          | key                  | key_len | ref                                              | rows | Extra       |
+------+-------------+--------------------+--------+----------------------------------------+----------------------+---------+--------------------------------------------------+------+-------------+
|    1 | SIMPLE      | flow_revision      | ref    | PRIMARY,flow_revision_user             | flow_revision_user   | 116     | const,const,const                                |    1 | Using where |
|    1 | SIMPLE      | flow_tree_revision | eq_ref | PRIMARY,flow_tree_descendant_rev_id    | PRIMARY              | 11      | flowdb.flow_revision.rev_id                      |    1 |             |
|    1 | SIMPLE      | flow_tree_node     | ref    | flow_tree_node_pk,flow_tree_constraint | flow_tree_constraint | 11      | flowdb.flow_tree_revision.tree_rev_descendant_id |    1 | Using index |
|    1 | SIMPLE      | flow_workflow      | eq_ref | PRIMARY,flow_workflow_lookup           | PRIMARY              | 11      | flowdb.flow_tree_node.tree_ancestor_id           |    1 | Using where |
+------+-------------+--------------------+--------+----------------------------------------+----------------------+---------+--------------------------------------------------+------+-------------+
4 rows in set (0.00 sec)

mysql:research@x1-analytics-slave [flowdb]> explain select * from flow_revision join flow_tree_revision on tree_rev_id=rev_id join flow_tree_node on tree_descendant_id = tree_rev_descendant_id join flow_workflow on workflow_id=tree_ancestor_id where rev_user_wiki='enwiki' and workflow_wiki='enwiki' and workflow_namespace=1 and rev_user_id=123 and rev_user_ip is null and rev_id < 'W' order by rev_id desc limit 51;
+------+-------------+--------------------+-------------+----------------------------------------+----------------------------+---------+--------------------------------------------------+------+--------------------------------------------------------------------------+
| id   | select_type | table              | type        | possible_keys                          | key                        | key_len | ref                                              | rows | Extra                                                                    |
+------+-------------+--------------------+-------------+----------------------------------------+----------------------------+---------+--------------------------------------------------+------+--------------------------------------------------------------------------+
|    1 | SIMPLE      | flow_revision      | index_merge | PRIMARY,flow_revision_user             | flow_revision_user,PRIMARY | 116,11  | NULL                                             |    1 | Using intersect(flow_revision_user,PRIMARY); Using where; Using filesort |
|    1 | SIMPLE      | flow_tree_revision | eq_ref      | PRIMARY,flow_tree_descendant_rev_id    | PRIMARY                    | 11      | flowdb.flow_revision.rev_id                      |    1 |                                                                          |
|    1 | SIMPLE      | flow_tree_node     | ref         | flow_tree_node_pk,flow_tree_constraint | flow_tree_constraint       | 11      | flowdb.flow_tree_revision.tree_rev_descendant_id |    1 | Using index                                                              |
|    1 | SIMPLE      | flow_workflow      | eq_ref      | PRIMARY,flow_workflow_lookup           | PRIMARY                    | 11      | flowdb.flow_tree_node.tree_ancestor_id           |    1 | Using where                                                              |
+------+-------------+--------------------+-------------+----------------------------------------+----------------------------+---------+--------------------------------------------------+------+--------------------------------------------------------------------------+
4 rows in set (0.01 sec)

mysql:research@x1-analytics-slave [flowdb]> explain select * from flow_revision force index(flow_revision_user) join flow_tree_revision on tree_rev_id=rev_id join flow_tree_node on tree_descendant_id = tree_rev_descendant_id join flow_workflow on workflow_id=tree_ancestor_id where rev_user_wiki='enwiki' and workflow_wiki='enwiki' and workflow_namespace=1 and rev_user_id=123 and rev_user_ip is null and rev_id < 'W' order by rev_id desc limit 51;
+------+-------------+--------------------+--------+----------------------------------------+----------------------+---------+--------------------------------------------------+------+-------------+
| id   | select_type | table              | type   | possible_keys                          | key                  | key_len | ref                                              | rows | Extra       |
+------+-------------+--------------------+--------+----------------------------------------+----------------------+---------+--------------------------------------------------+------+-------------+
|    1 | SIMPLE      | flow_revision      | ref    | flow_revision_user                     | flow_revision_user   | 116     | const,const,const                                |    1 | Using where |
|    1 | SIMPLE      | flow_tree_revision | eq_ref | PRIMARY,flow_tree_descendant_rev_id    | PRIMARY              | 11      | flowdb.flow_revision.rev_id                      |    1 |             |
|    1 | SIMPLE      | flow_tree_node     | ref    | flow_tree_node_pk,flow_tree_constraint | flow_tree_constraint | 11      | flowdb.flow_tree_revision.tree_rev_descendant_id |    1 | Using index |
|    1 | SIMPLE      | flow_workflow      | eq_ref | PRIMARY,flow_workflow_lookup           | PRIMARY              | 11      | flowdb.flow_tree_node.tree_ancestor_id           |    1 | Using where |
+------+-------------+--------------------+--------+----------------------------------------+----------------------+---------+--------------------------------------------------+------+-------------+
4 rows in set (0.00 sec)

Variant 2:

mysql:research@x1-analytics-slave [flowdb]> explain select * from flow_revision join flow_workflow on workflow_id=rev_type_id and rev_type='header' where rev_user_wiki='enwiki' and workflow_wiki='enwiki' and workflow_namespace=1 and rev_user_id=123 and rev_user_ip is null order by rev_id desc limit 51;
+------+-------------+---------------+--------+------------------------------------------+--------------------+---------+----------------------------------+------+-------------+
| id   | select_type | table         | type   | possible_keys                            | key                | key_len | ref                              | rows | Extra       |
+------+-------------+---------------+--------+------------------------------------------+--------------------+---------+----------------------------------+------+-------------+
|    1 | SIMPLE      | flow_revision | ref    | flow_revision_user,flow_revision_type_id | flow_revision_user | 116     | const,const,const                |    1 | Using where |
|    1 | SIMPLE      | flow_workflow | eq_ref | PRIMARY,flow_workflow_lookup             | PRIMARY            | 11      | flowdb.flow_revision.rev_type_id |    1 | Using where |
+------+-------------+---------------+--------+------------------------------------------+--------------------+---------+----------------------------------+------+-------------+
2 rows in set (0.00 sec)

mysql:research@x1-analytics-slave [flowdb]> explain select * from flow_revision join flow_workflow on workflow_id=rev_type_id and rev_type='header' where rev_user_wiki='enwiki' and workflow_wiki='enwiki' and workflow_namespace=1 and rev_user_id=123 and rev_user_ip is null and rev_id < 'W' order by rev_id desc limit 51;
+------+-------------+---------------+-------------+--------------------------------------------------+----------------------------+---------+----------------------------------+------+--------------------------------------------------------------------------+
| id   | select_type | table         | type        | possible_keys                                    | key                        | key_len | ref                              | rows | Extra                                                                    |
+------+-------------+---------------+-------------+--------------------------------------------------+----------------------------+---------+----------------------------------+------+--------------------------------------------------------------------------+
|    1 | SIMPLE      | flow_revision | index_merge | PRIMARY,flow_revision_user,flow_revision_type_id | flow_revision_user,PRIMARY | 116,11  | NULL                             |    1 | Using intersect(flow_revision_user,PRIMARY); Using where; Using filesort |
|    1 | SIMPLE      | flow_workflow | eq_ref      | PRIMARY,flow_workflow_lookup                     | PRIMARY                    | 11      | flowdb.flow_revision.rev_type_id |    1 | Using where                                                              |
+------+-------------+---------------+-------------+--------------------------------------------------+----------------------------+---------+----------------------------------+------+--------------------------------------------------------------------------+
2 rows in set (0.00 sec)

mysql:research@x1-analytics-slave [flowdb]> explain select * from flow_revision force index(flow_revision_user) join flow_workflow on workflow_id=rev_type_id and rev_type='header' where rev_user_wiki='enwiki' and workflow_wiki='enwiki' and workflow_namespace=1 and rev_user_id=123 and rev_user_ip is null and rev_id < 'W' order by rev_id desc limit 51;
+------+-------------+---------------+--------+------------------------------+--------------------+---------+----------------------------------+------+-------------+
| id   | select_type | table         | type   | possible_keys                | key                | key_len | ref                              | rows | Extra       |
+------+-------------+---------------+--------+------------------------------+--------------------+---------+----------------------------------+------+-------------+
|    1 | SIMPLE      | flow_revision | ref    | flow_revision_user           | flow_revision_user | 116     | const,const,const                |    1 | Using where |
|    1 | SIMPLE      | flow_workflow | eq_ref | PRIMARY,flow_workflow_lookup | PRIMARY            | 11      | flowdb.flow_revision.rev_type_id |    1 | Using where |
+------+-------------+---------------+--------+------------------------------+--------------------+---------+----------------------------------+------+-------------+
2 rows in set (0.00 sec)

Variant 3 (as currently performed, without joins):

mysql:research@x1-analytics-slave [flowdb]> explain select * from flow_revision, flow_workflow, flow_tree_node where workflow_id=tree_ancestor_id and tree_descendant_id = rev_type_id and rev_type='post-summary' and rev_user_wiki = 'enwiki' and workflow_wiki='enwiki' and workflow_namespace=1 and rev_user_id=123 and rev_user_ip is null order by rev_id desc limit 51;
+------+-------------+----------------+--------+------------------------------------------+----------------------+---------+----------------------------------------+------+-------------+
| id   | select_type | table          | type   | possible_keys                            | key                  | key_len | ref                                    | rows | Extra       |
+------+-------------+----------------+--------+------------------------------------------+----------------------+---------+----------------------------------------+------+-------------+
|    1 | SIMPLE      | flow_revision  | ref    | flow_revision_user,flow_revision_type_id | flow_revision_user   | 116     | const,const,const                      |    1 | Using where |
|    1 | SIMPLE      | flow_tree_node | ref    | flow_tree_node_pk,flow_tree_constraint   | flow_tree_constraint | 11      | flowdb.flow_revision.rev_type_id       |    1 | Using index |
|    1 | SIMPLE      | flow_workflow  | eq_ref | PRIMARY,flow_workflow_lookup             | PRIMARY              | 11      | flowdb.flow_tree_node.tree_ancestor_id |    1 | Using where |
+------+-------------+----------------+--------+------------------------------------------+----------------------+---------+----------------------------------------+------+-------------+
3 rows in set (0.01 sec)

mysql:research@x1-analytics-slave [flowdb]> explain select * from flow_revision, flow_workflow, flow_tree_node where workflow_id=tree_ancestor_id and tree_descendant_id = rev_type_id and rev_type='post-summary' and rev_user_wiki = 'enwiki' and workflow_wiki='enwiki' and workflow_namespace=1 and rev_user_id=123 and rev_user_ip is null and rev_id < 'W' order by rev_id desc limit 51;
+------+-------------+----------------+-------------+--------------------------------------------------+----------------------------+---------+----------------------------------------+------+--------------------------------------------------------------------------+
| id   | select_type | table          | type        | possible_keys                                    | key                        | key_len | ref                                    | rows | Extra                                                                    |
+------+-------------+----------------+-------------+--------------------------------------------------+----------------------------+---------+----------------------------------------+------+--------------------------------------------------------------------------+
|    1 | SIMPLE      | flow_revision  | index_merge | PRIMARY,flow_revision_user,flow_revision_type_id | flow_revision_user,PRIMARY | 116,11  | NULL                                   |    1 | Using intersect(flow_revision_user,PRIMARY); Using where; Using filesort |
|    1 | SIMPLE      | flow_tree_node | ref         | flow_tree_node_pk,flow_tree_constraint           | flow_tree_constraint       | 11      | flowdb.flow_revision.rev_type_id       |    1 | Using index                                                              |
|    1 | SIMPLE      | flow_workflow  | eq_ref      | PRIMARY,flow_workflow_lookup                     | PRIMARY                    | 11      | flowdb.flow_tree_node.tree_ancestor_id |    1 | Using where                                                              |
+------+-------------+----------------+-------------+--------------------------------------------------+----------------------------+---------+----------------------------------------+------+--------------------------------------------------------------------------+
3 rows in set (0.00 sec)

mysql:research@x1-analytics-slave [flowdb]> explain select * from flow_revision force index(flow_revision_user), flow_workflow, flow_tree_node where workflow_id=tree_ancestor_id and tree_descendant_id = rev_type_id and rev_type='post-summary' and rev_user_wiki = 'enwiki' and workflow_wiki='enwiki' and workflow_namespace=1 and rev_user_id=123 and rev_user_ip is null and rev_id < 'W' order by rev_id desc limit 51;
+------+-------------+----------------+--------+----------------------------------------+----------------------+---------+----------------------------------------+------+-------------+
| id   | select_type | table          | type   | possible_keys                          | key                  | key_len | ref                                    | rows | Extra       |
+------+-------------+----------------+--------+----------------------------------------+----------------------+---------+----------------------------------------+------+-------------+
|    1 | SIMPLE      | flow_revision  | ref    | flow_revision_user                     | flow_revision_user   | 116     | const,const,const                      |    1 | Using where |
|    1 | SIMPLE      | flow_tree_node | ref    | flow_tree_node_pk,flow_tree_constraint | flow_tree_constraint | 11      | flowdb.flow_revision.rev_type_id       |    1 | Using index |
|    1 | SIMPLE      | flow_workflow  | eq_ref | PRIMARY,flow_workflow_lookup           | PRIMARY              | 11      | flowdb.flow_tree_node.tree_ancestor_id |    1 | Using where |
+------+-------------+----------------+--------+----------------------------------------+----------------------+---------+----------------------------------------+------+-------------+
3 rows in set (0.01 sec)

Variant 3 converted to joins:

mysql:research@x1-analytics-slave [flowdb]> explain select * from flow_revision join flow_tree_node on tree_descendant_id=rev_type_id and rev_type='header' join flow_workflow on workflow_id=tree_ancestor_id where rev_user_wiki='enwiki' and workflow_wiki='enwiki' and workflow_namespace=1 and rev_user_id=123 and rev_user_ip is null order by rev_id desc limit 51;
+------+-------------+----------------+--------+------------------------------------------+----------------------+---------+----------------------------------------+------+-------------+
| id   | select_type | table          | type   | possible_keys                            | key                  | key_len | ref                                    | rows | Extra       |
+------+-------------+----------------+--------+------------------------------------------+----------------------+---------+----------------------------------------+------+-------------+
|    1 | SIMPLE      | flow_revision  | ref    | flow_revision_user,flow_revision_type_id | flow_revision_user   | 116     | const,const,const                      |    1 | Using where |
|    1 | SIMPLE      | flow_tree_node | ref    | flow_tree_node_pk,flow_tree_constraint   | flow_tree_constraint | 11      | flowdb.flow_revision.rev_type_id       |    1 | Using index |
|    1 | SIMPLE      | flow_workflow  | eq_ref | PRIMARY,flow_workflow_lookup             | PRIMARY              | 11      | flowdb.flow_tree_node.tree_ancestor_id |    1 | Using where |
+------+-------------+----------------+--------+------------------------------------------+----------------------+---------+----------------------------------------+------+-------------+
3 rows in set (0.01 sec)

mysql:research@x1-analytics-slave [flowdb]> explain select * from flow_revision join flow_tree_node on tree_descendant_id=rev_type_id and rev_type='header' join flow_workflow on workflow_id=tree_ancestor_id where rev_user_wiki='enwiki' and workflow_wiki='enwiki' and workflow_namespace=1 and rev_user_id=123 and rev_user_ip is null and rev_id < 'W' order by rev_id desc limit 51;
+------+-------------+----------------+-------------+--------------------------------------------------+----------------------------+---------+----------------------------------------+------+--------------------------------------------------------------------------+
| id   | select_type | table          | type        | possible_keys                                    | key                        | key_len | ref                                    | rows | Extra                                                                    |
+------+-------------+----------------+-------------+--------------------------------------------------+----------------------------+---------+----------------------------------------+------+--------------------------------------------------------------------------+
|    1 | SIMPLE      | flow_revision  | index_merge | PRIMARY,flow_revision_user,flow_revision_type_id | flow_revision_user,PRIMARY | 116,11  | NULL                                   |    1 | Using intersect(flow_revision_user,PRIMARY); Using where; Using filesort |
|    1 | SIMPLE      | flow_tree_node | ref         | flow_tree_node_pk,flow_tree_constraint           | flow_tree_constraint       | 11      | flowdb.flow_revision.rev_type_id       |    1 | Using index                                                              |
|    1 | SIMPLE      | flow_workflow  | eq_ref      | PRIMARY,flow_workflow_lookup                     | PRIMARY                    | 11      | flowdb.flow_tree_node.tree_ancestor_id |    1 | Using where                                                              |
+------+-------------+----------------+-------------+--------------------------------------------------+----------------------------+---------+----------------------------------------+------+--------------------------------------------------------------------------+
3 rows in set (0.00 sec)

mysql:research@x1-analytics-slave [flowdb]> explain select * from flow_revision force index(flow_revision_user) join flow_tree_node on tree_descendant_id=rev_type_id and rev_type='header' join flow_workflow on workflow_id=tree_ancestor_id where rev_user_wiki='enwiki' and workflow_wiki='enwiki' and workflow_namespace=1 and rev_user_id=123 and rev_user_ip is null and rev_id < 'W' order by rev_id desc limit 51;
+------+-------------+----------------+--------+----------------------------------------+----------------------+---------+----------------------------------------+------+-------------+
| id   | select_type | table          | type   | possible_keys                          | key                  | key_len | ref                                    | rows | Extra       |
+------+-------------+----------------+--------+----------------------------------------+----------------------+---------+----------------------------------------+------+-------------+
|    1 | SIMPLE      | flow_revision  | ref    | flow_revision_user                     | flow_revision_user   | 116     | const,const,const                      |    1 | Using where |
|    1 | SIMPLE      | flow_tree_node | ref    | flow_tree_node_pk,flow_tree_constraint | flow_tree_constraint | 11      | flowdb.flow_revision.rev_type_id       |    1 | Using index |
|    1 | SIMPLE      | flow_workflow  | eq_ref | PRIMARY,flow_workflow_lookup           | PRIMARY              | 11      | flowdb.flow_tree_node.tree_ancestor_id |    1 | Using where |
+------+-------------+----------------+--------+----------------------------------------+----------------------+---------+----------------------------------------+------+-------------+
3 rows in set (0.00 sec)

Change 252769 had a related patch set uploaded (by Catrope):
Convert the post-summary query in ContributionsQuery to use JOINs

https://gerrit.wikimedia.org/r/252769

Change 252769 merged by jenkins-bot:
Convert the post-summary query in ContributionsQuery to use JOINs

https://gerrit.wikimedia.org/r/252769

Is this still an issue? If it's blocked, what's it blocked on and can it be marked as a blocker? It's been bumping along as a "blocked" task for 15 months now since the last activity…

hashar removed a subscriber: hashar.Mar 9 2017, 8:11 AM
Catrope closed this task as Resolved.Apr 18 2017, 5:45 PM

This seems fixed to me. Please reopen if slow query log entries reappear.

Change 252614 abandoned by Catrope:
Add FORCE INDEX to contributions queries to work around optimizer bug

https://gerrit.wikimedia.org/r/252614