Page MenuHomePhabricator

Do backfill and final schema change for "Segregate Reference objects by source wiki."
Closed, ResolvedPublic

Description

We need to do the following (see comments for details of earlier work):

We'll need DBA assistance for 'Remove the old indexes'. The Collaboration team will do everything else before that.

This needs to be done in flowdb and all private wikis with Flow (officewiki).

Related Objects

View Standalone Graph
This task is connected to more than 200 other tasks. Only direct parents and subtasks are shown here. Use View Standalone Graph to show more of the graph.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

I applied the change on officewiki, but I had to revert the change on the main flow database:

The proposed query performs almost a full table scan per execution, and that means the table fully blocked for a *very long time*, probably a full table scan per row update. Not only the updates should be conservative to avoid lag, the query itself should be fast because we are using STATEMENT-BASED binlog:

MariaDB [flowdb]> EXPLAIN SELECT ref_src_object_id, workflow_wiki
    ->            FROM /*_*/flow_wiki_ref, /*_*/flow_workflow
    ->        WHERE
    ->            flow_wiki_ref.ref_src_object_id = flow_workflow.workflow_id AND
    ->            flow_wiki_ref.ref_src_object_type IN ('header', 'post-summary') AND
    ->            ( flow_wiki_ref.ref_src_wiki = '' OR flow_wiki_ref.ref_src_wiki IS NULL )
    ->        LIMIT 1000\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: flow_wiki_ref
         type: ref_or_null
possible_keys: flow_wiki_ref_idx_v2,flow_wiki_ref_revision_v2
          key: flow_wiki_ref_idx_v2
      key_len: 19
          ref: const
         rows: 20403
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: flow_workflow
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 11
          ref: flowdb.flow_wiki_ref.ref_src_object_id
         rows: 1
        Extra: 
2 rows in set (0.00 sec)

MariaDB [flowdb]> SHOW GLOBAL VARIABLES like 'binlog_format';
+---------------+-----------+
| Variable_name | Value     |
+---------------+-----------+
| binlog_format | STATEMENT |
+---------------+-----------+
1 row in set (0.00 sec)

I can rewrite the query, but of course, any help rewriting the UPDATEs will help me do it faster. :-)
Rejecting this run until this is resolved. If someone wants a testbed, I can provide a "fake" table, copy of the real one to test it beforehand (It probably went very fast on testing and officewiki because they are way smaller).

I am open to getting contacted to see which way we could move forward with this.

To make this clear, options I see:

  • execute this independently on the master and on the slaves in 1 go, wish that the change does not affect many users while ongoing (the table could be blocked for writes for ~1 minute)
  • Try to rewrite the query to make it execute in less than 1 second for each run, run it in batches as originally intended

I do not thing switching to row based replication could be an option here that could help.

I wonder if ref_src_wiki = '' OR ref_src_wiki IS NULL is the culprit here. Perhaps the query would run faster if it was split into two queries, one with only the = '' condition and one with only the IS NULL condition? The EXPLAIN output for the = '' version looks better:

mysql:research@x1-analytics-slave [flowdb]> explain SELECT ref_src_object_id, workflow_wiki FROM  flow_wiki_ref,  flow_workflow        WHERE flow_wiki_ref.ref_src_object_id = flow_workflow.workflow_id AND flow_wiki_ref.ref_src_object_type IN ('header', 'post-summary') AND ( flow_wiki_ref.ref_src_wiki = '' )  LIMIT 1000 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: flow_wiki_ref
         type: ref
possible_keys: flow_wiki_ref_idx_v2,flow_wiki_ref_revision_v2
          key: flow_wiki_ref_idx_v2
      key_len: 19
          ref: const
         rows: 1
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: flow_workflow
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 11
          ref: flowdb.flow_wiki_ref.ref_src_object_id
         rows: 1
        Extra: 
2 rows in set (0.00 sec)

Whereas the one for the IS NULL version looks the same as what you posted for the full query.

In principle, it should not be hard to find all rows with a given ref_src_wiki value, given that there are multiple indices that start with that field. But maybe NULL is special? There's also the WHERE on ref_src_object_type, which is only there because we need different logic for different object types. Matt said there might be a different column that we could use that would let us use the same query for all types, eliminating ref_src_object_type from the query altogether; he said he would look at the values in that column today to see if that strategy would work.

Alternatively, we could see if we can write the case differentiation for the different types into the query, so we can run one query that processes all rows in index order rather than needing two queries that each skip a bunch of stuff using WHERE.

I'm going to rewrite it to use ref_src_workflow_id. This will drop the use of ref_src_object_type, as well as flow_tree_node.

It should be equivalent:

mysql:research@x1-analytics-slave [flowdb]> SELECT HEX(ref_src_object_id), HEX(ref_src_workflow_id) FROM flow_wiki_ref WHERE ref_src_object_type IN ('header', 'post-summary') AND ref_src_object_id != ref_src_workflow_id LIMIT 1;
Empty set (0.05 sec)

mysql:research@x1-analytics-slave [flowdb]> SELECT HEX(tree_ancestor_id), HEX(ref_src_workflow_id) FROM flow_wiki_ref, flow_tree_node, flow_workflow WHERE flow_wiki_ref.ref_src_object_id = flow_tree_node.tree_descendant_id AND flow_tree_node.tree_ancestor_id = flow_workflow.workflow_id AND flow_wiki_ref.ref_src_object_type IN ('post') AND tree_ancestor_id != ref_src_workflow_id LIMIT 1;
Empty set (0.42 sec)

mysql:research@x1-analytics-slave [flowdb]> SELECT HEX(ref_src_object_id), HEX(ref_src_workflow_id) FROM flow_ext_ref WHERE ref_src_object_type IN ('header', 'post-summary') AND ref_src_object_id != ref_src_workflow_id LIMIT 1;
Empty set (0.02 sec)

mysql:research@x1-analytics-slave [flowdb]> SELECT HEX(tree_ancestor_id), HEX(ref_src_workflow_id) FROM flow_ext_ref, flow_tree_node, flow_workflow WHERE flow_ext_ref.ref_src_object_id = flow_tree_node.tree_descendant_id AND flow_tree_node.tree_ancestor_id = flow_workflow.workflow_id AND flow_ext_ref.ref_src_object_type IN ('post') AND tree_ancestor_id != ref_src_workflow_id LIMIT 1;
Empty set (0.28 sec)

Change 237453 had a related patch set uploaded (by Mattflaschen):
WIP: use ref_src_workflow_id to find workflow ID directly

https://gerrit.wikimedia.org/r/237453

I think matt already fixed, but I wanted to put the findings here:

MariaDB [flowdb]> ALTER TABLE flow_wiki_ref_deleteme add index(ref_src_workflow_id, ref_src_wiki);
Query OK, 0 rows affected (0.16 sec)
Records: 0  Duplicates: 0  Warnings: 0

MariaDB [flowdb]> UPDATE flow_wiki_ref_deleteme, ( SELECT ref_src_workflow_id, workflow_wiki FROM flow_wiki_ref_deleteme,  flow_workflow WHERE flow_wiki_ref_deleteme.ref_src_workflow_id = flow_workflow.workflow_id AND ( flow_wiki_ref_deleteme.ref_src_wiki = '' OR flow_wiki_ref_deleteme.ref_src_wiki IS NULL ) LIMIT 1000 ) tmp SET flow_wiki_ref_deleteme.ref_src_wiki = tmp.workflow_wiki WHERE flow_wiki_ref_deleteme.ref_src_workflow_id = tmp.ref_src_workflow_id;
Query OK, 1001 rows affected, 1 warning (0.22 sec)
Rows matched: 1014  Changed: 1001  Warnings: 0

The index makes the update faster. So we will go with this solution on Monday.

Scheduled for Monday at 09:00 UTC. @jcrespo will perform the change with @matthiasmullie and myself standing by.

Change 237453 merged by jenkins-bot:
Use ref_src_workflow_id to find workflow ID, add index

https://gerrit.wikimedia.org/r/237453

Three rows were not backfilled. They appear to point to a workflow_id that can't be found anywhere.

mysql:research@x1-analytics-slave [flowdb]> select HEX(ref_src_object_id), ref_src_object_type, HEX(ref_src_workflow_id), ref_src_namespace, ref_src_title, ref_target_namespace, ref_target_title, ref_type, ref_src_wiki from flow_wiki_ref where ref_src_wiki is null or ref_src_wiki = '' \G
*************************** 1. row ***************************
  HEX(ref_src_object_id): 04CEA7D3417C3244302E8B
     ref_src_object_type: post
HEX(ref_src_workflow_id): 04CE8E3D46482F5A00E943
       ref_src_namespace: 2600
           ref_src_title: Qaakkkf3x55yrrcz
    ref_target_namespace: 4
        ref_target_title: Requests/Permissions
                ref_type: link
            ref_src_wiki: NULL
*************************** 2. row ***************************
  HEX(ref_src_object_id): 04CE8E3D426C3155FAA665
     ref_src_object_type: post
HEX(ref_src_workflow_id): 04CE8E3D46482F5A00E943
       ref_src_namespace: 2600
           ref_src_title: Qaakkkf3x55yrrcz
    ref_target_namespace: 4
        ref_target_title: Requests/Permissions/Erdemaslancan
                ref_type: link
            ref_src_wiki: NULL
*************************** 3. row ***************************
  HEX(ref_src_object_id): 04CED1F7DE7C340B2588B7
     ref_src_object_type: post
HEX(ref_src_workflow_id): 04CE8E3D46482F5A00E943
       ref_src_namespace: 2600
           ref_src_title: Qaakkkf3x55yrrcz
    ref_target_namespace: 4
        ref_target_title: Requests/Permissions/Hazard-SJ_(bureaucrat_2)
                ref_type: link
            ref_src_wiki: NULL
3 rows in set (0.00 sec)

These indeed appear not to exist in flow_workflow:

SELECT * FROM flow_workflow WHERE workflow_id = UNHEX('04ce8e3d46482f5a00e943');

I assume we at some point first calculated & stored these references (as PostRevision listeners) & then tried to (and failed to) actually store the workflow.

Backuped those on db1029:/home/jynus/flow_strange_records and about to run:

DELETE from flow_wiki_ref where (ref_src_wiki is null or ref_src_wiki = '') and ref_src_namespace=2600 and ref_src_title = 'Qaakkkf3x55yrrcz' and HEX(ref_src_workflow_id) = '04CE8E3D46482F5A00E943';

Current state of tables:

mysql> SHOW CREATE TABLE flow_ext_ref\G
*************************** 1. row ***************************
       Table: flow_ext_ref
Create Table: CREATE TABLE `flow_ext_ref` (
  `ref_src_object_id` binary(11) NOT NULL,
  `ref_src_object_type` varbinary(32) NOT NULL,
  `ref_src_workflow_id` binary(11) NOT NULL,
  `ref_src_namespace` int(11) NOT NULL,
  `ref_src_title` varbinary(255) NOT NULL,
  `ref_target` blob,
  `ref_type` varbinary(16) NOT NULL,
  `ref_src_wiki` varbinary(16) NOT NULL,
  KEY `flow_ext_ref_idx_v2` (`ref_src_wiki`,`ref_src_namespace`,`ref_src_title`,`ref_type`,`ref_target`(255),`ref_src_object_type`,`ref_src_object_id`),
  KEY `flow_ext_ref_revision_v2` (`ref_src_wiki`,`ref_src_namespace`,`ref_src_title`,`ref_src_object_type`,`ref_src_object_id`,`ref_type`,`ref_target`(255)),
  KEY `flow_ext_ref_workflow_id_idx_tmp` (`ref_src_workflow_id`,`ref_src_wiki`),
  KEY `flow_ext_ref_idx` (`ref_src_namespace`,`ref_src_title`,`ref_type`,`ref_target`(100),`ref_src_object_type`,`ref_src_object_id`),
  KEY `flow_ext_ref_revision` (`ref_src_namespace`,`ref_src_title`,`ref_src_object_type`,`ref_src_object_id`,`ref_type`,`ref_target`(100))
) ENGINE=InnoDB DEFAULT CHARSET=binary
1 row in set (0.00 sec)

mysql> SHOW CREATE TABLE flow_wiki_ref\G
*************************** 1. row ***************************
       Table: flow_wiki_ref
Create Table: CREATE TABLE `flow_wiki_ref` (
  `ref_src_object_id` binary(11) NOT NULL,
  `ref_src_object_type` varbinary(32) NOT NULL,
  `ref_src_workflow_id` binary(11) NOT NULL,
  `ref_src_namespace` int(11) NOT NULL,
  `ref_src_title` varbinary(255) NOT NULL,
  `ref_target_namespace` int(11) NOT NULL,
  `ref_target_title` varbinary(255) NOT NULL,
  `ref_type` varbinary(16) NOT NULL,
  `ref_src_wiki` varbinary(16) NOT NULL,
  KEY `flow_wiki_ref_idx_v2` (`ref_src_wiki`,`ref_src_namespace`,`ref_src_title`,`ref_type`,`ref_target_namespace`,`ref_target_title`,`ref_src_object_type`,`ref_src_object_id`),
  KEY `flow_wiki_ref_revision_v2` (`ref_src_wiki`,`ref_src_namespace`,`ref_src_title`,`ref_src_object_type`,`ref_src_object_id`,`ref_type`,`ref_target_namespace`,`ref_target_title`),
  KEY `flow_wiki_ref_workflow_id_idx_tmp` (`ref_src_workflow_id`,`ref_src_wiki`),
  KEY `flow_wiki_ref_idx` (`ref_src_namespace`,`ref_src_title`,`ref_type`,`ref_target_namespace`,`ref_target_title`,`ref_src_object_type`,`ref_src_object_id`),
  KEY `flow_wiki_ref_revision` (`ref_src_namespace`,`ref_src_title`,`ref_src_object_type`,`ref_src_object_id`,`ref_type`,`ref_target_namespace`,`ref_target_title`)
) ENGINE=InnoDB DEFAULT CHARSET=binary
1 row in set (0.00 sec)

We haven't yet turned off $wgFlowMigrateReferenceWiki. What we did do is cause a 12-minute outage on the x1 cluster by asking for the old indexes to be removed before $wgFlowMigrateReferenceWiki was turned off. We should not have asked for indexes to be dropped before checking it was safe to do so.

What I think we should do now is:

In addition, we need to fix T109676: Add artificial primary key to flow_wiki_ref and flow_ext_ref because the lack of primary keys on these tables made altering them (and cleaning up the dead rows discussed above) more difficult.

Current officewiki status:

mysql> SHOW CREATE TABLE flow_ext_ref\G
*************************** 1. row ***************************
       Table: flow_ext_ref
Create Table: CREATE TABLE `flow_ext_ref` (
  `ref_src_object_id` binary(11) NOT NULL,
  `ref_src_object_type` varbinary(32) NOT NULL,
  `ref_src_workflow_id` binary(11) NOT NULL,
  `ref_src_namespace` int(11) NOT NULL,
  `ref_src_title` varbinary(255) NOT NULL,
  `ref_target` blob,
  `ref_type` varbinary(16) NOT NULL,
  `ref_src_wiki` varbinary(16) NOT NULL,
  KEY `flow_ext_ref_idx_v2` (`ref_src_wiki`,`ref_src_namespace`,`ref_src_title`,`ref_type`,`ref_target`(255),`ref_src_object_type`,`ref_src_object_id`),
  KEY `flow_ext_ref_revision_v2` (`ref_src_wiki`,`ref_src_namespace`,`ref_src_title`,`ref_src_object_type`,`ref_src_object_id`,`ref_type`,`ref_target`(255)),
  KEY `flow_ext_ref_idx` (`ref_src_namespace`,`ref_src_title`,`ref_type`,`ref_target`(100),`ref_src_object_type`,`ref_src_object_id`),
  KEY `flow_ext_ref_revision` (`ref_src_namespace`,`ref_src_title`,`ref_src_object_type`,`ref_src_object_id`,`ref_type`,`ref_target`(100))
) ENGINE=InnoDB DEFAULT CHARSET=binary
1 row in set (0.00 sec)

mysql> SHOW CREATE TABLE flow_wiki_ref\G
*************************** 1. row ***************************
       Table: flow_wiki_ref
Create Table: CREATE TABLE `flow_wiki_ref` (
  `ref_src_object_id` binary(11) NOT NULL,
  `ref_src_object_type` varbinary(32) NOT NULL,
  `ref_src_workflow_id` binary(11) NOT NULL,
  `ref_src_namespace` int(11) NOT NULL,
  `ref_src_title` varbinary(255) NOT NULL,
  `ref_target_namespace` int(11) NOT NULL,
  `ref_target_title` varbinary(255) NOT NULL,
  `ref_type` varbinary(16) NOT NULL,
  `ref_src_wiki` varbinary(16) NOT NULL,
  KEY `flow_wiki_ref_idx_v2` (`ref_src_wiki`,`ref_src_namespace`,`ref_src_title`,`ref_type`,`ref_target_namespace`,`ref_target_title`,`ref_src_object_type`,`ref_src_object_id`),
  KEY `flow_wiki_ref_revision_v2` (`ref_src_wiki`,`ref_src_namespace`,`ref_src_title`,`ref_src_object_type`,`ref_src_object_id`,`ref_type`,`ref_target_namespace`,`ref_target_title`),
  KEY `flow_wiki_ref_idx` (`ref_src_namespace`,`ref_src_title`,`ref_type`,`ref_target_namespace`,`ref_target_title`,`ref_src_object_type`,`ref_src_object_id`),
  KEY `flow_wiki_ref_revision` (`ref_src_namespace`,`ref_src_title`,`ref_src_object_type`,`ref_src_object_id`,`ref_type`,`ref_target_namespace`,`ref_target_title`)
) ENGINE=InnoDB DEFAULT CHARSET=binary
1 row in set (0.00 sec)

Change 238124 had a related patch set uploaded (by Matthias Mullie):
Update references migration to reflect production status

https://gerrit.wikimedia.org/r/238124

Change 238124 merged by jenkins-bot:
Update references migration to reflect production status

https://gerrit.wikimedia.org/r/238124

Verify the v2 indexes exist on the tables

The v2 indexes exist for flow_ext_ref and flow_wiki_ref tables:

[enwiki]> SHOW CREATE TABLE flow_wiki_ref\G
*************************** 1. row ***************************
       Table: flow_wiki_ref
Create Table: CREATE TABLE `flow_wiki_ref` (
  `ref_src_object_id` binary(11) NOT NULL,
  `ref_src_object_type` varbinary(32) NOT NULL,
  `ref_src_workflow_id` binary(11) NOT NULL,
  `ref_src_namespace` int(11) NOT NULL,
  `ref_src_title` varbinary(255) NOT NULL,
  `ref_target_namespace` int(11) NOT NULL,
  `ref_target_title` varbinary(255) NOT NULL,
  `ref_type` varbinary(16) NOT NULL,
  `ref_src_wiki` varchar(16) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
  KEY `flow_wiki_ref_idx_v2`  (`ref_src_namespace`,`ref_src_title`,`ref_type`,`ref_target_namespace`,`ref_target_title`,`ref_src_object_type`,`ref_src_object_id`),
  KEY `flow_wiki_ref_revision_v2` (`ref_src_namespace`,`ref_src_title`,`ref_src_object_type`,`ref_src_object_id`,`ref_type`,`ref_target_namespace`,`ref_target_title`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

[enwiki]> SHOW CREATE TABLE flow_ext_ref\G
*************************** 1. row ***************************
       Table: flow_ext_ref
Create Table: CREATE TABLE `flow_ext_ref` (
  `ref_src_object_id` binary(11) NOT NULL,
  `ref_src_object_type` varbinary(32) NOT NULL,
  `ref_src_workflow_id` binary(11) NOT NULL,
  `ref_src_namespace` int(11) NOT NULL,
  `ref_src_title` varbinary(255) NOT NULL,
  `ref_target` blob,
  `ref_type` varbinary(16) NOT NULL,
  `ref_src_wiki` varchar(16) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
  KEY `flow_ext_ref_idx_v2` (`ref_src_wiki`,`ref_src_namespace`,`ref_src_title`,`ref_type`,`ref_target`(255),`ref_src_object_type`,`ref_src_object_id`),
  KEY `flow_ext_ref_revision_v2` (`ref_src_wiki`,`ref_src_namespace`,`ref_src_title`,`ref_src_object_type`,`ref_src_object_id`,`ref_type`,`ref_target`(255))
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
NOTE: In betalabs, enwiki flow_wiki_ref table does not have ref_src_wiki column in v2 indices. Other dbs, e.g. testwiki, cawiki have it.

Checked in betalabs. And T113525: [betalabs] enwiki - ref_src_wiki column is missing from v2 index. - is Closed.

DannyH claimed this task.
Mattflaschen-WMF removed DannyH as the assignee of this task.

From production slaves:

Got credentials using:

$factory = Flow\Container::get( 'db.factory' );
$slave = $factory->getDB( DB_SLAVE );
$master = $factory->getDB( DB_MASTER );

and var_export.

Slaves:

10.64.16.24 [officewiki]> SELECT * FROM INFORMATION_SCHEMA.INDEX_STATISTICS WHERE table_schema = 'officewiki' AND table_name LIKE 'flow_%_ref' ORDER BY TABLE_NAME, ROWS_READ;
+--------------+---------------+----------------------+-----------+
| TABLE_SCHEMA | TABLE_NAME    | INDEX_NAME           | ROWS_READ |
+--------------+---------------+----------------------+-----------+
| officewiki   | flow_ext_ref  | flow_ext_ref_pk      |         5 |
| officewiki   | flow_ext_ref  | flow_ext_ref_idx_v2  |       724 |
| officewiki   | flow_wiki_ref | flow_wiki_ref_idx_v2 |      1558 |
+--------------+---------------+----------------------+-----------+
3 rows in set (0.76 sec)
10.64.16.20 [flowdb]> SELECT * FROM INFORMATION_SCHEMA.INDEX_STATISTICS WHERE table_schema = 'flowdb' AND table_name LIKE 'flow_%_ref' ORDER BY TABLE_NAME, ROWS_READ;
+--------------+---------------+-----------------------------------+-----------+
| TABLE_SCHEMA | TABLE_NAME    | INDEX_NAME                        | ROWS_READ |
+--------------+---------------+-----------------------------------+-----------+
| flowdb       | flow_ext_ref  | flow_ext_ref_revision_v2          |      1070 |
| flowdb       | flow_ext_ref  | flow_ext_ref_pk                   |      1124 |
| flowdb       | flow_ext_ref  | flow_ext_ref_idx                  |      2709 |
| flowdb       | flow_ext_ref  | flow_ext_ref_idx_v2               |     24016 |
| flowdb       | flow_ext_ref  | flow_ext_ref_workflow_id_idx_tmp  |    425314 |
| flowdb       | flow_wiki_ref | flow_wiki_ref_revision_v2         |      4791 |
| flowdb       | flow_wiki_ref | flow_wiki_ref_pk                  |     13865 |
| flowdb       | flow_wiki_ref | flow_wiki_ref_idx                 |    107759 |
| flowdb       | flow_wiki_ref | flow_wiki_ref_idx_v2              |    419098 |
| flowdb       | flow_wiki_ref | flow_wiki_ref_workflow_id_idx_tmp |    726773 |
+--------------+---------------+-----------------------------------+-----------+
10 rows in set (0.04 sec)

Masters:

10.64.16.27 [officewiki]> SELECT * FROM INFORMATION_SCHEMA.INDEX_STATISTICS WHERE table_schema = 'officewiki' AND table_name LIKE 'flow_%_ref' ORDER BY TABLE_NAME, ROWS_READ;
+--------------+---------------+----------------------+-----------+
| TABLE_SCHEMA | TABLE_NAME    | INDEX_NAME           | ROWS_READ |
+--------------+---------------+----------------------+-----------+
| officewiki   | flow_ext_ref  | flow_ext_ref_idx     |        25 |
| officewiki   | flow_ext_ref  | flow_ext_ref_idx_v2  |       731 |
| officewiki   | flow_ext_ref  | flow_ext_ref_pk      |      4819 |
| officewiki   | flow_wiki_ref | flow_wiki_ref_idx    |       115 |
| officewiki   | flow_wiki_ref | flow_wiki_ref_idx_v2 |      1560 |
| officewiki   | flow_wiki_ref | flow_wiki_ref_pk     |      5456 |
+--------------+---------------+----------------------+-----------+
6 rows in set (0.88 sec)
10.64.16.18 [flowdb]> SELECT * FROM INFORMATION_SCHEMA.INDEX_STATISTICS WHERE table_schema = 'flowdb' AND table_name LIKE 'flow_%_ref' ORDER BY TABLE_NAME, ROWS_READ;
+--------------+---------------+-----------------------------------+-------------+
| TABLE_SCHEMA | TABLE_NAME    | INDEX_NAME                        | ROWS_READ   |
+--------------+---------------+-----------------------------------+-------------+
| flowdb       | flow_ext_ref  | flow_ext_ref_revision_v2          |         212 |
| flowdb       | flow_ext_ref  | flow_ext_ref_revision             |       10802 |
| flowdb       | flow_ext_ref  | flow_ext_ref_idx_v2               |       28911 |
| flowdb       | flow_ext_ref  | flow_ext_ref_idx                  |      228981 |
| flowdb       | flow_ext_ref  | flow_ext_ref_workflow_id_idx_tmp  |      399127 |
| flowdb       | flow_ext_ref  | flow_ext_ref_pk                   | 10858770317 |
| flowdb       | flow_wiki_ref | flow_wiki_ref_revision_v2         |         486 |
| flowdb       | flow_wiki_ref | flow_wiki_ref_revision            |       11775 |
| flowdb       | flow_wiki_ref | flow_wiki_ref_idx_v2              |       52515 |
| flowdb       | flow_wiki_ref | flow_wiki_ref_idx                 |      521266 |
| flowdb       | flow_wiki_ref | flow_wiki_ref_workflow_id_idx_tmp |      683450 |
| flowdb       | flow_wiki_ref | flow_wiki_ref_pk                  | 10859144643 |
+--------------+---------------+-----------------------------------+-------------+
12 rows in set (0.04 sec)

I guess we need to use this as a baseline then check back in a little while. The non-v2 versions should stay the same, and the v2 ones should increase.

But not sure how often these queries happen (I think it's mainly from updating the links tables on edit, plus two features, WhatLinksHere, and LinkSearch). For the non-master queries, this also relies on these particular slaves getting a reasonable proportion of traffic.

I guess we need to use this as a baseline then check back in a little while. The non-v2 versions should stay the same, and the v2 ones should increase.

Yes, that is the idea, let only "big numbers" (of the difference), talk. Unless something is broken, which then should be checked.

Hopefully, we will soon implement an alternative system, which also tell us the last time and where those were used and put it on a graph, and then it will be easier to see. It is on my large //TODO. That will take months, however, to be rolled in (it has many blockers).

Please create a subtask and assign it to me when ready to do the final ALTER.

Please create a subtask and assign it to me when ready to do the final ALTER.

I've created it. I'll assign it to you and add a note when it's ready.

How long do you think we should wait before checking back on these tables?

Unless there is no activity, a day should be more than enough. The connections filled up in minutes, so the activity was high. Check that the right indexes are being used, too (and user_stats is not broken).

Updated values:

Slaves:

10.64.16.24 [officewiki]> SELECT * FROM INFORMATION_SCHEMA.INDEX_STATISTICS WHERE table_schema = 'officewiki' AND table_name LIKE 'flow_%_ref' ORDER BY TABLE_NAME, ROWS_READ;
+--------------+---------------+----------------------+-----------+
| TABLE_SCHEMA | TABLE_NAME    | INDEX_NAME           | ROWS_READ |
+--------------+---------------+----------------------+-----------+
| officewiki   | flow_ext_ref  | flow_ext_ref_pk      |         5 |
| officewiki   | flow_ext_ref  | flow_ext_ref_idx_v2  |       724 |
| officewiki   | flow_wiki_ref | flow_wiki_ref_idx_v2 |      1569 |
+--------------+---------------+----------------------+-----------+
3 rows in set (0.78 sec)

Diff:

flow_ext_ref_pk - Same
flow_ext_ref_idx_v2 - Same
flow_wiki_ref_idx_v2 - 11

10.64.16.20 [flowdb]> SELECT * FROM INFORMATION_SCHEMA.INDEX_STATISTICS WHERE table_schema = 'flowdb' AND table_name LIKE 'flow_%_ref' ORDER BY TABLE_NAME, ROWS_READ;
+--------------+---------------+-----------------------------------+-----------+
| TABLE_SCHEMA | TABLE_NAME    | INDEX_NAME                        | ROWS_READ |
+--------------+---------------+-----------------------------------+-----------+
| flowdb       | flow_ext_ref  | flow_ext_ref_revision_v2          |      1070 |
| flowdb       | flow_ext_ref  | flow_ext_ref_pk                   |      1124 |
| flowdb       | flow_ext_ref  | flow_ext_ref_idx                  |      2709 |
| flowdb       | flow_ext_ref  | flow_ext_ref_idx_v2               |     24150 |
| flowdb       | flow_ext_ref  | flow_ext_ref_workflow_id_idx_tmp  |    425314 |
| flowdb       | flow_wiki_ref | flow_wiki_ref_revision_v2         |      4791 |
| flowdb       | flow_wiki_ref | flow_wiki_ref_pk                  |     13865 |
| flowdb       | flow_wiki_ref | flow_wiki_ref_idx                 |    199241 |
| flowdb       | flow_wiki_ref | flow_wiki_ref_idx_v2              |    466849 |
| flowdb       | flow_wiki_ref | flow_wiki_ref_workflow_id_idx_tmp |    726786 |
+--------------+---------------+-----------------------------------+-----------+
10 rows in set (0.04 sec)

Diff:

flow_ext_ref_revision_v2 -S - Same
flow_ext_ref_pk - Same
flow_ext_ref_idx - Same
flow_ext_ref_idx_v2 - 134
flow_ext_ref_workflow_id_idx_tmp - Same
flow_wiki_ref_revision_v2 - 0
flow_wiki_ref_pk - Same
flow_wiki_ref_idx - 91482 (?)
flow_wiki_ref_idx_v2 - 47751
flow_wiki_ref_workflow_id_idx_tmp - 13

Masters:

10.64.16.27 [officewiki]> SELECT * FROM INFORMATION_SCHEMA.INDEX_STATISTICS WHERE table_schema = 'officewiki' AND table_name LIKE 'flow_%_ref' ORDER BY TABLE_NAME, ROWS_READ;
+--------------+---------------+----------------------+-----------+
| TABLE_SCHEMA | TABLE_NAME    | INDEX_NAME           | ROWS_READ |
+--------------+---------------+----------------------+-----------+
| officewiki   | flow_ext_ref  | flow_ext_ref_idx     |        32 |
| officewiki   | flow_ext_ref  | flow_ext_ref_idx_v2  |      1015 |
| officewiki   | flow_ext_ref  | flow_ext_ref_pk      |      4819 |
| officewiki   | flow_wiki_ref | flow_wiki_ref_idx    |       120 |
| officewiki   | flow_wiki_ref | flow_wiki_ref_idx_v2 |      2216 |
| officewiki   | flow_wiki_ref | flow_wiki_ref_pk     |      5456 |
+--------------+---------------+----------------------+-----------+
6 rows in set (0.91 sec)

Diff:

flow_ext_ref_idx - 7
flow_ext_ref_idx_v2 - 284
flow_ext_ref_pk - Same
flow_wiki_ref_idx - 5
flow_wiki_ref_idx_v2 - 656
flow_wiki_ref_pk - Same

10.64.16.18 [flowdb]> SELECT * FROM INFORMATION_SCHEMA.INDEX_STATISTICS WHERE table_schema = 'flowdb' AND table_name LIKE 'flow_%_ref' ORDER BY TABLE_NAME, ROWS_READ;
+--------------+---------------+-----------------------------------+-------------+
| TABLE_SCHEMA | TABLE_NAME    | INDEX_NAME                        | ROWS_READ   |
+--------------+---------------+-----------------------------------+-------------+
| flowdb       | flow_ext_ref  | flow_ext_ref_revision_v2          |        3015 |
| flowdb       | flow_ext_ref  | flow_ext_ref_revision             |       10919 |
| flowdb       | flow_ext_ref  | flow_ext_ref_idx_v2               |       76522 |
| flowdb       | flow_ext_ref  | flow_ext_ref_idx                  |      232467 |
| flowdb       | flow_ext_ref  | flow_ext_ref_workflow_id_idx_tmp  |      399127 |
| flowdb       | flow_ext_ref  | flow_ext_ref_pk                   | 10858770317 |
| flowdb       | flow_wiki_ref | flow_wiki_ref_revision_v2         |        5127 |
| flowdb       | flow_wiki_ref | flow_wiki_ref_revision            |       13084 |
| flowdb       | flow_wiki_ref | flow_wiki_ref_idx_v2              |      140192 |
| flowdb       | flow_wiki_ref | flow_wiki_ref_idx                 |      525184 |
| flowdb       | flow_wiki_ref | flow_wiki_ref_workflow_id_idx_tmp |      683450 |
| flowdb       | flow_wiki_ref | flow_wiki_ref_pk                  | 10859144643 |
+--------------+---------------+-----------------------------------+-------------+
12 rows in set (0.05 sec)

Diff:

flow_ext_ref_revision_v2 - 2803
flow_ext_ref_revision - 117
flow_ext_ref_idx_v2 - 47611
flow_ext_ref_idx - 3486 (?)
flow_ext_ref_workflow_id_idx_tmp - Same
flow_ext_ref_pk - Same
flow_wiki_ref_revision_v2 - 4641
flow_wiki_ref_revision - 1309 (?)
flow_wiki_ref_idx_v2 - 87677
flow_wiki_ref_idx - 3918 (?)
flow_wiki_ref_workflow_id_idx_tmp - Same
flow_wiki_ref_pk - Same

I put ? marks next to the ones that seem possibly anomalous. These are flow_wiki_ref_idx, flow_ext_ref_idx, and flow_wiki_ref_revision. In the meantime, we've merged rEFLWeabf334c3f9a: Get rid of $wgFlowMigrateReferenceWiki. But I just re-reviewed that, and I can't see anywhere that behavior would be different from just wgFlowMigrateReferenceWiki = false.

Is there any way to collect a sample of queries hitting each index?

I double-checked, and none of the old ? indices are defined wrong in production (at least on these tables). So the old ones shouldn't even be usable, and we need to figure out if these queries are (still) coming from our code.

Most of the ? marks are not that large proportionally, but flow_wiki_ref_idx on the 10.64.16.20 slave is:

(199241 - 107759)/107759 = .84

so that's the one I'm mainly concerned about.

After some days away, I will put this back on the backlog and check it personally.

These are the current stats I got (selecting only the indexes to be deleted):

mysql> SELECT * FROM INFORMATION_SCHEMA.INDEX_STATISTICS WHERE index_name 
IN ('flow_wiki_ref_idx', 'flow_wiki_ref_revision', 'flow_ext_ref_idx', 'flow_ext_ref_revision', 
'flow_wiki_ref_workflow_id_idx_tmp', 'flow_ext_ref_workflow_id_idx_tmp') ORDER BY ROWS_READ desc;
+--------------+---------------+-----------------------------------+-----------+
| TABLE_SCHEMA | TABLE_NAME    | INDEX_NAME                        | ROWS_READ |
+--------------+---------------+-----------------------------------+-----------+
| flowdb       | flow_wiki_ref | flow_wiki_ref_workflow_id_idx_tmp |    683450 |
| flowdb       | flow_wiki_ref | flow_wiki_ref_idx                 |    528434 |
| flowdb       | flow_ext_ref  | flow_ext_ref_workflow_id_idx_tmp  |    399127 |
| flowdb       | flow_ext_ref  | flow_ext_ref_idx                  |    235186 |
| flowdb       | flow_wiki_ref | flow_wiki_ref_revision            |     14106 |
| flowdb       | flow_ext_ref  | flow_ext_ref_revision             |     10971 |
+--------------+---------------+-----------------------------------+-----------+
6 rows in set (0.01 sec)
While 3000 accesses may sound like a lot, that is over 2 weeks (~10/hour), while when the problem arised, we had like thousands per minute:
+--------------+---------------+-----------------------------------+-----------+
| TABLE_SCHEMA | TABLE_NAME    | INDEX_NAME                        |      DIFF |
+--------------+---------------+-----------------------------------+-----------+
| flowdb       | flow_wiki_ref | flow_wiki_ref_workflow_id_idx_tmp |         0 |
| flowdb       | flow_wiki_ref | flow_wiki_ref_idx                 |      3250 |
| flowdb       | flow_ext_ref  | flow_ext_ref_workflow_id_idx_tmp  |         0 |
| flowdb       | flow_ext_ref  | flow_ext_ref_idx                  |      2719 |
| flowdb       | flow_wiki_ref | flow_wiki_ref_revision            |      1022 |
| flowdb       | flow_ext_ref  | flow_ext_ref_revision             |        52 |
+--------------+---------------+-----------------------------------+-----------+

My recommendation is to proceed with the last step of the migration, then profile the server to find full table scans and avoid them.

Also please note that there could be some indexes that are duplicate/have the same efficiency and are still used even if they are not necessary.

Let's schedule a date to proceed with it.

I've applied the change to flowdb schema. This is the new structure of the tables:

mysql> SHOW CREATE TABLE flow_ext_ref; SHOW CREATE TABLE flow_wiki_ref;
CREATE TABLE `flow_ext_ref` (
  `ref_src_object_id` binary(11) NOT NULL,
  `ref_src_object_type` varbinary(32) NOT NULL,
  `ref_src_workflow_id` binary(11) NOT NULL,
  `ref_src_namespace` int(11) NOT NULL,
  `ref_src_title` varbinary(255) NOT NULL,
  `ref_target` blob,
  `ref_type` varbinary(16) NOT NULL,
  `ref_src_wiki` varbinary(16) NOT NULL,
  KEY `flow_ext_ref_idx_v2` (`ref_src_wiki`,`ref_src_namespace`,`ref_src_title`,`ref_type`,`ref_target`(255),`ref_src_object_type`,`ref_src_object_id`),
  KEY `flow_ext_ref_revision_v2` (`ref_src_wiki`,`ref_src_namespace`,`ref_src_title`,`ref_src_object_type`,`ref_src_object_id`,`ref_type`,`ref_target`(255))
) ENGINE=InnoDB DEFAULT CHARSET=binary

CREATE TABLE `flow_wiki_ref` (
  `ref_src_object_id` binary(11) NOT NULL,
  `ref_src_object_type` varbinary(32) NOT NULL,
  `ref_src_workflow_id` binary(11) NOT NULL,
  `ref_src_namespace` int(11) NOT NULL,
  `ref_src_title` varbinary(255) NOT NULL,
  `ref_target_namespace` int(11) NOT NULL,
  `ref_target_title` varbinary(255) NOT NULL,
  `ref_type` varbinary(16) NOT NULL,
  `ref_src_wiki` varbinary(16) NOT NULL,
  KEY `flow_wiki_ref_idx_v2` (`ref_src_wiki`,`ref_src_namespace`,`ref_src_title`,`ref_type`,`ref_target_namespace`,`ref_target_title`,`ref_src_object_type`,`ref_src_object_id`),
  KEY `flow_wiki_ref_revision_v2` (`ref_src_wiki`,`ref_src_namespace`,`ref_src_title`,`ref_src_object_type`,`ref_src_object_id`,`ref_type`,`ref_target_namespace`,`ref_target_title`)
) ENGINE=InnoDB DEFAULT CHARSET=binary

I've seen no regressions this time, although I am not 100% convinced about the current performance. A performance audit is suggested in the future.

This is officewiki now:

CREATE TABLE `flow_ext_ref` (
  `ref_src_object_id` binary(11) NOT NULL,
  `ref_src_object_type` varbinary(32) NOT NULL,
  `ref_src_workflow_id` binary(11) NOT NULL,
  `ref_src_namespace` int(11) NOT NULL,
  `ref_src_title` varbinary(255) NOT NULL,
  `ref_target` blob,
  `ref_type` varbinary(16) NOT NULL,
  `ref_src_wiki` varbinary(16) NOT NULL,
  KEY `flow_ext_ref_idx_v2` (`ref_src_wiki`,`ref_src_namespace`,`ref_src_title`,`ref_type`,`ref_target`(255),`ref_src_object_type`,`ref_src_object_id`),
  KEY `flow_ext_ref_revision_v2` (`ref_src_wiki`,`ref_src_namespace`,`ref_src_title`,`ref_src_object_type`,`ref_src_object_id`,`ref_type`,`ref_target`(255))
) ENGINE=InnoDB DEFAULT CHARSET=binary

CREATE TABLE `flow_wiki_ref` (
  `ref_src_object_id` binary(11) NOT NULL,
  `ref_src_object_type` varbinary(32) NOT NULL,
  `ref_src_workflow_id` binary(11) NOT NULL,
  `ref_src_namespace` int(11) NOT NULL,
  `ref_src_title` varbinary(255) NOT NULL,
  `ref_target_namespace` int(11) NOT NULL,
  `ref_target_title` varbinary(255) NOT NULL,
  `ref_type` varbinary(16) NOT NULL,
  `ref_src_wiki` varbinary(16) NOT NULL,
  KEY `flow_wiki_ref_idx_v2` (`ref_src_wiki`,`ref_src_namespace`,`ref_src_title`,`ref_type`,`ref_target_namespace`,`ref_target_title`,`ref_src_object_type`,`ref_src_object_id`),
  KEY `flow_wiki_ref_revision_v2` (`ref_src_wiki`,`ref_src_namespace`,`ref_src_title`,`ref_src_object_type`,`ref_src_object_id`,`ref_type`,`ref_target_namespace`,`ref_target_title`)
) ENGINE=InnoDB DEFAULT CHARSET=binary
jcrespo updated the task description. (Show Details)

I've seen no regressions this time, although I am not 100% convinced about the current performance. A performance audit is suggested in the future.

Thanks, @jcrespo.

@matthiasmullie is looking at phasing out our use of memcached as a DB cache (https://gerrit.wikimedia.org/r/#/c/247575/ and associated bugs). If this works, it will involve optimizing some DB queries.

  1. General regression on test.wikipedia.org
  2. Recent stats - slaves:
mysql:research@x1-analytics-slave [flowdb]> SELECT * FROM INFORMATION_SCHEMA.INDEX_STATISTICS WHERE index_name  IN ('flow_wiki_ref_idx', 'flow_wiki_ref_revision', 'flow_ext_ref_idx', 'flow_ext_ref_revision',  'flow_wiki_ref_workflow_id_idx_tmp', 'flow_ext_ref_workflow_id_idx_tmp') ORDER BY ROWS_READ desc;
+--------------+---------------+-----------------------------------+-----------+
| TABLE_SCHEMA | TABLE_NAME    | INDEX_NAME                        | ROWS_READ |
+--------------+---------------+-----------------------------------+-----------+
| flowdb       | flow_wiki_ref | flow_wiki_ref_workflow_id_idx_tmp |    726786 |
| flowdb       | flow_ext_ref  | flow_ext_ref_workflow_id_idx_tmp  |    425314 |
| flowdb       | flow_wiki_ref | flow_wiki_ref_idx                 |    199241 |
| flowdb       | flow_ext_ref  | flow_ext_ref_idx                  |      2709 |
+--------------+---------------+-----------------------------------+-----------+
4 rows in set (0.03 sec)
mysql:research@s3-analytics-slave [officewiki]> SELECT * FROM INFORMATION_SCHEMA.INDEX_STATISTICS WHERE table_schema = 'officewiki' AND table_name LIKE 'flow_%_ref' ORDER BY TABLE_NAME, ROWS_READ;
+--------------+---------------+----------------------+-----------+
| TABLE_SCHEMA | TABLE_NAME    | INDEX_NAME           | ROWS_READ |
+--------------+---------------+----------------------+-----------+
| officewiki   | flow_ext_ref  | flow_ext_ref_pk      |         5 |
| officewiki   | flow_ext_ref  | flow_ext_ref_idx_v2  |       724 |
| officewiki   | flow_wiki_ref | flow_wiki_ref_idx_v2 |      1569 |
+--------------+---------------+----------------------+-----------+
3 rows in set (0.22 sec)