Add primary keys to remaining Flow tables
Closed, ResolvedPublic

Description

flow_subscription, flow_topic_list and flow_tree_node don't have primary keys, but each table has a unique key that can easily be converted to a primary key.

Catrope created this task.Nov 2 2016, 5:22 PM
Restricted Application added a project: Collaboration-Team-Triage. · View Herald TranscriptNov 2 2016, 5:22 PM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Change 319358 had a related patch set uploaded (by Catrope):
Add primary keys to the remaining Flow tables

https://gerrit.wikimedia.org/r/319358

jcrespo added a subscriber: jcrespo.

Thank you, this is actually important because even if internally, InnoDB treats those as PKs, from the SQL point of view that is not true, and that can lead to query issues.

I will wait for the code deployment to actually start working on the schema change.

Matt pointed out that the flow_subscription table is unused and should just be dropped entirely (I checked and it's empty on x1), so I took that one out of the patch. I'll submit a separate schema change to drop that table.

Matt pointed out that the flow_subscription table is unused and should just be dropped entirely (I checked and it's empty on x1), so I took that one out of the patch. I'll submit a separate schema change to drop that table.

This is T149936: Drop flow_subscription table

Change 319358 merged by jenkins-bot:
Add primary keys to the remaining Flow tables

https://gerrit.wikimedia.org/r/319358

The code is now deployed, and this is ready to go.

I will take this ticket as I am about to finish another schema change tomorrow. However due to the xmas times, this will likely will be started in January

Jaime pointed out that the tables normally quite small and he's totally right:

root@db1031:/srv/sqldata/flowdb# ls -lh flow_topic_list.ibd
-rw-rw---- 1 mysql mysql 23M Dec 14 14:10 flow_topic_list.ibd
root@db1031:/srv/sqldata/flowdb# ls -lh flow_tree_node.ibd
-rw-rw---- 1 mysql mysql 120M Dec 14 14:25 flow_tree_node.ibd

So I might just execute this on the master and let it replicate.

I think by next week we can start deploying this change.
As the tables are rather small, it is probably safer to deploy it on the master (db1031) and let it replicate.

This would be it:

alter table flow_topic_list drop key flow_topic_list_pk, ADD PRIMARY KEY (topic_list_id, topic_id);

show create table flow_topic_list\G
*************************** 1. row ***************************
       Table: flow_topic_list
Create Table: CREATE TABLE `flow_topic_list` (
  `topic_list_id` binary(11) NOT NULL,
  `topic_id` binary(11) NOT NULL DEFAULT '\0\0\0\0\0\0\0\0\0\0\0',
  PRIMARY KEY (`topic_list_id`,`topic_id`),
  KEY `flow_topic_list_topic_id` (`topic_id`)
) ENGINE=InnoDB DEFAULT CHARSET=binary
1 row in set (0.00 sec)


alter table flow_tree_node drop key flow_tree_node_pk, ADD PRIMARY KEY (tree_ancestor_id, tree_descendant_id);
show create table flow_tree_node\G
*************************** 1. row ***************************
       Table: flow_tree_node
Create Table: CREATE TABLE `flow_tree_node` (
  `tree_ancestor_id` binary(11) NOT NULL,
  `tree_descendant_id` binary(11) NOT NULL,
  `tree_depth` smallint(6) NOT NULL,
  PRIMARY KEY (`tree_ancestor_id`,`tree_descendant_id`),
  UNIQUE KEY `flow_tree_constraint` (`tree_descendant_id`,`tree_depth`)
) ENGINE=InnoDB DEFAULT CHARSET=binary

What would be the possible impact for users?

What would be the possible impact for users?

The table will be blocked until the ALTER is finished - meaning no writes will go thru.
The table is rather small so I don't think it will take long to finish the ALTER.

Once the alter goes thru on the master it will replicate on the slaves and it might cause some lag (again, not too big as the tables are small), but during that lag, information read from the slaves might be slightly outdated.

Thanks @Marostegui.
How can I explain that? "Some Flow databases are going to be changed, that may cause some slowdowns [+ add date]"? I'm asking, because I'm currently writing the next team newsletter and I prefer to inform people is a change is going to affect them.

I just did a test to give you more detailed data about how long the change will take and the impact.
On the smallest table (20MB) the alter took: 0.59 second
On the biggest table (120MB) the alter took: 4.43 seconds

And those numbers are from a virtual machine which is way less powerful than our production hosts :-)

So the impact will be almost 0, so you might want to mention that at some point during next week we will do a small maintenance on the flow database but there is no impact expected.
Would that be enough for you and the team?

So the impact will be almost 0, so you might want to mention that at some point during next week we will do a small maintenance on the flow database but there is no impact expected.
Would that be enough for you and the team?

OK, so no visible impact - cool!
However, I'll add a quick note to show people that we still maintain Flow (and as a safety net if something goes wrong; that's not mean I don't trust anyone skills :))

When do you plan to perform that change?

When do you plan to perform that change?

Not sure yet, probably early next week. I can update the ticket on Monday or so with some more detailed timing if that works for you.
It will be done some day early in UTC mornings.

Not sure yet, probably early next week. I can update the ticket on Monday or so with some more detailed timing if that works for you.

I plan to send my newsletter on Monday morning. I don't want to bother you, so I can cheat and just say "this week".

Not sure yet, probably early next week. I can update the ticket on Monday or so with some more detailed timing if that works for you.

I plan to send my newsletter on Monday morning. I don't want to bother you, so I can cheat and just say "this week".

Haha cool :-) You can say some day in UTC morning :-)
Thanks!

I have set up the flowdb (with the current data) on a test environment to make sure that adding the PK on the master and then inserting values after it won't break replication (because of inconsistencies).
It all went fine, so I think it is safe to go ahead and add them on the master.

I have taken a backup of flowdb - db1031:/srv/tmp/flowdb_23_jan.sql and going to start altering the tables now.

Mentioned in SAL (#wikimedia-operations) [2017-01-23T15:38:05Z] <marostegui> Alter tables: flow_topic_list and flow_tree_node on db1031 (x1 master) - T149819

Both tables have been successfully altered

root@PRODUCTION x1[flowdb]> show create table flow_topic_list\G
*************************** 1. row ***************************
       Table: flow_topic_list
Create Table: CREATE TABLE `flow_topic_list` (
  `topic_list_id` binary(11) NOT NULL,
  `topic_id` binary(11) DEFAULT NULL,
  UNIQUE KEY `flow_topic_list_pk` (`topic_list_id`,`topic_id`),
  KEY `flow_topic_list_topic_id` (`topic_id`)
) ENGINE=InnoDB DEFAULT CHARSET=binary
1 row in set (0.00 sec)

root@PRODUCTION x1[flowdb]> alter table flow_topic_list drop key flow_topic_list_pk, ADD PRIMARY KEY (topic_list_id, topic_id);
Query OK, 100371 rows affected (1.69 sec)
Records: 100371  Duplicates: 0  Warnings: 0

root@PRODUCTION x1[flowdb]> show create table flow_topic_list\G
*************************** 1. row ***************************
       Table: flow_topic_list
Create Table: CREATE TABLE `flow_topic_list` (
  `topic_list_id` binary(11) NOT NULL,
  `topic_id` binary(11) NOT NULL DEFAULT '\0\0\0\0\0\0\0\0\0\0\0',
  PRIMARY KEY (`topic_list_id`,`topic_id`),
  KEY `flow_topic_list_topic_id` (`topic_id`)
) ENGINE=InnoDB DEFAULT CHARSET=binary
1 row in set (0.00 sec)

root@PRODUCTION x1[flowdb]> show create table flow_tree_node\G
*************************** 1. row ***************************
       Table: flow_tree_node
Create Table: CREATE TABLE `flow_tree_node` (
  `tree_ancestor_id` binary(11) NOT NULL,
  `tree_descendant_id` binary(11) NOT NULL,
  `tree_depth` smallint(6) NOT NULL,
  UNIQUE KEY `flow_tree_node_pk` (`tree_ancestor_id`,`tree_descendant_id`),
  UNIQUE KEY `flow_tree_constraint` (`tree_descendant_id`,`tree_depth`)
) ENGINE=InnoDB DEFAULT CHARSET=binary
1 row in set (0.00 sec)

root@PRODUCTION x1[flowdb]> alter table flow_tree_node drop key flow_tree_node_pk, ADD PRIMARY KEY (tree_ancestor_id, tree_descendant_id);
Query OK, 0 rows affected (8.62 sec)
Records: 0  Duplicates: 0  Warnings: 0

root@PRODUCTION x1[flowdb]> show create table flow_tree_node\G
*************************** 1. row ***************************
       Table: flow_tree_node
Create Table: CREATE TABLE `flow_tree_node` (
  `tree_ancestor_id` binary(11) NOT NULL,
  `tree_descendant_id` binary(11) NOT NULL,
  `tree_depth` smallint(6) NOT NULL,
  PRIMARY KEY (`tree_ancestor_id`,`tree_descendant_id`),
  UNIQUE KEY `flow_tree_constraint` (`tree_descendant_id`,`tree_depth`)
) ENGINE=InnoDB DEFAULT CHARSET=binary
1 row in set (0.00 sec)
Marostegui closed this task as Resolved.Jan 23 2017, 3:48 PM