Add primary keys to remaining Flow tables
Closed, ResolvedPublic
Actions

Assigned To

Authored By

	Catrope
	Nov 2 2016, 5:22 PM

Description

flow_subscription, flow_topic_list and flow_tree_node don't have primary keys, but each table has a unique key that can easily be converted to a primary key.

Details

	Subject	Repo	Branch	Lines +/-
	Add primary keys to the remaining Flow tables	mediawiki/extensions/Flow	master	+13 -8

Customize query in gerrit

Related Objects
Search...

View Standalone Graph

This task is connected to more than 200 other tasks. Only direct parents and subtasks are shown here. Use View Standalone Graph to show more of the graph.

Status	Assigned	Task
		· · ·
Stalled	None	T108255 Enable MariaDB/MySQL's Strict Mode
Resolved	LSobanski	T17441 Some tables lack unique or primary keys, may allow confusing duplicate data
Resolved	Marostegui	T149819 Add primary keys to remaining Flow tables
		· · ·

Event Timeline

Catrope created this task.Nov 2 2016, 5:22 PM

Restricted Application added a project: Collaboration-Team-Triage. · View Herald TranscriptNov 2 2016, 5:22 PM

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Change 319358 had a related patch set uploaded (by Catrope):
Add primary keys to the remaining Flow tables

https://gerrit.wikimedia.org/r/319358

gerritbot added a project: Patch-For-Review.Nov 2 2016, 5:22 PM

SQL: https://gerrit.wikimedia.org/r/#/c/319358/1/db_patches/patch-primary-keys.sql

Thank you, this is actually important because even if internally, InnoDB treats those as PKs, from the SQL point of view that is not true, and that can lead to query issues.

I will wait for the code deployment to actually start working on the schema change.

Matt pointed out that the flow_subscription table is unused and should just be dropped entirely (I checked and it's empty on x1), so I took that one out of the patch. I'll submit a separate schema change to drop that table.

In T149819#2768660, @Catrope wrote:

Matt pointed out that the flow_subscription table is unused and should just be dropped entirely (I checked and it's empty on x1), so I took that one out of the patch. I'll submit a separate schema change to drop that table.

This is T149936: Drop flow_subscription table

Change 319358 merged by jenkins-bot:
Add primary keys to the remaining Flow tables

https://gerrit.wikimedia.org/r/319358

ReleaseTaggerBot added a project: MW-1.29-release (WMF-deploy-2016-11-29_(1.29.0-wmf.4)).Nov 28 2016, 1:00 AM

The code is now deployed, and this is ready to go.

I will take this ticket as I am about to finish another schema change tomorrow. However due to the xmas times, this will likely will be started in January

Jaime pointed out that the tables normally quite small and he's totally right:

root@db1031:/srv/sqldata/flowdb# ls -lh flow_topic_list.ibd
-rw-rw---- 1 mysql mysql 23M Dec 14 14:10 flow_topic_list.ibd
root@db1031:/srv/sqldata/flowdb# ls -lh flow_tree_node.ibd
-rw-rw---- 1 mysql mysql 120M Dec 14 14:25 flow_tree_node.ibd

So I might just execute this on the master and let it replicate.

Marostegui claimed this task.Dec 15 2016, 7:47 AM

I think by next week we can start deploying this change.
As the tables are rather small, it is probably safer to deploy it on the master (db1031) and let it replicate.

This would be it:

alter table flow_topic_list drop key flow_topic_list_pk, ADD PRIMARY KEY (topic_list_id, topic_id);

show create table flow_topic_list\G
*************************** 1. row ***************************
       Table: flow_topic_list
Create Table: CREATE TABLE `flow_topic_list` (
  `topic_list_id` binary(11) NOT NULL,
  `topic_id` binary(11) NOT NULL DEFAULT '\0\0\0\0\0\0\0\0\0\0\0',
  PRIMARY KEY (`topic_list_id`,`topic_id`),
  KEY `flow_topic_list_topic_id` (`topic_id`)
) ENGINE=InnoDB DEFAULT CHARSET=binary
1 row in set (0.00 sec)


alter table flow_tree_node drop key flow_tree_node_pk, ADD PRIMARY KEY (tree_ancestor_id, tree_descendant_id);
show create table flow_tree_node\G
*************************** 1. row ***************************
       Table: flow_tree_node
Create Table: CREATE TABLE `flow_tree_node` (
  `tree_ancestor_id` binary(11) NOT NULL,
  `tree_descendant_id` binary(11) NOT NULL,
  `tree_depth` smallint(6) NOT NULL,
  PRIMARY KEY (`tree_ancestor_id`,`tree_descendant_id`),
  UNIQUE KEY `flow_tree_constraint` (`tree_descendant_id`,`tree_depth`)
) ENGINE=InnoDB DEFAULT CHARSET=binary

What would be the possible impact for users?

In T149819#2952815, @Trizek-WMF wrote:

What would be the possible impact for users?

The table will be blocked until the ALTER is finished - meaning no writes will go thru.
The table is rather small so I don't think it will take long to finish the ALTER.

Once the alter goes thru on the master it will replicate on the slaves and it might cause some lag (again, not too big as the tables are small), but during that lag, information read from the slaves might be slightly outdated.

Thanks @Marostegui.
How can I explain that? "Some Flow databases are going to be changed, that may cause some slowdowns [+ add date]"? I'm asking, because I'm currently writing the next team newsletter and I prefer to inform people is a change is going to affect them.

I just did a test to give you more detailed data about how long the change will take and the impact.
On the smallest table (20MB) the alter took: 0.59 second
On the biggest table (120MB) the alter took: 4.43 seconds

And those numbers are from a virtual machine which is way less powerful than our production hosts :-)

So the impact will be almost 0, so you might want to mention that at some point during next week we will do a small maintenance on the flow database but there is no impact expected.
Would that be enough for you and the team?

In T149819#2952881, @Marostegui wrote:

So the impact will be almost 0, so you might want to mention that at some point during next week we will do a small maintenance on the flow database but there is no impact expected.
Would that be enough for you and the team?

OK, so no visible impact - cool!
However, I'll add a quick note to show people that we still maintain Flow (and as a safety net if something goes wrong; that's not mean I don't trust anyone skills :))

When do you plan to perform that change?

Trizek-WMF moved this task from To Triage to Announce in next Collaboration Newsletter on the User-notice-collaboration board.Jan 19 2017, 4:32 PM

In T149819#2953178, @Trizek-WMF wrote:

When do you plan to perform that change?

Not sure yet, probably early next week. I can update the ticket on Monday or so with some more detailed timing if that works for you.
It will be done some day early in UTC mornings.

In T149819#2953190, @Marostegui wrote:

Not sure yet, probably early next week. I can update the ticket on Monday or so with some more detailed timing if that works for you.

I plan to send my newsletter on Monday morning. I don't want to bother you, so I can cheat and just say "this week".

In T149819#2953192, @Trizek-WMF wrote:

In T149819#2953190, @Marostegui wrote:

Not sure yet, probably early next week. I can update the ticket on Monday or so with some more detailed timing if that works for you.

I plan to send my newsletter on Monday morning. I don't want to bother you, so I can cheat and just say "this week".

Haha cool :-) You can say some day in UTC morning :-)
Thanks!

Liuxinyu970226 subscribed.Jan 20 2017, 6:03 AM

Trizek-WMF moved this task from Announce in next Collaboration Newsletter to In current Collaboration Newsletter draft on the User-notice-collaboration board.Jan 20 2017, 12:59 PM

I have set up the flowdb (with the current data) on a test environment to make sure that adding the PK on the master and then inserting values after it won't break replication (because of inconsistencies).
It all went fine, so I think it is safe to go ahead and add them on the master.

I have taken a backup of flowdb - db1031:/srv/tmp/flowdb_23_jan.sql and going to start altering the tables now.

Mentioned in SAL (#wikimedia-operations) [2017-01-23T15:38:05Z] <marostegui> Alter tables: flow_topic_list and flow_tree_node on db1031 (x1 master) - T149819

Both tables have been successfully altered

root@PRODUCTION x1[flowdb]> show create table flow_topic_list\G
*************************** 1. row ***************************
       Table: flow_topic_list
Create Table: CREATE TABLE `flow_topic_list` (
  `topic_list_id` binary(11) NOT NULL,
  `topic_id` binary(11) DEFAULT NULL,
  UNIQUE KEY `flow_topic_list_pk` (`topic_list_id`,`topic_id`),
  KEY `flow_topic_list_topic_id` (`topic_id`)
) ENGINE=InnoDB DEFAULT CHARSET=binary
1 row in set (0.00 sec)

root@PRODUCTION x1[flowdb]> alter table flow_topic_list drop key flow_topic_list_pk, ADD PRIMARY KEY (topic_list_id, topic_id);
Query OK, 100371 rows affected (1.69 sec)
Records: 100371  Duplicates: 0  Warnings: 0

root@PRODUCTION x1[flowdb]> show create table flow_topic_list\G
*************************** 1. row ***************************
       Table: flow_topic_list
Create Table: CREATE TABLE `flow_topic_list` (
  `topic_list_id` binary(11) NOT NULL,
  `topic_id` binary(11) NOT NULL DEFAULT '\0\0\0\0\0\0\0\0\0\0\0',
  PRIMARY KEY (`topic_list_id`,`topic_id`),
  KEY `flow_topic_list_topic_id` (`topic_id`)
) ENGINE=InnoDB DEFAULT CHARSET=binary
1 row in set (0.00 sec)

root@PRODUCTION x1[flowdb]> show create table flow_tree_node\G
*************************** 1. row ***************************
       Table: flow_tree_node
Create Table: CREATE TABLE `flow_tree_node` (
  `tree_ancestor_id` binary(11) NOT NULL,
  `tree_descendant_id` binary(11) NOT NULL,
  `tree_depth` smallint(6) NOT NULL,
  UNIQUE KEY `flow_tree_node_pk` (`tree_ancestor_id`,`tree_descendant_id`),
  UNIQUE KEY `flow_tree_constraint` (`tree_descendant_id`,`tree_depth`)
) ENGINE=InnoDB DEFAULT CHARSET=binary
1 row in set (0.00 sec)

root@PRODUCTION x1[flowdb]> alter table flow_tree_node drop key flow_tree_node_pk, ADD PRIMARY KEY (tree_ancestor_id, tree_descendant_id);
Query OK, 0 rows affected (8.62 sec)
Records: 0  Duplicates: 0  Warnings: 0

root@PRODUCTION x1[flowdb]> show create table flow_tree_node\G
*************************** 1. row ***************************
       Table: flow_tree_node
Create Table: CREATE TABLE `flow_tree_node` (
  `tree_ancestor_id` binary(11) NOT NULL,
  `tree_descendant_id` binary(11) NOT NULL,
  `tree_depth` smallint(6) NOT NULL,
  PRIMARY KEY (`tree_ancestor_id`,`tree_descendant_id`),
  UNIQUE KEY `flow_tree_constraint` (`tree_descendant_id`,`tree_depth`)
) ENGINE=InnoDB DEFAULT CHARSET=binary
1 row in set (0.00 sec)

Marostegui closed this task as Resolved.Jan 23 2017, 3:48 PM

jcrespo awarded a token.Jan 23 2017, 3:54 PM

jcrespo added a parent task: T17441: Some tables lack unique or primary keys, may allow confusing duplicate data.

Trizek-WMF awarded a token.Jan 23 2017, 5:53 PM

Liuxinyu970226 unsubscribed.Jan 25 2017, 10:21 AM

Trizek-WMF moved this task from In current Collaboration Newsletter draft to Announced in Collaboration Newsletter on the User-notice-collaboration board.Jan 25 2017, 7:38 PM