Page MenuHomePhabricator

Make ref_id of flow_wiki_ref and flow_ext_ref not null and PK (DBA phase 2)
Closed, ResolvedPublic

Description

After T117786: Run maintenance/FlowPopulateRefId.php on all production wikis is done, run db_patches/patch-ref_id-phase2.sql from https://gerrit.wikimedia.org/r/#/c/238393/ . At this point, that change will be merged.

This has to be done on flowdb on extension1, and officewiki on the standard cluster.

This is to ensure that all Flow tables have a PK; see T109676. I have tested that Flow works continually if the deployment plan given on that Gerrit is followed.

Event Timeline

I think this is blocked by another task, and not by the schema change, so it should not have (yet) #blocked-by-schema-change , but I may be wrong.

I think this is blocked by another task, and not by the schema change, so it should not have (yet) #blocked-by-schema-change , but I may be wrong.

Blocking task (population script) is ready. ref_id is now fully populated and unique:

mysql:research@x1-analytics-slave [flowdb]> select count(*), count(distinct ref_id) from flow_wiki_ref;
+----------+------------------------+
| count(*) | count(distinct ref_id) |
+----------+------------------------+
|   252062 |                 252062 |
+----------+------------------------+
1 row in set (0.42 sec)

mysql:research@x1-analytics-slave [flowdb]> select count(*), count(distinct ref_id) from flow_ext_ref;
+----------+------------------------+
| count(*) | count(distinct ref_id) |
+----------+------------------------+
|    84657 |                  84657 |
+----------+------------------------+
1 row in set (0.12 sec)

We should now be ready for ref_id to be made the primary key. Please do check the results of those queries again (the two numbers should be the same) before making the change, because I'm not completely confident that duplicates won't sneak back in. That won't be a concern after https://gerrit.wikimedia.org/r/259444 .

This needs to be done on officewiki too. Currently, that also looks ready:

mysql:research@s2-analytics-slave [officewiki]> select count(*), count(distinct ref_id) from flow_wiki_ref;
+----------+------------------------+
| count(*) | count(distinct ref_id) |
+----------+------------------------+
|     1390 |                   1390 |
+----------+------------------------+
1 row in set (0.14 sec)

mysql:research@s2-analytics-slave [officewiki]> select count(*), count(distinct ref_id) from flow_ext_ref;
+----------+------------------------+
| count(*) | count(distinct ref_id) |
+----------+------------------------+
|      535 |                    535 |
+----------+------------------------+
1 row in set (0.04 sec)

mysql:research@s2-analytics-slave [officewiki]> SELECT * FROM flow_wiki_ref WHERE ref_id IS NULL;
Empty set (0.01 sec)

mysql:research@s2-analytics-slave [officewiki]> SELECT * FROM flow_ext_ref WHERE ref_id IS NULL;
Empty set (0.01 sec)

Please check the current schema to validate the changes:

$ mysql -A -h s3-master officewiki -e "SHOW CREATE TABLE flow_wiki_ref\G"
*************************** 1. row ***************************
       Table: flow_wiki_ref
Create Table: CREATE TABLE `flow_wiki_ref` (
  `ref_src_object_id` binary(11) NOT NULL,
  `ref_src_object_type` varbinary(32) NOT NULL,
  `ref_src_workflow_id` binary(11) NOT NULL,
  `ref_src_namespace` int(11) NOT NULL,
  `ref_src_title` varbinary(255) NOT NULL,
  `ref_target_namespace` int(11) NOT NULL,
  `ref_target_title` varbinary(255) NOT NULL,
  `ref_type` varbinary(16) NOT NULL,
  `ref_src_wiki` varbinary(16) NOT NULL,
  `ref_id` binary(11) NOT NULL,
  PRIMARY KEY (`ref_id`),
  KEY `flow_wiki_ref_idx_v2` (`ref_src_wiki`,`ref_src_namespace`,`ref_src_title`,`ref_type`,`ref_target_namespace`,`ref_target_title`,`ref_src_object_type`,`ref_src_object_id`),
  KEY `flow_wiki_ref_revision_v2` (`ref_src_wiki`,`ref_src_namespace`,`ref_src_title`,`ref_src_object_type`,`ref_src_object_id`,`ref_type`,`ref_target_namespace`,`ref_target_title`)
) ENGINE=InnoDB DEFAULT CHARSET=binary
$ mysql -A -h s3-master officewiki -e "SHOW CREATE TABLE flow_ext_ref\G"
*************************** 1. row ***************************
       Table: flow_ext_ref
Create Table: CREATE TABLE `flow_ext_ref` (
  `ref_src_object_id` binary(11) NOT NULL,
  `ref_src_object_type` varbinary(32) NOT NULL,
  `ref_src_workflow_id` binary(11) NOT NULL,
  `ref_src_namespace` int(11) NOT NULL,
  `ref_src_title` varbinary(255) NOT NULL,
  `ref_target` blob,
  `ref_type` varbinary(16) NOT NULL,
  `ref_src_wiki` varbinary(16) NOT NULL,
  `ref_id` binary(11) NOT NULL,
  PRIMARY KEY (`ref_id`),
  KEY `flow_ext_ref_idx_v2` (`ref_src_wiki`,`ref_src_namespace`,`ref_src_title`,`ref_type`,`ref_target`(255),`ref_src_object_type`,`ref_src_object_id`),
  KEY `flow_ext_ref_revision_v2` (`ref_src_wiki`,`ref_src_namespace`,`ref_src_title`,`ref_src_object_type`,`ref_src_object_id`,`ref_type`,`ref_target`(255))
) ENGINE=InnoDB DEFAULT CHARSET=binary
$ mysql -A -h x1-master flowdb -e "SHOW CREATE TABLE flow_wiki_ref\G"
*************************** 1. row ***************************
       Table: flow_wiki_ref
Create Table: CREATE TABLE `flow_wiki_ref` (
  `ref_src_object_id` binary(11) NOT NULL,
  `ref_src_object_type` varbinary(32) NOT NULL,
  `ref_src_workflow_id` binary(11) NOT NULL,
  `ref_src_namespace` int(11) NOT NULL,
  `ref_src_title` varbinary(255) NOT NULL,
  `ref_target_namespace` int(11) NOT NULL,
  `ref_target_title` varbinary(255) NOT NULL,
  `ref_type` varbinary(16) NOT NULL,
  `ref_src_wiki` varbinary(16) NOT NULL,
  `ref_id` binary(11) NOT NULL,
  PRIMARY KEY (`ref_id`),
  KEY `flow_wiki_ref_idx_v2` (`ref_src_wiki`,`ref_src_namespace`,`ref_src_title`,`ref_type`,`ref_target_namespace`,`ref_target_title`,`ref_src_object_type`,`ref_src_object_id`),
  KEY `flow_wiki_ref_revision_v2` (`ref_src_wiki`,`ref_src_namespace`,`ref_src_title`,`ref_src_object_type`,`ref_src_object_id`,`ref_type`,`ref_target_namespace`,`ref_target_title`)
) ENGINE=InnoDB DEFAULT CHARSET=binary
$ mysql -A -h x1-master flowdb -e "SHOW CREATE TABLE flow_ext_ref\G"
*************************** 1. row ***************************
       Table: flow_ext_ref
Create Table: CREATE TABLE `flow_ext_ref` (
  `ref_src_object_id` binary(11) NOT NULL,
  `ref_src_object_type` varbinary(32) NOT NULL,
  `ref_src_workflow_id` binary(11) NOT NULL,
  `ref_src_namespace` int(11) NOT NULL,
  `ref_src_title` varbinary(255) NOT NULL,
  `ref_target` blob,
  `ref_type` varbinary(16) NOT NULL,
  `ref_src_wiki` varbinary(16) NOT NULL,
  `ref_id` binary(11) NOT NULL,
  PRIMARY KEY (`ref_id`),
  KEY `flow_ext_ref_idx_v2` (`ref_src_wiki`,`ref_src_namespace`,`ref_src_title`,`ref_type`,`ref_target`(255),`ref_src_object_type`,`ref_src_object_id`),
  KEY `flow_ext_ref_revision_v2` (`ref_src_wiki`,`ref_src_namespace`,`ref_src_title`,`ref_src_object_type`,`ref_src_object_id`,`ref_type`,`ref_target`(255))
) ENGINE=InnoDB DEFAULT CHARSET=binary

Is testwiki included on the change? I have not touched it.

Thanks for taking care of this. As you can see it already brought some improvements by making sure that there are no duplicate values. Also, it will allow easier schema changes from now on due to the presence of a primary key.

jcrespo moved this task from Backlog to Done on the Schema-change-in-production board.

Please check the current schema to validate the changes:

Thanks. This is unrelated, so I filed separately, but I noticed T122111: ref_target should be NOT NULL.

Is testwiki included on the change? I have not touched it.

For Flow, only private wikis are special (have their own Flow DB), and currently the only private wiki with Flow is officewiki. So in this case testwiki's Flow data is all in flowdb: https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/InitialiseSettings.php#L14973 .

Thanks for taking care of this.

Thank you as well.