Page MenuHomePhabricator

Apply wikishared.cx_translations index change
Closed, ResolvedPublic

Description

The ALTER TABLEs to run (usually, a link to a commit diff is the best way):

https://gerrit.wikimedia.org/r/#/c/337409/2/sql/patch-update-cx-unique-index.sql

Where to run those changes (the databases, dblist file or a very specific description such as "all wikis with the table X")

On wikishared

When to run those changes (if it depends on a particular code deployment or can be done at any time)

Any time

If the schema change is backwards compatible (is compatible with the current code deployed and can be performed at any time now)

By now, the currently deployed code works with and without this change.

If the schema change has been tested already on some of the test/beta wikis. Usually, as a last test, change should be applied to testwiki first.

Tested on deployment prep and own wikis

If it involves new columns or tables, if the data should be made available on the labs replicas and/or dumps or not because they contain private or sensitive data (consult legal if you are unsure). Similar question if it involves deletion of data previously available on labs.

Not applicable.

Event Timeline

jcrespo moved this task from Triage to In progress on the DBA board.

This needs to be run on x1.
The table isn't too big, so probably can be run on the master directly.

-rw-rw---- 1 mysql mysql 156M Mar 14 13:41 cx_translations.ibd
root@PRODUCTION x1[wikishared]> select count(*) from cx_translations;
+----------+
| count(*) |
+----------+
|   255843 |
+----------+
1 row in set (0.09 sec)
jcrespo subscribed.

You can take it if you want. This would be a great opportunity to test domain-id-based schema changes.

You can take it if you want. This would be a great opportunity to test domain-id-based schema changes.

Up to you, the comment was more just to clarify what I saw about the table and schema situation.
But indeed, we can try to test that!

So, tomorrow morning I will run the following on all the x1 slaves:

stop slave; set global slave_parallel_threads = 5; start slave;

On the master (db1031) I will run:

show global variables like 'gtid_domain_id';
+----------------+-----------+
| Variable_name  | Value     |
+----------------+-----------+
| gtid_domain_id | 171970580 |
+----------------+-----------+
1 row in set (0.00 sec)

set session gtid_domain_id = 971970580

There is not such gtid_domain_id in production

And finally on db1031:

alter table cx_translations drop index cx_translation_pair, add UNIQUE INDEX cx_translation_ref (     translation_source_title,     translation_source_language,     translation_target_language,     translation_started_by );

The plan would be running:

./software/dbtools/osc_host.sh --host=db1031.codfw.wmnet --db=wikishared --table=cx_translations --replicate --gtid_domain_id=1 --method=ddl "DROP INDEX cx_translation_pair, ADD UNIQUE INDEX cx_translations (translation_source_title, translation_source_language, translation_target_language, translation_started_by)"

the --gtid_domain_id option does not exist, I am adding it now. Is gtid_domain_id deployed on all x1 (master and slaves?)

The plan would be running:

./software/dbtools/osc_host.sh --host=db1031.codfw.wmnet --db=wikishared --table=cx_translations --replicate --gtid_domain_id=1 --method=ddl "DROP INDEX cx_translation_pair, ADD UNIQUE INDEX cx_translations (translation_source_title, translation_source_language, translation_target_language, translation_started_by)"

the --gtid_domain_id option does not exist, I am adding it now. Is gtid_domain_id deployed on all x1 (master and slaves?)

Ah nice!
Yes, it is deployed everywhere:

root@neodymium:/home/marostegui/git/software/dbtools# for i in `cat x1.hosts |awk -F " " '{print $1}'`; do echo $i; mysql --skip-ssl -h$i -e "show global variables like 'gtid_domain_id';";done
dbstore2001.codfw.wmnet
+----------------+-----------+
| Variable_name  | Value     |
+----------------+-----------+
| gtid_domain_id | 180355104 |
+----------------+-----------+
dbstore2002.codfw.wmnet
+----------------+----------+
| Variable_name  | Value    |
+----------------+----------+
| gtid_domain_id | 18036328 |
+----------------+----------+
db2033.codfw.wmnet
+----------------+-----------+
| Variable_name  | Value     |
+----------------+-----------+
| gtid_domain_id | 180363268 |
+----------------+-----------+
dbstore1001.eqiad.wmnet
+----------------+-----------+
| Variable_name  | Value     |
+----------------+-----------+
| gtid_domain_id | 171978769 |
+----------------+-----------+
dbstore1002.eqiad.wmnet
+----------------+-----------+
| Variable_name  | Value     |
+----------------+-----------+
| gtid_domain_id | 171978770 |
+----------------+-----------+
db1029.eqiad.wmnet
+----------------+-----------+
| Variable_name  | Value     |
+----------------+-----------+
| gtid_domain_id | 171970578 |
+----------------+-----------+
db1031.eqiad.wmnet
+----------------+-----------+
| Variable_name  | Value     |
+----------------+-----------+
| gtid_domain_id | 171970580 |
+----------------+-----------+

@Marostegui Please give it a second look:

$ ./software/dbtools/osc_host.sh --host=db1031.eqiad.wmnet --db=wikishared --table=cx_translations --no-replicate --gtid_domain_id=1 --method=ddl "DROP INDEX cx_translation_pair, ADD UNIQUE INDEX cx_translations (translation_source_title, translation_source_language, translation_target_language, translation_started_by)"
--------------
mysql  Ver 15.1 Distrib 10.1.21-MariaDB, for Linux (x86_64) using readline 5.2

Connection id:		1747747689
Current database:	
Current user:		root@10.64.32.20
SSL:			Not in use
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server:			MariaDB
Server version:		10.0.22-MariaDB-log MariaDB Server
Protocol version:	10
Connection:		db1031.eqiad.wmnet via TCP/IP
Server characterset:	binary
Db     characterset:	binary
Client characterset:	utf8
Conn.  characterset:	utf8
TCP port:		3306
Uptime:			327 days 5 hours 2 min 50 sec

Threads: 23  Questions: 28273974864  Slow queries: 6266  Opens: 33958  Flush tables: 13  Open tables: 6002  Queries per second avg: 1000.106
--------------

Host        : db1031.eqiad.wmnet
Port        : 3306
Databases   : wikishared
Table       : cx_translations
Alter       : DROP INDEX cx_translation_pair, ADD UNIQUE INDEX cx_translations (translation_source_title, translation_source_language, translation_target_language, translation_started_by)
method      : ddl
pt dry args :  --recurse=0 --set-vars=sql_log_bin=off --check-slave-lag=db1031.eqiad.wmnet
pt args     :  --recurse=0 --set-vars=sql_log_bin=off --check-slave-lag=db1031.eqiad.wmnet
ddl args    : SET SESSION innodb_lock_wait_timeout=1; SET SESSION lock_wait_timeout=60;  set session sql_log_bin=0; set session gtid_domain_id = 1;
analyze     : 0
continue? yes/no no
abort

Some observartions:

  • this case we'd need to remove: --no-replicate as we do want it to replicate.
  • They named the UNIQUE as cx_translation_ref in the patch

So with those changes the command (after enabling parallel on the slaves) would be:

./software/dbtools/osc_host.sh --host=db1031.eqiad.wmnet --db=wikishared --table=cx_translations  --gtid_domain_id=1 --method=ddl "DROP INDEX cx_translation_pair, ADD UNIQUE INDEX cx_translation_ref (translation_source_title, translation_source_language, translation_target_language, translation_started_by)"

Change 342629 had a related patch set uploaded (by Jcrespo):
[operations/software] osc_host.sh: Add support for gtid-based online alter table

https://gerrit.wikimedia.org/r/342629

$ ./software/dbtools/osc_host.sh --host=db1031.eqiad.wmnet --db=wikishared --table=cx_translations --replicate --gtid_domain_id=1 --method=ddl "DROP INDEX cx_translation_pair, ADD UNIQUE INDEX cx_translation_ref (translation_source_title, translation_source_language, translation_target_language, translation_started_by)"
--------------
mysql  Ver 15.1 Distrib 10.1.21-MariaDB, for Linux (x86_64) using readline 5.2

Connection id:		1747824419
Current database:	
Current user:		root@10.64.32.20
SSL:			Not in use
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server:			MariaDB
Server version:		10.0.22-MariaDB-log MariaDB Server
Protocol version:	10
Connection:		db1031.eqiad.wmnet via TCP/IP
Server characterset:	binary
Db     characterset:	binary
Client characterset:	utf8
Conn.  characterset:	utf8
TCP port:		3306
Uptime:			327 days 5 hours 16 min 50 sec

Threads: 18  Questions: 28274699291  Slow queries: 6266  Opens: 33958  Flush tables: 13  Open tables: 6002  Queries per second avg: 1000.102
--------------

Host        : db1031.eqiad.wmnet
Port        : 3306
Databases   : wikishared
Table       : cx_translations
Alter       : DROP INDEX cx_translation_pair, ADD UNIQUE INDEX cx_translation_ref (translation_source_title, translation_source_language, translation_target_language, translation_started_by)
method      : ddl
pt dry args :  --recurse=1 --chunk-size-limit=10 --check-slave-lag=db1031.eqiad.wmnet
pt args     :  --recurse=1 --chunk-size-limit=10 --check-slave-lag=db1031.eqiad.wmnet
ddl args    : SET SESSION innodb_lock_wait_timeout=1; SET SESSION lock_wait_timeout=60;  set session sql_log_bin=1; set session gtid_domain_id = 1;
analyze     : 0
continue? yes/no no
abort

It requires the patch I sent above. And the parallel threads, I assume.

$ ./software/dbtools/osc_host.sh --host=db1031.eqiad.wmnet --db=wikishared --table=cx_translations --replicate --gtid_domain_id=1 --method=ddl "DROP INDEX cx_translation_pair, ADD UNIQUE INDEX cx_translation_ref (translation_source_title, translation_source_language, translation_target_language, translation_started_by)"

Looks good!

It requires the patch I sent above. And the parallel threads, I assume.

Yes, I was checking it.
I will enable the parallel threads before doing the alter.

How much do we trust slave_parallel_threads, BTW? I would leave this to you, if you plan to do it tomorrow. I will go back to db1057.

How much do we trust slave_parallel_threads, BTW? I would leave this to you, if you plan to do it tomorrow. I will go back to db1057.

That is the million dollar question!
I have left them enabled on db2033:

| 9832823 | system user     |                   | NULL               | Connect     |       11 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
| 9832824 | system user     |                   | NULL               | Connect     |        1 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
| 9832825 | system user     |                   | NULL               | Connect     |        0 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
| 9832826 | system user     |                   | NULL               | Connect     |       11 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
| 9832827 | system user     |                   | NULL               | Connect     |       11 | Waiting for work from SQL thread

We will see how it behaves on the multisource slaves involved here.
Once it is deployed I will probably set them back to 0

Mentioned in SAL (#wikimedia-operations) [2017-03-14T14:36:24Z] <marostegui> Enabled parallel replication (5 threads) on db2033 (x1) - T160407

Change 342629 merged by jenkins-bot:
[operations/software] osc_host.sh: Add support for gtid-based online alter table

https://gerrit.wikimedia.org/r/342629

Mentioned in SAL (#wikimedia-operations) [2017-03-15T07:26:56Z] <marostegui> Enable parallel replication on x1 slaves - T160407

Just to be on the safe side, I have taken a backup of cx_translations and it is at: db1031:/srv/tmp/cx_translations.sql

Just to be on the safe side, I have taken a backup of cx_translations and it is at: db1031:/srv/tmp/cx_translations.sql

Thanks!

Mentioned in SAL (#wikimedia-operations) [2017-03-15T08:22:25Z] <marostegui> Deploy alter table x1 testing parallel replication - T160407

This has been applied in all the core servers, it was really fast, so no delay was appreciable (even if there was some).
The parallel alter has been done correctly, which is good news!

dbstore1001 and 2001 are delayed 24, so it will get it applied tomorrow I guess. dbstore2002 doesn't have x1.
db2033, db1029, db1031 (the master) are the servers we need to look at now and they look good:

db1029.eqiad.wmnet
*************************** 1. row ***************************
       Table: cx_translations
Create Table: CREATE TABLE `cx_translations` (
  `translation_id` int(11) NOT NULL AUTO_INCREMENT,
  `translation_source_title` varbinary(512) NOT NULL,
  `translation_target_title` varbinary(512) NOT NULL,
  `translation_source_language` varbinary(36) NOT NULL,
  `translation_target_language` varbinary(36) NOT NULL,
  `translation_source_url` blob NOT NULL,
  `translation_target_url` blob,
  `translation_status` enum('draft','published','deleted') DEFAULT NULL,
  `translation_start_timestamp` varbinary(14) NOT NULL,
  `translation_last_updated_timestamp` varbinary(14) NOT NULL,
  `translation_progress` tinyblob NOT NULL,
  `translation_started_by` int(11) DEFAULT NULL,
  `translation_last_update_by` int(11) DEFAULT NULL,
  `translation_source_revision_id` int(10) unsigned DEFAULT NULL,
  `translation_target_revision_id` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`translation_id`),
  UNIQUE KEY `cx_translation_ref` (`translation_source_title`,`translation_source_language`,`translation_target_language`,`translation_started_by`),
  KEY `cx_translation_languages` (`translation_source_language`,`translation_target_language`)
) ENGINE=InnoDB AUTO_INCREMENT=290107 DEFAULT CHARSET=binary


db2033.codfw.wmnet
*************************** 1. row ***************************
       Table: cx_translations
Create Table: CREATE TABLE `cx_translations` (
  `translation_id` int(11) NOT NULL AUTO_INCREMENT,
  `translation_source_title` varbinary(512) NOT NULL,
  `translation_target_title` varbinary(512) NOT NULL,
  `translation_source_language` varbinary(36) NOT NULL,
  `translation_target_language` varbinary(36) NOT NULL,
  `translation_source_url` blob NOT NULL,
  `translation_target_url` blob,
  `translation_status` enum('draft','published','deleted') DEFAULT NULL,
  `translation_start_timestamp` varbinary(14) NOT NULL,
  `translation_last_updated_timestamp` varbinary(14) NOT NULL,
  `translation_progress` tinyblob NOT NULL,
  `translation_started_by` int(11) DEFAULT NULL,
  `translation_last_update_by` int(11) DEFAULT NULL,
  `translation_source_revision_id` int(10) unsigned DEFAULT NULL,
  `translation_target_revision_id` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`translation_id`),
  UNIQUE KEY `cx_translation_ref` (`translation_source_title`,`translation_source_language`,`translation_target_language`,`translation_started_by`),
  KEY `cx_translation_languages` (`translation_source_language`,`translation_target_language`)
) ENGINE=InnoDB AUTO_INCREMENT=290107 DEFAULT CHARSET=binary


db1031.eqiad.wmnet
*************************** 1. row ***************************
       Table: cx_translations
Create Table: CREATE TABLE `cx_translations` (
  `translation_id` int(11) NOT NULL AUTO_INCREMENT,
  `translation_source_title` varbinary(512) NOT NULL,
  `translation_target_title` varbinary(512) NOT NULL,
  `translation_source_language` varbinary(36) NOT NULL,
  `translation_target_language` varbinary(36) NOT NULL,
  `translation_source_url` blob NOT NULL,
  `translation_target_url` blob,
  `translation_status` enum('draft','published','deleted') DEFAULT NULL,
  `translation_start_timestamp` varbinary(14) NOT NULL,
  `translation_last_updated_timestamp` varbinary(14) NOT NULL,
  `translation_progress` tinyblob NOT NULL,
  `translation_started_by` int(11) DEFAULT NULL,
  `translation_last_update_by` int(11) DEFAULT NULL,
  `translation_source_revision_id` int(10) unsigned DEFAULT NULL,
  `translation_target_revision_id` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`translation_id`),
  UNIQUE KEY `cx_translation_ref` (`translation_source_title`,`translation_source_language`,`translation_target_language`,`translation_started_by`),
  KEY `cx_translation_languages` (`translation_source_language`,`translation_target_language`)
) ENGINE=InnoDB AUTO_INCREMENT=290107 DEFAULT CHARSET=binary

dbstore1002 is a bit delayed due to load, and hasn't applied the change yet - I will monitor it to make sure it works:

dbstore1002.eqiad.wmnet
*************************** 1. row ***************************
       Table: cx_translations
Create Table: CREATE TABLE `cx_translations` (
  `translation_id` int(11) NOT NULL AUTO_INCREMENT,
  `translation_source_title` varbinary(512) NOT NULL,
  `translation_target_title` varbinary(512) NOT NULL,
  `translation_source_language` varbinary(36) NOT NULL,
  `translation_target_language` varbinary(36) NOT NULL,
  `translation_source_url` blob NOT NULL,
  `translation_target_url` blob,
  `translation_status` enum('draft','published','deleted') DEFAULT NULL,
  `translation_start_timestamp` varbinary(14) NOT NULL,
  `translation_last_updated_timestamp` varbinary(14) NOT NULL,
  `translation_progress` tinyblob NOT NULL,
  `translation_started_by` int(11) DEFAULT NULL,
  `translation_last_update_by` int(11) DEFAULT NULL,
  `translation_source_revision_id` int(10) unsigned DEFAULT NULL,
  `translation_target_revision_id` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`translation_id`),
  UNIQUE KEY `cx_translation_pair` (`translation_source_title`,`translation_source_language`,`translation_target_language`),
  KEY `cx_translation_languages` (`translation_source_language`,`translation_target_language`)
) ENGINE=InnoDB AUTO_INCREMENT=290103 DEFAULT CHARSET=binary

dbstore1002 is misbehaving, not sure if it is because of parallel replication or because of the big query it is trying to run:

| 117770847 | research        | x | log                 | Query   | 4307 | Queried about 1667770000 rows                                               | SELECT CONCAT(SUBSTRING(timestamp, 1, 4), '-', SUBSTRING(timestamp, 5, 2), '-', SUBSTRING(timestamp, |    0.000 |

| 117770847 | research         | log                | Query   | 1137 | Queried about 298220000 rows                                                | SELECT CONCAT(SUBSTRING(timestamp, 1, 4), '-', SUBSTRING(timestamp, 5, 2), '-', SUBSTRING(timestamp, |    0.000 |

It might have been caused by parallel replication, I was able to disable it and the server started to behave better.
I will test it again as it is "easy" to reproduce.

Anyways, the change went thru - I will keep an eye on dbstore1001 and dbstore2001 as it will be done by tomorrow. Will close this ticket for now and reopen if there is any issue with those two hosts:

CREATE TABLE `cx_translations` (
  `translation_id` int(11) NOT NULL AUTO_INCREMENT,
  `translation_source_title` varbinary(512) NOT NULL,
  `translation_target_title` varbinary(512) NOT NULL,
  `translation_source_language` varbinary(36) NOT NULL,
  `translation_target_language` varbinary(36) NOT NULL,
  `translation_source_url` blob NOT NULL,
  `translation_target_url` blob,
  `translation_status` enum('draft','published','deleted') DEFAULT NULL,
  `translation_start_timestamp` varbinary(14) NOT NULL,
  `translation_last_updated_timestamp` varbinary(14) NOT NULL,
  `translation_progress` tinyblob NOT NULL,
  `translation_started_by` int(11) DEFAULT NULL,
  `translation_last_update_by` int(11) DEFAULT NULL,
  `translation_source_revision_id` int(10) unsigned DEFAULT NULL,
  `translation_target_revision_id` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`translation_id`),
  UNIQUE KEY `cx_translation_ref` (`translation_source_title`,`translation_source_language`,`translation_target_language`,`translation_started_by`),
  KEY `cx_translation_languages` (`translation_source_language`,`translation_target_language`)
) ENGINE=InnoDB AUTO_INCREMENT=290123 DEFAULT CHARSET=binary

Mentioned in SAL (#wikimedia-operations) [2017-03-15T09:02:15Z] <marostegui> Disable parallel replication on x1 slaves (db1029, db2033) - T160407

It is definitely multisource + parallel replication.
All the multisource slaves are/were suffering this

Mentioned in SAL (#wikimedia-operations) [2017-03-15T09:11:11Z] <marostegui> Disable parallel replication on dbstore2002, dbstore2001, dbstore1002, dbstore1001 - T160407

dbstore1001 got the change:

CREATE TABLE `cx_translations` (
  `translation_id` int(11) NOT NULL AUTO_INCREMENT,
  `translation_source_title` varbinary(512) NOT NULL,
  `translation_target_title` varbinary(512) NOT NULL,
  `translation_source_language` varbinary(36) NOT NULL,
  `translation_target_language` varbinary(36) NOT NULL,
  `translation_source_url` blob NOT NULL,
  `translation_target_url` blob,
  `translation_status` enum('draft','published','deleted') DEFAULT NULL,
  `translation_start_timestamp` varbinary(14) NOT NULL,
  `translation_last_updated_timestamp` varbinary(14) NOT NULL,
  `translation_progress` tinyblob NOT NULL,
  `translation_started_by` int(11) DEFAULT NULL,
  `translation_last_update_by` int(11) DEFAULT NULL,
  `translation_source_revision_id` int(10) unsigned DEFAULT NULL,
  `translation_target_revision_id` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`translation_id`),
  UNIQUE KEY `cx_translation_ref` (`translation_source_title`,`translation_source_language`,`translation_target_language`,`translation_started_by`),
  KEY `cx_translation_languages` (`translation_source_language`,`translation_target_language`)
) ENGINE=InnoDB AUTO_INCREMENT=290533 DEFAULT CHARSET=binary

And finally the last host, dbstore2001 got the change too:

root@DBSTORE[wikishared]> show create table cx_translations\G
*************************** 1. row ***************************
       Table: cx_translations
Create Table: CREATE TABLE `cx_translations` (
  `translation_id` int(11) NOT NULL AUTO_INCREMENT,
  `translation_source_title` varbinary(512) NOT NULL,
  `translation_target_title` varbinary(512) NOT NULL,
  `translation_source_language` varbinary(36) NOT NULL,
  `translation_target_language` varbinary(36) NOT NULL,
  `translation_source_url` blob NOT NULL,
  `translation_target_url` blob,
  `translation_status` enum('draft','published','deleted') DEFAULT NULL,
  `translation_start_timestamp` varbinary(14) NOT NULL,
  `translation_last_updated_timestamp` varbinary(14) NOT NULL,
  `translation_progress` tinyblob NOT NULL,
  `translation_started_by` int(11) DEFAULT NULL,
  `translation_last_update_by` int(11) DEFAULT NULL,
  `translation_source_revision_id` int(10) unsigned DEFAULT NULL,
  `translation_target_revision_id` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`translation_id`),
  UNIQUE KEY `cx_translation_ref` (`translation_source_title`,`translation_source_language`,`translation_target_language`,`translation_started_by`),
  KEY `cx_translation_languages` (`translation_source_language`,`translation_target_language`)
) ENGINE=InnoDB AUTO_INCREMENT=290107 DEFAULT CHARSET=binary