Now that we are getting closer to have a way to automatic depool/repool slaves, I would like to start a conversation about the future of this task.
Once we have active-active we can no longer deploy schema changes on the passive DC with replication, and instead we have to go host by host no matter what.
However, we do have to do it with replication for the sanitarium masters, specially for those changes that are likely to break replication with ROW based if not done in the correct other (adding/dropping columns, changing PKs etc).
The fact that we are starting to work towards having a database with all the hosts metadata (the new tables that live in tendril DB) makes things a bit easier for this in the future, as we can deploy the changes on a per server basis with a SQL query to select them.
However, by looking at the tables, we might need to tweak them a bit, as right now there is no way to identify what is a sanitarium master, and that'd be the only case in which the script shouldn't use set session sql_log_bin=0.
Also, ideally we should be able to use parameters as (examples of things I can think of):
--depool: if we want the slave to be depooled for the alter table (ie: a change that will generate lag, a change that will likely run into metadata locking issues). If --depool is used we should probably include a downtime by default. The slave should be automatically repooled once the alter is done and there is no lag.
This isn't trivial as if we are depooling an vslow host for instance, we'd need to repool another host in the meanwhile.
--no-master: if we don't want the masters to be altered, because it is an alter that will lock the table for some reason (ie: changing column data type).
--from-master: If there is a schema change that can be done directly on the primary master with replication (ie: such a tiny table or even empty that it can be done directly on the master and let it replicate). We don't have many of these, but worth considering.