Implement a system to automatically deploy schema changes without needing DBA intervention
Open, NormalPublic

Description

Probably this will require an automatic pooling/depooling system, depending on rearchitecturing (T119626).

jcrespo created this task.Dec 18 2015, 11:31 AM
jcrespo updated the task description. (Show Details)
jcrespo raised the priority of this task from to Needs Triage.
jcrespo added a subscriber: jcrespo.
Restricted Application added subscribers: StudiesWorld, Aklapper. · View Herald TranscriptDec 18 2015, 11:31 AM
mark added a subscriber: mark.Dec 28 2015, 12:45 PM
hoo added a subscriber: hoo.Jan 20 2016, 2:43 PM
jcrespo moved this task from Triage to Meta/Epic on the DBA board.Nov 10 2016, 12:05 PM
Marostegui triaged this task as Normal priority.Aug 2 2018, 6:48 AM
Marostegui added a subscriber: Marostegui.

Now that we are getting closer to have a way to automatic depool/repool slaves, I would like to start a conversation about the future of this task.

Once we have active-active we can no longer deploy schema changes on the passive DC with replication, and instead we have to go host by host no matter what.
However, we do have to do it with replication for the sanitarium masters, specially for those changes that are likely to break replication with ROW based if not done in the correct other (adding/dropping columns, changing PKs etc).

The fact that we are starting to work towards having a database with all the hosts metadata (the new tables that live in tendril DB) makes things a bit easier for this in the future, as we can deploy the changes on a per server basis with a SQL query to select them.
However, by looking at the tables, we might need to tweak them a bit, as right now there is no way to identify what is a sanitarium master, and that'd be the only case in which the script shouldn't use set session sql_log_bin=0.

Also, ideally we should be able to use parameters as (examples of things I can think of):

--depool: if we want the slave to be depooled for the alter table (ie: a change that will generate lag, a change that will likely run into metadata locking issues). If --depool is used we should probably include a downtime by default. The slave should be automatically repooled once the alter is done and there is no lag.
This isn't trivial as if we are depooling an vslow host for instance, we'd need to repool another host in the meanwhile.

--no-master: if we don't want the masters to be altered, because it is an alter that will lock the table for some reason (ie: changing column data type).

--from-master: If there is a schema change that can be done directly on the primary master with replication (ie: such a tiny table or even empty that it can be done directly on the master and let it replicate). We don't have many of these, but worth considering.