I currently have a set of shell scripts that i use for doing schema changes. The functionality is roughly:
- some common functions:
- check: given a db instance and a database name, inspects the table schema(s) to see if the given change has been made or not. This is specific to the schema change being made.
- dblist: for a given section, return a list of all databases. This queries noc.wikimedia.org/conf/dblists.
- num_slaves: count the number of lines in show slave hosts on the given db instance
- a check script: given a section and a host it checks if the schema change has been made to all databases in that instance.
- a run-norep script: given a section and a host it iterates over all databases in that instance, skipping those that are already done, and makes the schema change with replication disabled. By default it will abort if the db instance has replicas.
- a run-repreprep script: very similar to run-norep, with the following differences: it runs the schema change with replication enabled, and will abort if the db instance has no replicas.
There are 3 parts of this that have to be customized for each schema change: the check() function, and the schema change commands in both run-norep and run-repreprep.
This should all be replaced by a single python script that takes a config file that describes the schema change, and how to check for it. This would greatly simplify the work in running schema changes in production.
Future versions could expand functionality:
- Checking if host is in active DC, and in dbctl, and if so depooling it
- Automatically inserting icinga downtimes for the host, and replicas if it has any
but these features are definitely not needed for the first version, which would already be really useful.