Page MenuHomePhabricator

Basic automation of schema changes
Closed, ResolvedPublic

Description

I currently have a set of shell scripts that i use for doing schema changes. The functionality is roughly:

  • some common functions:
    • check: given a db instance and a database name, inspects the table schema(s) to see if the given change has been made or not. This is specific to the schema change being made.
    • dblist: for a given section, return a list of all databases. This queries noc.wikimedia.org/conf/dblists.
    • num_slaves: count the number of lines in show slave hosts on the given db instance
  • a check script: given a section and a host it checks if the schema change has been made to all databases in that instance.
  • a run-norep script: given a section and a host it iterates over all databases in that instance, skipping those that are already done, and makes the schema change with replication disabled. By default it will abort if the db instance has replicas.
  • a run-repreprep script: very similar to run-norep, with the following differences: it runs the schema change with replication enabled, and will abort if the db instance has no replicas.

There are 3 parts of this that have to be customized for each schema change: the check() function, and the schema change commands in both run-norep and run-repreprep.

This should all be replaced by a single python script that takes a config file that describes the schema change, and how to check for it. This would greatly simplify the work in running schema changes in production.

Future versions could expand functionality:

  • Checking if host is in active DC, and in dbctl, and if so depooling it
  • Automatically inserting icinga downtimes for the host, and replicas if it has any

but these features are definitely not needed for the first version, which would already be really useful.

Event Timeline

Kormat triaged this task as Medium priority.Aug 5 2021, 1:01 PM
Kormat added a project: User-Kormat.
Kormat moved this task from Unsorted 💣 to Back Burner 🏛️ on the User-Kormat board.

Change 743141 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/software@master] auto_schema: Avoid reusing sql variable in replica for loop

https://gerrit.wikimedia.org/r/743141

Change 743141 merged by jenkins-bot:

[operations/software@master] auto_schema: Avoid reusing sql variable in replica for loop

https://gerrit.wikimedia.org/r/743141

Change 743347 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/software@master] host.py: Slightly change a message

https://gerrit.wikimedia.org/r/743347

Change 743347 merged by jenkins-bot:

[operations/software@master] host.py: Slightly change a message

https://gerrit.wikimedia.org/r/743347

Change 744042 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/software@master] [WIP] Re-architecture auto_schema

https://gerrit.wikimedia.org/r/744042

Change 744778 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/software@master] auto_schema: Stop adding ticket to downtime cookbook

https://gerrit.wikimedia.org/r/744778

Change 744778 merged by jenkins-bot:

[operations/software@master] auto_schema: Stop adding ticket to downtime cookbook

https://gerrit.wikimedia.org/r/744778

Change 744850 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/software@master] auto_schema: Add logging on file

https://gerrit.wikimedia.org/r/744850

Ladsgroup moved this task from Ready to In progress on the DBA board.

I'm doing this, don't I?

Change 745837 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/software@master] auto_schema: Add README pointing to wikitech page

https://gerrit.wikimedia.org/r/745837

Change 745837 merged by jenkins-bot:

[operations/software@master] auto_schema: Add README pointing to wikitech page

https://gerrit.wikimedia.org/r/745837

Change 744850 merged by jenkins-bot:

[operations/software@master] auto_schema: Add logging on file

https://gerrit.wikimedia.org/r/744850

Change 746791 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/software@master] replica_set,schema_change: Minor language changes

https://gerrit.wikimedia.org/r/746791

Change 746791 merged by jenkins-bot:

[operations/software@master] replica_set,schema_change: Minor language changes

https://gerrit.wikimedia.org/r/746791

Change 744042 abandoned by Ladsgroup:

[operations/software@master] [WIP] Re-architecture auto_schema

Reason:

It was more of a POC which is done now.

https://gerrit.wikimedia.org/r/744042

To update a bit the on-going work as we have drifted a bit from the original description of the task!.
At the moment we are using https://github.com/wikimedia/operations-software/tree/master/dbtools/auto_schema which seems to be running fine and its been used for T277354, running it on two entire sections with no issues (primary masters not included).
It is documented at https://wikitech.wikimedia.org/wiki/Auto_schema

Change 747624 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/software@master] auto_schema: Move away from mysql.py

https://gerrit.wikimedia.org/r/747624

Change 747627 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/software@master] auto_schema: Add a timeout for depooling + downtime replicas for longer

https://gerrit.wikimedia.org/r/747627

Change 747624 merged by jenkins-bot:

[operations/software@master] auto_schema: Move away from mysql.py

https://gerrit.wikimedia.org/r/747624

Change 747627 merged by jenkins-bot:

[operations/software@master] auto_schema: Add a timeout for depooling + downtime replicas for longer

https://gerrit.wikimedia.org/r/747627

Change 748723 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/software@master] auto_schema: Refactor bash to make it a bit cleaner

https://gerrit.wikimedia.org/r/748723

Change 748726 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/software@master] auto_schema: Automatic detection of active dc

https://gerrit.wikimedia.org/r/748726

Change 748723 merged by jenkins-bot:

[operations/software@master] auto_schema: Refactor bash to make it a bit cleaner

https://gerrit.wikimedia.org/r/748723

Change 748726 merged by jenkins-bot:

[operations/software@master] auto_schema: Automatic detection of active dc

https://gerrit.wikimedia.org/r/748726

Change 752656 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/software@master] auto_schema: Make sure the port :3306 is removed during check for depool

https://gerrit.wikimedia.org/r/752656

Change 752656 merged by jenkins-bot:

[operations/software@master] auto_schema: Make sure the port :3306 is removed during check for depool

https://gerrit.wikimedia.org/r/752656

Change 752662 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/software@master] auto_schema: Remove the probelmatic override in depool logic

https://gerrit.wikimedia.org/r/752662

Change 752662 merged by jenkins-bot:

[operations/software@master] auto_schema: Remove the probelmatic override in depool logic

https://gerrit.wikimedia.org/r/752662

Change 752667 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/software/schema-changes@master] Add License and README

https://gerrit.wikimedia.org/r/752667

Change 752667 merged by Ladsgroup:

[operations/software/schema-changes@master] Add License and README

https://gerrit.wikimedia.org/r/752667

Change 752671 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/software@master] auto_schema: Remove default port in the Host

https://gerrit.wikimedia.org/r/752671

Change 752671 merged by jenkins-bot:

[operations/software@master] auto_schema: Remove default port in the Host

https://gerrit.wikimedia.org/r/752671

Ladsgroup moved this task from In progress to Done on the DBA board.

Now we have auto_schema and it has been used quite often for schema changes (one being running as I write this). More info has been documented in https://wikitech.wikimedia.org/wiki/Auto_schema and there is a document on what would be the next steps and improvements on it https://docs.google.com/document/d/1qVinCcu4T5NkJneFKwY70giCDmP1kBhGee385LWdiwA/edit and next week we are giving a presentation about it. I think this can be closed now and we will be creating tickets for each needed improvement.