
Management of Cassandra schema and keyspace/table configuration
Open, Normal, Public

Description

Schema v. Configuration

Cassandra's DDL covers what is traditionally considered schema, but also information that is more configuration in nature. For example, consider keyspace creation:

keyspace
CREATE KEYSPACE "globaldomain_T_mathoid__ng_mml" WITH replication = {'class': 'NetworkTopologyStrategy', 'codfw': '3', 'eqiad': '3'}  AND durable_writes = true;
-- \___________________________________________/      \________________________________________________________________________________________________________/
--                    |                                                                                   |
--                  schema                                                                          configuration

A keyspace in Cassandra is a namespace to associate tables with (similar to a database in MySQL terminology). Here, globaldomain_T_mathoid__ng_mml is the keyspace, and everything that follows the WITH is configuration pertaining to associated tables (replication, or whether or not to make use of the commitlog).

It is similar with tables:

table
-- Schema ~~~~~~~~~~~~~~~
CREATE TABLE "globaldomain_T_mathoid__ng_mml".data (
    "_domain" text,
    key text,
    headers text,
    tid timeuuid,
    value text,
    PRIMARY KEY (("_domain", key))
-- Configuration ~~~~~~~~
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '32', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 86400
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

Likewise, the DDL describes both schema and table-specific configuration. In the example above, "globaldomain_T_mathoid__ng_mml".data is the table name, followed within the parentheses by the names and types of the attributes. This is schema, as it models the data to be stored there. Everything that follows the WITH, however, is configuration.

This is an important distinction (schema v configuration), because schema is determined by the application; no change in schema makes sense without a corresponding change to the application. Configuration, however, is site-specific and operational in nature; parameters can be unique to a use-case, and updated frequently outside of any change to the application. Schema is determined by application developers, configuration by users/operators.

Unfortunately, it is common to colloquially refer to the entire DDL for a keyspace and/or table as schema. However, every effort is made throughout this ticket to use the term schema to mean only that which determines the data model, and configuration to refer to the corresponding operational settings.

Proposal

Schema Management

Since schema is tightly coupled to the application, it makes sense that it be kept versioned with the application code, where it can be changed in lock-step.

Ideally, we'd omit all configuration data and rely on post-creation ALTERs to update defaults, but CREATE KEYSPACE requires us to supply replication parameters. In this case all we can do is provide the minimum information (and document for third parties the expectation that they follow up).

schema.cql
-- Use ALTER after keyspace creation to update replication according to your site requirements.
CREATE KEYSPACE data
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- NOTE: Use ALTER after table creation to update default table properties, if necessary.
CREATE TABLE data.values (
    username  text PRIMARY KEY, 
    given     text,
    surname   text
);
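For example, after applying the schema above, a site operator might follow up with something like the following (the replication settings and gc_grace_seconds value here are purely illustrative, not recommendations):

ALTER KEYSPACE data
    WITH replication = {'class': 'NetworkTopologyStrategy', 'codfw': '3', 'eqiad': '3'};

ALTER TABLE data.values
    WITH gc_grace_seconds = 86400;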

A complete schema is only useful for initial setup; it may also be necessary to ship the DDL needed to upgrade the schema between releases.

1.0-to-1.1.cql
ALTER TABLE data.values ADD email text;

Configuration Management

A YAML-formatted file will be used to define the configuration properties for a cluster. The file will be maintained in version control (Puppet?), and synchronized to each of the corresponding cluster nodes.

table_properties.yaml
keyspaces:
  - name: keyspace1
    replication:
      class: NetworkTopologyStrategy
      codfw: 2
      eqiad: 2
    durable_writes: true
    tables:
        - name: data
          compaction:
            class: SizeTieredCompactionStrategy
            min_threshold: 4
            max_threshold: 32
          speculative_retry: 99PERCENTILE
          gc_grace_seconds: 864000
        - name: data1
          compaction:
            class: SizeTieredCompactionStrategy
            min_threshold: 4
            max_threshold: 32
          speculative_retry: 99PERCENTILE
          gc_grace_seconds: 864000

A utility installed on the Cassandra nodes will accept the configuration file as an argument, connect to the corresponding cluster, introspect the current configuration, and output (stdout) the necessary ALTER statements to match the cluster to the configuration.
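For example, given the table_properties.yaml above, and assuming the live cluster differed only in replication for keyspace1 and in gc_grace_seconds for its two tables, the generated output might look something like the following (illustrative only; the exact form the tool emits may differ):

ALTER KEYSPACE keyspace1
    WITH replication = {'class': 'NetworkTopologyStrategy', 'codfw': '2', 'eqiad': '2'};

ALTER TABLE keyspace1.data
    WITH gc_grace_seconds = 864000;

ALTER TABLE keyspace1.data1
    WITH gc_grace_seconds = 864000;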

A database operator will be able to log into any node, review the output, and when ready, apply it using cqlsh (i.e. table_properties /etc/cassandra-a/table_properties.yaml | cqlsh).

Some additional requirements of this utility:

  • Should be packaged for Debian, and installed as part of the Cassandra role (Puppet)
  • Must have a mode that outputs ALTER statements for the entire configuration, regardless of current cluster state
  • Must output at most one ALTER per table, even if multiple properties are to be set (joined with AND; see the example following this list)
  • Design should not preclude being expanded to manage schema as well as configuration at a later date
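To illustrate the one-ALTER-per-table requirement: if two properties of the same table need to change, the utility should emit a single combined statement rather than two separate ALTERs (the table and values below are hypothetical):

ALTER TABLE keyspace1.data
    WITH gc_grace_seconds = 864000
    AND speculative_retry = '99PERCENTILE';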

Proof-of-concept

A working prototype for the above can be found on GitHub.

To get started:

$ pip install -r requirements.txt
$ ./table-properties --help
usage: table-properties [-h] [-i <ip>] [-C <filename>] [-d] [-k <filename>] [-l <filename>] [-p <port #>] [-P]
                        [-r <filename>] [-t] [-u <user name>] [-v]
                        [<filename>]

Compare actual Cassandra keyspace and table properties to desired properties defined in a YAML file and create ALTER KEYSPACE and ALTER TABLE statements for properties that differ.

positional arguments:
  <filename>            Desired configuration YAML file

optional arguments:
  -h, --help            show this help message and exit
  -i <ip>, --ip <ip>    Host IP address or name. Default: localhost
  -C <filename>, --clientcert <filename>
                        Client cert file name.
  -d, --dump            Dump current configuration to STDOUT
  -k <filename>, --clientkey <filename>
                        Client key file name.
  -l <filename>, --log <filename>
                        Log file name. If none is provided, STDERR is used.
  -p <port #>, --port <port #>
                        Port number. Default: 9042
  -P, --password        Prompt for password.
  -q, --quiet           When the flag is set exit with 0 only if the configuration matches the YAML file. Exit with 1 otherwise.
  -r <filename>, --rcfile <filename>
                        cqlrc file name. Default: ~/.cassandra/cqlshrc
  -s, --ssl             Use SSL/TLS encryption for client server communication.
  -u <user name>, --username <user name>
                        User name for plain text authentication.
  -v, --version         show program's version number and exit

The current configuration can be dumped to create the initial config. In production, this would be checked into version control as part of operations/puppet, with each unique config corresponding to a Cassandra cluster (role-based?).

$ table-properties -d > tables.yaml

Updates committed to configs in operations/puppet would be copied into the instance configuration directories of the Cassandra nodes. An operator would then log into one node, and generate the corresponding ALTER statement output.

$ table-properties /etc/cassandra-a/sessionstore_table_properties.yaml
USE "kask";
ALTER TABLE values
WITH caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'};

Assuming that output is valid, it could be applied using cqlsh.

$ table-properties /etc/cassandra-a/sessionstore_table_properties.yaml | cqlsh

Event Timeline

Eevans created this task.Apr 5 2019, 10:04 PM
Restricted Application removed a project: Patch-For-Review. · View Herald TranscriptApr 5 2019, 10:04 PM
Eevans updated the task description. (Show Details)Apr 15 2019, 7:14 PM
Eevans updated the task description. (Show Details)Apr 15 2019, 7:34 PM
Eevans updated the task description. (Show Details)Apr 15 2019, 9:34 PM
Eevans updated the task description. (Show Details)Apr 16 2019, 4:25 PM
Eevans updated the task description. (Show Details)Apr 16 2019, 4:36 PM
Eevans triaged this task as Normal priority.Apr 16 2019, 4:39 PM
Eevans updated the task description. (Show Details)
Joe added a subscriber: Joe.Apr 19 2019, 3:14 PM

Let's try to break down the procedures of:

  1. Upgrading an existing table for a running application
  2. Creating a new table for a new application
  3. Altering the configuration of a table for a running application

Let's also assume that the application runs on kubernetes, and that $kubernetes_deployment_tool (now scap-helm, soon to be refined/replaced) is used to deploy that application.

Upgrade the schema of an existing table

In this scenario, our service "foo" is already accessing table "foo.meta", and we're adding a column to the table. The processes of altering the schema and upgrading the application need to be able to be performed asynchronously, because neither schema changes nor application deployments are atomic. Let's assume we do what MediaWiki does, that is, make the code deployable with or without the new schema, and protect the use of the new features behind a feature flag. In this scenario:

  1. Code (and the proposed alter) are written, deployed to production
  2. This is just a schema change, so the cql command for the schema change can be issued whenever we want - even as a final step of deployment - directly by the deployer.
  3. The feature flag gets flipped in a subsequent deployment when it's assured the schema change has happened everywhere.

While this workflow seems simple and straightforward, there is obviously the problem that we have no way to track if the schema change was applied other than a !log action to SAL. In theory, we could allow the deployment system to apply any schema changes automatically after the deployment is done, but at this point I don't love the idea.

Creation of a new table

In this scenario, a new application is being installed. Since it's not yet receiving traffic, even disruptive actions cause no harm.
The workflow would look as follows:

  1. Code gets deployed to production for the first time.
  2. The deployer applies the schema change, creating the table with default parameters.
  3. The corresponding configuration cql, generated by our configuration management system, gets applied as well.

Thus, for the deployment of a new application we need one application deployment, one table creation action, one puppet change, and another manual action. I don't really like that many manual steps for something that could easily break; we can probably think of a better workflow.

Altering the configuration of a table

In this final scenario, we're altering just the configuration of a table; this is disjoint from any code deployment and could be done for performance or operational reasons.

  1. Merge a puppet change with the CQL
  2. Apply the change in production.

All in all, with a couple of caveats on finding a better process, I think this proposal can work. I'm just not sure about the strawman puppet proposal. I'd create a define called something like cassandra::table::config and just declare all instances directly in a puppet class cassandra::tables_config that collects them all, instead of relying on hiera and something like create_resources.

I was also wondering if @Marostegui and @jcrespo could share insights into how we manage schema changes and configuration changes on MySQL.

What's the best way to keep track of configuration changes that you apply at runtime?
How do you keep track of what schema is applied where? (note this is exponentially simpler in cassandra as you don't need to apply the schema changes on all servers)

I was also wondering if @Marostegui and @jcrespo could share insights into how we manage schema changes and configuration changes on MySQL.
What's the best way to keep track of configuration changes that you apply at runtime?

Normally, when we do a configuration change (as in: my.cnf), we manually let the hosts pick up the change when we restart them for upgrades, maintenance, etc.
If it is a flag that can be changed on the fly (and a restart isn't needed), we sometimes (depending on the urgency) apply it across the fleet manually (set global...). We do not let puppet apply changes; we let puppet change the in-file config, but not the live one.
Some flags cannot be modified on the fly, so those require manual restarts, which we, again, pick up when we do maintenance.

How do you keep track of what schema is applied where? (note this is exponentially simpler in cassandra as you don't need to apply the schema changes on all servers)

Nowadays, manually on phab. In our roadmap, we want to evolve Tendril so we can track those there.
We have big plans for Tendril: to expand it to be a source of truth for many things, including schema changes, table definitions, and even to help with automatic schema change deployments. But so far there have been no resources for that; we have been doing it slowly for other things (like backups, stats...)

Eevans moved this task from Backlog to In-Progress on the User-Eevans board.Apr 25 2019, 3:48 PM

In this scenario, our service "foo" is already accessing table "foo.meta", and we're adding a column to the table. The processes of altering the schema and upgrading the application need to be able to be performed asynchronously, because neither schema changes nor application deployments are atomic. Let's assume we do what MediaWiki does, that is, make the code deployable with or without the new schema, and protect the use of the new features behind a feature flag. In this scenario:

  1. Code (and the proposed alter) are written, deployed to production
  2. This is just a schema change, so the cql command for the schema change can be issued whenever we want - even as a final step of deployment - directly by the deployer.
  3. The feature flag gets flipped in a subsequent deployment when it's assured the schema change has happened everywhere.

This would work, but what we've typically done in the past is engineer our schema changes to be backward compatible (as the addition of a column in your example here would be), and apply them prior to the updated code. That's "better" in the sense that no code needs to be added in support of a feature flag, but I confess I hadn't thought about a vector for running the CQL snippet.

While this workflow seems simple and straightforward, there is obviously the problem that we have no way to track if the schema change was applied other than a !log action to SAL. In theory, we could allow the deployment system to apply any schema changes automatically after the deployment is done, but at this point I don't love the idea.

Automating this in a robust way is difficult, and IMO not worth it given the exceptional nature of a schema change. I'd rather this be done (manually) as a separate step.

In this scenario, a new application is being installed. Since it's not yet receiving traffic, even disruptive actions cause no harm.
The workflow would look as follows:

  1. Code gets deployed to production for the first time.
  2. The deployer applies the schema change, creating the table with default parameters.
  3. The corresponding configuration cql, generated by our configuration management system, gets applied as well.

Thus, for the deployment of a new application we need one application deployment, one table creation action, one puppet change, and another manual action. I don't really like that many manual steps for something that could easily break; we can probably think of a better workflow.

A new table is definitely a backward-compatible change, and ideally it'd be applied first (and that's the convention we've been using), so, alternatively:

  1. Schema change is applied (à la cqlsh -f ...)
  2. Puppet changeset that applies the configuration is merged (the ALTER will fail if the previous step is not complete)
  3. Configuration change is applied (à la cqlsh -f ...)
  4. Application is deployed

Again, this leaves the question of a vector for the schema change, if it is to be applied before the deployment.

All in all, with a couple of caveats on finding a better process, I think this proposal can work. I'm just not sure about the strawman puppet proposal. I'd create a define called something like cassandra::table::config and just declare all instances directly in a puppet class cassandra::tables_config that collects them all, instead of relying on hiera and something like create_resources.

My Puppet-fu is weak; I defer to your judgement here. :)

[...]
All in all, with a couple of caveats on finding a better process, I think this proposal can work. I'm just not sure about the strawman puppet proposal. I'd create a define called something like cassandra::table::config and just declare all instances directly in a puppet class cassandra::tables_config that collects them all, instead of relying on hiera and something like create_resources.

I've been thinking about this more, and thinking of amending my proposal as follows:

A DSL would be used to define the configuration properties for a cluster. I'm assuming YAML, because it is machine readable, relatively friendly to human editing, and able to be version controlled. These files would be copied from where they are version controlled (still Puppet, presumably?) to the nodes in the cluster, where a script there would generate the CQL statements as output from them. The CQL shell would be used to apply the statements (table_configuration /etc/cassandra-a/table_configuration.cql | cqlsh).

Rationale:

By generating the CQL on the node, we are able to connect to the running cluster, introspect the current configuration state, and generate only the output necessary to match the cluster to the desired config. Without this, we'd be limited to generating ALTER statements updating all properties of every table. Those statements are idempotent, so correctness would not be affected, but each distinct ALTER still requires a schema migration to propagate throughout the cluster. If you imagine a cluster like RESTBase, with tens of tables, that stands to be quite disruptive if all we're setting out to do is change one parameter in a single table.
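As a (hypothetical) illustration: if only gc_grace_seconds on a single table has drifted from the desired config, introspection lets the script emit just the one statement below, rather than a full all-properties ALTER for every table in the cluster:

ALTER TABLE "globaldomain_T_mathoid__ng_mml".data WITH gc_grace_seconds = 86400;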

Thoughts?

This comment was removed by Pchelolo.

I've removed the comment, since the previous comment got me confused regarding the terminology of schema vs. configuration here.

Eevans updated the task description. (Show Details)Jun 5 2019, 9:10 PM
Eevans updated the task description. (Show Details)Jun 5 2019, 9:18 PM
Eevans updated the task description. (Show Details)Jun 5 2019, 9:34 PM

One thought: currently the syntax of the table_properties.yaml is something like JSON-path-style pointers to configurations we need to replace in the original schema.cql. What if, instead, we allowed templating of the schema.cql and provided the configurations in a YAML file similar to the values.yaml we use in scap?

Locally, we'd provide table_properties.sample.yaml with simple values so that the schema could be expanded locally using the same tool for testing.
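For example, the split might look something like this (the placeholder syntax is purely illustrative; no particular templating engine is implied):

-- schema.cql template; values are filled in from table_properties.yaml
-- (or from table_properties.sample.yaml for local testing)
CREATE KEYSPACE data
    WITH replication = {{ replication }};

CREATE TABLE data.values (
    username  text PRIMARY KEY,
    given     text,
    surname   text
) WITH gc_grace_seconds = {{ gc_grace_seconds }};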

In my opinion, this will have several benefits:

  • The syntax of the table_properties.yaml will be much more flexible; it would work even with a simple list of key-value pairs
  • Introducing a templating engine will allow for much easier expansion afterward to handle schema changes as well, if we need to
  • Uniformity of how it all works locally, in CI, in small installs, and in production - the same tool will be used. This point is not really about the necessity of a templating engine, but an argument for using the same tool and the same breakdown into schema.cql and properties.yaml in local/small installs too.
Eevans updated the task description. (Show Details)Jun 6 2019, 4:58 PM
holger.knust updated the task description. (Show Details)Jun 13 2019, 2:57 PM
holger.knust added a subscriber: holger.knust.

I changed the YAML structure slightly (replication/data_centers) to make validation easier

Eevans renamed this task from Session storage service Cassandra schema to Management of Cassandra schema and keyspace/table configuration.Jun 28 2019, 6:01 PM
Volans added a subscriber: Volans.Jul 9 2019, 3:09 PM
Eevans updated the task description. (Show Details)Jul 10 2019, 9:00 PM
Eevans updated the task description. (Show Details)Jul 10 2019, 9:19 PM

The task description has been updated with information on a working prototype created by @holger.knust. We'd be very curious to hear any feedback about the general approach (including the workflow it's meant to empower), particularly from folks in SRE (@Joe, are you still monitoring this ticket? :))

Eevans updated the task description. (Show Details)Jul 10 2019, 9:31 PM
jijiki edited projects, added serviceops-radar; removed serviceops.Jul 12 2019, 1:14 PM

First of all, I have some questions due to my lack of knowledge of Cassandra specifics:

  • Is there any configuration that could require a code change in the application? Can those be changed at will without any coordination with the application?
  • Is there an easy way to connect to the cluster apart from localhost?
  • If this configuration is not managed by Puppet, how will it be managed? Would this system have a way to automatically apply those changes, or would we prefer to still do it manually for safety reasons?

A quick suggestion is to add to the developed software a -q/--quiet option that doesn't produce any output but exits with 0 if there are no pending changes and with non-zero if there are, suitable for use as a low-frequency Icinga check to ensure that merged changes are not left unapplied for long periods.

I'm not against, in principle, having the configuration part in Puppet, in particular if it really is operational-only configuration. What I'm wondering about is the necessity of shipping it to all nodes in the cluster, rather than to a "cluster management" host that takes care of cluster orchestration. For many other use-cases we use the cumin master hosts (role cluster management in Puppet) to do this.

This would also make it easier to have just a couple of alerts in case of undeployed merged changes, instead of having to perform that check on all nodes and try to deduplicate it later on.

Eevans added a comment.EditedJul 15 2019, 2:53 PM

First of all, I have some questions due to my lack of knowledge of Cassandra specifics:

  • Is there any configuration that could require a code change in the application? Can those be changed at will without any coordination with the application?

I could probably contrive an example where a code change required some change to config, but I think this would be true of most applications, no? There are also (otherwise valid) configurations that would not make any sense for a specific app, but again, I think this is pretty common.

  • Is there an easy way to connect to the cluster apart from localhost?

There is an argument (-c, --contactpoint, which ought to be renamed). If that's not working, it's a bug.

  • If this configuration is not managed by Puppet, how will it be managed? Would this system have a way to automatically apply those changes, or would we prefer to still do it manually for safety reasons?

I think for the foreseeable future, we'll manually apply them. Any system that did so automatically would need to be much more robust, and this is something we just don't do very often (see also: https://xkcd.com/1205/ :)).

A quick suggestion is to add to the developed software a -q/--quiet option that doesn't produce any output but exits with 0 if there are no pending changes and with non-zero if there are, suitable for use as a low-frequency Icinga check to ensure that merged changes are not left unapplied for long periods.

Ah yes, I've spoken about doing exactly this with others... we should definitely do it. I'd go so far as to say that the exit status ought to reflect whether or not changes are pending regardless (at which point, it might be enough to redirect stdout to /dev/null).

I'm not against, in principle, having the configuration part in Puppet, in particular if it really is operational-only configuration. What I'm wondering about is the necessity of shipping it to all nodes in the cluster, rather than to a "cluster management" host that takes care of cluster orchestration. For many other use-cases we use the cumin master hosts (role cluster management in Puppet) to do this.

So, there is some precedent for this; we generate a file called adduser.cql that is used to create (or update) the role accounts and passwords, and these too only need to be applied once (on any node).

To answer your question, though, I think the "why not" is simply because we didn't know of any qualifying host. The Cassandra cluster itself was available to us, and we used all nodes so that we did not need to special-case any one in particular. And I guess running it locally on a node means not having to worry about ferm rules and the like.

That said, I'm not at all opposed to doing this off-cluster, that makes sense.

This would also make it easier to have just a couple of alerts in case of undeployed merged changes, instead of having to perform that check on all nodes and try to deduplicate it later on.

Indeed it would.

I had a chat on IRC with @Eevans; here are some additional proposals that came up:

  • have a RO Cassandra user (either per DB or a single one) to perform the Icinga check of undeployed merged changes (one check per DB) and either:
    • perform it directly from the Icinga hosts, if OK security-wise (this means opening ferm to the Cassandra cluster from the Icinga hosts and trusting Cassandra auth). PRO: avoids duplication of alerts; only the active Icinga host will alarm.
    • perform it indirectly via NRPE from some internal hosts (like deploy[12]001, for example). CON: we'll have duplicate alerts (both hosts) for each critical one. But there is an Observability goal this quarter that should help reduce duplication of Icinga alerts.
  • As for where to deploy the tool + config, it seems that probably all the Cassandra nodes might be a good solution given that:
    • the alerting is no longer an issue, as per the above
    • the current workflow mostly involves people in CPT performing those ALTER TABLEs (so hosts like the cluster management ones, cumin*, are out of the equation)
    • Cassandra will do the right thing if there is any concurrency between deployers applying the same ALTER TABLE.
    • The advantage of using a centralized host like deploy[12]001 seems very low, and it would actually require some dedicated group to make sure only the right people can run the tool.

@Volans and @Eevans: I just updated the repo with a new version (0.4) that supports the --quiet switch and exits with a non-zero code if there are configuration changes. Also, the host override switch changed to -i or --ip <ip addr> from -c.

holger.knust updated the task description. (Show Details)Jul 17 2019, 7:43 PM

Change 524921 had a related patch set uploaded (by Holger Knust; owner: Holger Knust):
[operations/software/cassandra-table-properties@master] table-properties: Initial commit

https://gerrit.wikimedia.org/r/524921