(WARNING) (re)Work in progress
# Schema v. Configuration
Cassandra's DDL covers what is traditionally considered //schema//, but also information that is more configuration in nature. For example, consider keyspace creation:
```lang=sql
CREATE KEYSPACE "globaldomain_T_mathoid__ng_mml" WITH replication = {'class': 'NetworkTopologyStrategy', 'codfw': '3', 'eqiad': '3'} AND durable_writes = true;
```
A keyspace in Cassandra is a namespace that tables are associated with (similar to a database in MySQL terminology). Here, //globaldomain_T_mathoid__ng_mml// is the keyspace, and everything that follows the `WITH` is configuration that applies to the tables within it (how data is replicated, and whether or not writes make use of the commitlog).
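As a rough illustration of why these parameters are configuration rather than schema, an operator could change them later without touching the data model at all. The statement below is only a sketch; the replication factors shown are illustrative values, not a recommendation.

```lang=sql
-- Illustrative only: the datacenter names are taken from the example above,
-- the replication factors are made-up values.
-- No table or column is affected; only where and how data is replicated.
ALTER KEYSPACE "globaldomain_T_mathoid__ng_mml"
WITH replication = {'class': 'NetworkTopologyStrategy', 'codfw': '3', 'eqiad': '5'}
AND durable_writes = true;
```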
It is similar with tables:
```lang=sql
CREATE TABLE "globaldomain_T_mathoid__ng_mml".data (
"_domain" text,
key text,
headers text,
tid timeuuid,
value text,
PRIMARY KEY (("_domain", key))
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '32', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 86400
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
```
Likewise, the DDL describes both schema and table-specific configuration. In the example above, //"globaldomain_T_mathoid__ng_mml".data// is the table name, followed within the parentheses by the names and types of the attributes. This is schema, as it models the data to be stored there. Everything that follows the `WITH`, however, is configuration.
This distinction (schema v. configuration) is important because schema is determined by the application; no change in schema makes sense without a corresponding change to the application. Configuration, however, is site-specific and operational in nature; parameters can be unique to a use case, and may be updated frequently, independently of any change to the application. Schema is determined by application developers, configuration by users/operators.
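To make the contrast concrete, here is a sketch of a purely operational change to the table shown earlier. The specific values are assumptions chosen for illustration; the point is that the columns and primary key are untouched, so the application neither needs to know nor change.

```lang=sql
-- Hypothetical tuning by an operator; values are illustrative, not advice.
-- The set of columns and the primary key (the schema) stay exactly as they were.
ALTER TABLE "globaldomain_T_mathoid__ng_mml".data
WITH gc_grace_seconds = 864000
AND compaction = {'class': 'LeveledCompactionStrategy'};
```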
# Proposal
## Schema Management
Since schema is so tightly coupled to the application, it makes sense that it be kept versioned with application code, where it can be changed in lock-step.
Ideally, we'd omit all configuration data and rely on post-creation `ALTER`s to update defaults, but `CREATE KEYSPACE` requires us to supply replication parameters. In this case, all we can do is provide the minimum information (and document for third parties the expectation that they follow up).
```lang=sql,name=schema.cql
-- NOTE: Use ALTER after keyspace creation to update replication according to your site requirements.
CREATE KEYSPACE data
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
-- NOTE: Use ALTER after table creation to update default properties, if necessary.
CREATE TABLE data.values (
username text PRIMARY KEY,
given text,
surname text
);
```
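The follow-up that the `NOTE` comments ask for might look something like the sketch below. Everything in it is site-specific by design: the datacenter names and replication factors are borrowed from the earlier keyspace example purely for illustration, and the table properties are arbitrary examples of defaults an operator might override.

```lang=sql
-- Site-specific follow-up, run by operators after applying schema.cql.
-- Datacenter names and all values here are illustrative assumptions.
ALTER KEYSPACE data
WITH replication = {'class': 'NetworkTopologyStrategy', 'codfw': '3', 'eqiad': '3'};

-- Override table defaults only where the site's use case calls for it.
ALTER TABLE data.values
WITH gc_grace_seconds = 86400
AND comment = 'replication and table properties are managed by operations';
```

Keeping these statements out of the application's repository means the shipped `schema.cql` never needs to change when a site adjusts its topology.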
(IMPORTANT) A complete schema is only useful for initial setup. It may also be necessary to ship the DDL needed to upgrade a schema between releases.
```lang=sql,name=1.0-to-1.1.cql
ALTER TABLE data.values ADD email text;
```
## Configuration Management
We typically use Puppet to manage configuration...
(IMPORTANT) TODO: Do.