Cassandra 2.2.5 should be evaluated/tested for use in production.
Given the (presumably) low delta to 2.1.12 (and the maturity/stability of the 2.1.x branch at this point), it might make sense to first consider an upgrade to 2.1.13 while we spend the time evaluating 2.2.5.
2.1.13
Possible issues addressed:
Timeline
Per #2075200, after the bootstrap of 1010 (2016-03-07?).
2.2.6 (formerly 2.2.5)
Interesting changes (non-exhaustive)
- JSON
- UDFs
- UDAs
- Role-based access control
- Native protocol v4
- Resumable bootstraps
- Direct ByteBuffers for decompression reads
- Compressed commitlog
- Message coalescing
- Async debug logging
- Improved repair concurrency/parallelism
- Full repairs mark SSTables as repaired
- Incremental repairs as default
Upgrading
Possible gotchas
- Conflict with system installed libjna-java
- Use of -XX:+PerfDisableSharedMem may break cassandra-metrics-collector auto-discovery
- Conflict of Jackson dependencies w/ those needed by logstash
- System keyspace schema migration for RBAC (Role-based access control)
Configuration
- cassandra-env.sh
- cassandra.yaml
- logback.xml
Timeline
TBD (after: T125906: Evaluate Brotli compression for Cassandra)
Current Status (v2.2.6 upgrade, formerly v2.2.5)
There is a work-in-progress Puppet changeset here. I believe it is mostly OK at this point (a few nits remain).
Between 2.1.13 and 2.2.5, metrics in Cassandra were refactored to wrap the Dropwizard mbeans in Cassandra delegator objects. Doing this changed the names (obviously), and broke cassandra-metrics-collector. The fix for that is here, though a little work is still needed on the Puppet patch to (conditionally) set up the systemd unit on machines running the 2.2 series (it's meant to keep working as-is for 2.1 machines).
In addition to the breakage in cassandra-metrics-collector, there is also some breakage that results from the metric type changing in certain select cases, for example from Gauge to Counter (the JMX attribute changes from value to count, and so the Graphite metric name changes accordingly). I'm still assessing the full extent of this class of breakage. Options include papering over this in cassandra-metrics-collector, or letting the chips fall as they may and dealing with the fallout in the dashboards.
There are also some differences in the available metrics between the two versions; a detailed analysis follows:
Explanation: I don't even.
Explanation: Messages are no longer tracked here by "Command/Response", but by "GossipMessage", and "{Large,Small}Message", (so the old metrics are replaced by the new).
Explanation: The BINARY verb was deprecated in 2.1.13
The Good News: None of the above changes will impact us; with the exception of dropped messages for BINARY (which are no longer relevant), none of our existing dashboards will be affected.
Upgrade Process
For each node in a cluster
- Disable Puppet (sudo puppet agent --disable "Upgrading to 2.2.6 : T126629")
- Set cassandra::target_version in Puppet to 2.2, and merge
- Shut down Cassandra (sudo service cassandra stop)
- Place a hold on the cassandra package (echo "cassandra hold" | sudo dpkg --set-selections)
- Place a hold on the cassandra-tools package, if installed (echo "cassandra-tools hold" | sudo dpkg --set-selections)
- Upgrade Cassandra package (sudo dpkg -i cassandra_2.2.6._all.deb)
- Upgrade Cassandra Tools package (if installed) (sudo dpkg -i cassandra-tools_2.2.6._all.deb)
- Enable Puppet (sudo puppet agent --enable)
- Force a Puppet run (sudo puppet agent -tv)
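The per-node steps above could be wrapped in a small script along these lines. This is only a sketch: the DRY_RUN, VERSION and WITH_TOOLS variables and the function names are assumptions made for illustration, the package file names are copied from the steps as written, and the Puppet change to cassandra::target_version still has to be merged by hand before the package is upgraded.

```shell
#!/bin/sh
# Sketch of the per-node upgrade steps. DRY_RUN defaults to 1, so commands
# are printed instead of executed; set DRY_RUN=0 to run them on a real node.
DRY_RUN="${DRY_RUN:-1}"
VERSION="${VERSION:-2.2.6}"
# Hypothetical switch: set to 1 on nodes where cassandra-tools is installed.
WITH_TOOLS="${WITH_TOOLS:-0}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Place a dpkg hold on a package (dpkg --set-selections reads stdin).
hold_package() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ echo \"$1 hold\" | sudo dpkg --set-selections"
    else
        echo "$1 hold" | sudo dpkg --set-selections
    fi
}

upgrade_node() {
    run sudo puppet agent --disable "Upgrading to $VERSION : T126629"
    # Manual step, not scripted here: set cassandra::target_version to 2.2
    # in Puppet and merge.
    run sudo service cassandra stop
    hold_package cassandra
    if [ "$WITH_TOOLS" = "1" ]; then
        hold_package cassandra-tools
    fi
    # Package file names are taken verbatim from the steps above.
    run sudo dpkg -i "cassandra_${VERSION}._all.deb"
    if [ "$WITH_TOOLS" = "1" ]; then
        run sudo dpkg -i "cassandra-tools_${VERSION}._all.deb"
    fi
    run sudo puppet agent --enable
    run sudo puppet agent -tv
}
```

Calling upgrade_node with the defaults prints the command sequence for review; running it with DRY_RUN=0 executes the same sequence, one node at a time.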
After all nodes are upgraded
- Drop the legacy system_auth tables
After all clusters are upgraded
- Update the APT repository with the 2.2.6 package(s)
- Unhold cassandra{-tools} on all machines
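Releasing the holds is the mirror image of placing them: setting a package's selection state back to install clears the hold. A minimal sketch, in which the unhold_selections helper name is an assumption for illustration:

```shell
#!/bin/sh
# Emit one "<package> install" selection line per argument. Piping these
# lines into dpkg --set-selections replaces the earlier "hold" state.
unhold_selections() {
    for pkg in "$@"; do
        printf '%s install\n' "$pkg"
    done
}

# Usage (pass only the packages that were actually held on this machine):
#   unhold_selections cassandra cassandra-tools | sudo dpkg --set-selections
```

Only name packages that are installed and held; marking an absent package as install would select it for installation.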
On each 2.2.6 node startup, Cassandra will attempt to migrate the legacy tables. This will fail until enough nodes have been upgraded to satisfy the consistency level. When that succeeds, you'll see the following output in the logs.
Once all of the nodes have been upgraded, and preserving the possibility of a rollback is no longer necessary, then the 3 legacy tables should be dropped (see below).
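A sketch of the drop, assuming the three legacy tables are the pre-2.2 auth tables users, credentials and permissions in the system_auth keyspace (verify this against the upstream upgrade notes for 2.2 before running anything). The DRY_RUN variable and the function name are also assumptions for illustration.

```shell
#!/bin/sh
# Assumption: these are the three legacy tables in the system_auth keyspace.
LEGACY_TABLES="users credentials permissions"
# DRY_RUN defaults to 1, so the cqlsh commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

drop_legacy_tables() {
    for table in $LEGACY_TABLES; do
        stmt="DROP TABLE system_auth.$table;"
        if [ "$DRY_RUN" = "1" ]; then
            echo "cqlsh -e \"$stmt\""
        else
            # Clusters with authentication enabled will also need
            # credentials passed to cqlsh (for example via -u and -p).
            cqlsh -e "$stmt"
        fi
    done
}
```

Since dropping the tables removes the rollback path, the dry-run default is deliberate; run with DRY_RUN=0 only once every node in the cluster is on the new version.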
Once the legacy tables have been dropped, Cassandra will begin using the new tables. The log output post-DROP looks like the following: