As discussed in T94121 and T150811, leveled compaction could increase data locality and limit the number of SSTables that need to be touched for a read. However, there are also issues with LCS. This task is to collect information on the pros & cons of using LCS, as input for an overall plan.
- Read performance. LCS splits the token range into non-overlapping SSTables within each level, so the total number of SSTables touched by a read is limited by the number of levels. For reasonably-sized instances, this number is 5 or 6.
- Handles skewed updates efficiently by compacting busy token ranges more frequently than rarely-updated ones.
- SSTables containing a frequently updated partition are still considered overlapping with those in lower levels for compaction purposes, as Cassandra only considers the presence of a partition & per-SSTable write times, not clustering key ranges. As a result, tombstones for wide rows are only collected in the lowest level, despite not actually shadowing any data in the lower levels in the normal case.
- Occasional major compactions (https://issues.apache.org/jira/browse/CASSANDRA-7272, https://github.com/scylladb/scylla/issues/1431) could be used to work around this in the short term, but this is fairly expensive. In Cassandra 2.2, it would require temporarily disabling automatic compactions on a keyspace.
- A potential improvement in Cassandra would be to actually consider clustering key range overlaps during compaction, as discussed in T94121#2710479. This would allow dropping tombstones quickly in early level compactions (L0->L1, or L1->L2).
- Write amplification is normally higher with LCS than with other standard strategies. However, in practice neither STCS nor TWCS offers sufficient read performance & timely tombstone collection, so they end up being combined with ad-hoc manual compactions. It is unclear how the overall write amplification, including those manual major compactions, compares with LCS.
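The "5 or 6 levels" claim above can be sketched numerically. The following is a rough model (the function name and structure are ours, not Cassandra's), assuming the default 160 MB SSTable target size and a fanout of 10, so level N holds roughly 10^N SSTables:

```python
def lcs_levels(total_bytes, sstable_mb=160, fanout=10):
    """Estimate how many LCS levels (beyond L0) a dataset needs.

    Toy model: level N has capacity for fanout**N SSTables of
    `sstable_mb` each (L1 = 10, L2 = 100, ...). Returns the smallest
    number of levels whose cumulative capacity fits `total_bytes`.
    """
    sstable_bytes = sstable_mb * 1024 ** 2
    level, capacity = 1, 0
    while capacity * sstable_bytes < total_bytes:
        capacity += fanout ** level
        level += 1
    return level - 1
```

Under this model, 1 TiB of data fits in 4 levels and 10 TiB in 5; counting L0 and the one-SSTable-per-level read path, that is consistent with reads touching around 5 or 6 SSTables.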
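To illustrate the clustering-key-range refinement discussed in T94121#2710479, here is a toy sketch. This is not Cassandra code; the `SSTable` model and function names are invented for illustration. The inner range check is the proposed refinement, on top of the partition-presence check Cassandra already does:

```python
from dataclasses import dataclass

@dataclass
class SSTable:
    # Toy model (invented): partition key -> (min, max) clustering key
    # bounds of the rows this SSTable holds for that partition.
    clustering_bounds: dict

def tombstone_droppable(partition, clustering, lower_level_tables):
    """Decide whether a row tombstone for (partition, clustering) could be
    dropped during an early-level compaction (e.g. L0->L1): it is safe if
    no lower-level SSTable holds that partition with a clustering range
    covering the tombstoned row, so there is no older data to shadow.
    (Timestamp checks are omitted from this sketch.)
    """
    for table in lower_level_tables:
        bounds = table.clustering_bounds.get(partition)
        if bounds is not None:
            lo, hi = bounds
            if lo <= clustering <= hi:
                return False  # might shadow older data; keep the tombstone
    return True
```

With only the partition check (the status quo), the tombstone would be kept whenever the partition appears in any lower level; the range check lets the common wide-row case drop it early.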
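For the write-amplification comparison in the last point, a back-of-envelope model helps frame the open question. These are our own rough upper bounds, not measurements:

```python
def lcs_write_amp(levels, fanout=10):
    """Rough LCS bound: promoting data from level N to N+1 rewrites up to
    ~fanout overlapping SSTables, once per level a byte traverses."""
    return fanout * levels

def stcs_write_amp(merge_passes, major_compactions=0):
    """Rough STCS bound: one rewrite per tiered merge pass a byte goes
    through, plus one full rewrite per ad-hoc manual major compaction."""
    return merge_passes + major_compactions
```

Even with a handful of manual major compactions added in, the modeled STCS figure stays well below the LCS one, which is why the comparison hinges on measured numbers rather than this kind of estimate.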