
Verify requirements and parameters for efficient TTL'ed storage in Cassandra
Closed, Resolved · Public


We are currently discussing several schema changes that separate permanent storage from TTL'ed temporary storage. The motivation for this change is to get efficient compaction, where Cassandra can drop entire expired SSTables, rather than compacting them with other content.

Before we can rely on this, we should establish

  • what the requirements are to benefit from efficient and timely SSTable expiry (e.g., per-table TTL vs. per-row TTL), and
  • a characterization of a) how timely the expiry process can be, and b) the expected write amplification from intermediate compactions before reaching the (eventually expired) final SSTable.

As a result, we should have a better understanding of the performance gains we will get from this setup, compared to a mixed-storage setup. We have some use cases that could potentially become feasible with efficient expiring storage, but need a better quantitative understanding in order to be able to compare Cassandra to alternatives.

Event Timeline

OK, so as I suspected (when I retracted my earlier remarks in the design docs), the table-level property is not used to optimize when an entire SSTable can be dropped due to expiry [0]. If it were, it would be impossible to override the TTL on a value at query time without some very surprising results. The semantics are such that setting a default results in columns having a TTL as if it had been supplied by the client, when one is left unset (so, for example, altering the default later does not change any values already flushed to disk). One of the primary driving forces behind this property was map-reduce use cases, where the APIs didn't permit assigning a TTL; this is basically just sugar.
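To make the "just sugar" semantics concrete, here is a minimal sketch (a toy model, not Cassandra internals; the `Table` and `Cell` names are purely illustrative) of how the table default is resolved at write time, so that later altering the default leaves already-written cells untouched:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Cell:
    value: str
    write_ts: int        # write timestamp, seconds
    ttl: Optional[int]   # resolved at write time; None = never expires

class Table:
    """Toy model of default_time_to_live resolution semantics."""

    def __init__(self, default_ttl: Optional[int]):
        self.default_ttl = default_ttl
        self.cells: List[Cell] = []

    def write(self, value: str, write_ts: int, query_ttl: Optional[int] = None):
        # A query-time TTL overrides the table default; otherwise the
        # *current* default is baked into the cell, as if the client
        # had supplied it.
        ttl = query_ttl if query_ttl is not None else self.default_ttl
        self.cells.append(Cell(value, write_ts, ttl))

t = Table(default_ttl=86400)
t.write("a", write_ts=0)                  # inherits 86400
t.write("b", write_ts=10, query_ttl=60)   # explicit query-time override
t.default_ttl = 3600                      # like a later ALTER TABLE
t.write("c", write_ts=20)                 # inherits the new default only
assert [c.ttl for c in t.cells] == [86400, 60, 3600]
```

The key point is that the first cell keeps its 86400-second TTL even after the default changes, which is why the table-level property alone cannot be used to reason about whole-SSTable expiry.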

There are optimizations, however, that permit entire SSTables to be dropped. These optimizations take into account overlap with non-compacting SSTables, and use the SSTable's local max deletion time to drop files containing entirely expired data. For tables with nothing but TTL'ed values, where we're not doing any overwrites or deletes, an SSTable should be droppable as soon as the newest value it contains has exceeded the TTL plus the grace period. Since STCS is applied to the current window of TWCS, and older SSTables are being combined with newer ones, it's probably safe to assume that drops will occur in the historical windows.

By way of example: assuming a TTL of 24 hours, a gc_grace_seconds of 0 [1], and a TWCS window size of 1 day, we should never have more than two windows in play: the current window (being STCS-compacted), and one historical window containing a single SSTable that becomes a candidate for dropping in <= 24 hours.
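Working through the worst case of that example: a value written in the final second of a window is the last to expire, so the historical window's SSTable becomes droppable at most 24 hours after the window closes, i.e. while the very next window is still current:

```python
DAY = 24 * 3600
ttl, gc_grace, window = DAY, 0, DAY   # the example's parameters

# Worst case: a value written in the last second of window 0.
newest_write = window - 1
drop_time = newest_write + ttl + gc_grace   # when window 0's SSTable can go
current_window = drop_time // window        # which window is current then
assert current_window == 1   # window 0 drops while window 1 is current,
                             # so at most two windows exist on disk
```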

[0]: There are docs that contradict this, but they are wrong.
[1]: Just an example to keep the math simple; a gc_grace_seconds of 0 will effectively disable hints, so we probably want at least some minimal value here.

As far as I know, the (relevant) questions posed here have been answered, and this ticket can be closed.