Research: Reduce Cassandra memory usage by avoiding deserialization of expired data
Closed, Declined (Public)

Description

In our current production workload, we make heavy use of TTLs. Over time, many cells with expired TTLs accumulate. Cassandra currently does not turn those expired cells into tombstones on compaction, and reads full values when performing range scans. This is bad for performance: compactions need to repeatedly touch old data, and SSTables are significantly larger than they could be. On read, scanning thousands of expired cells frequently provokes OOM failures.

While our new schemas largely avoid this use case, there will still be some reads of TTL'ed data. It might be possible to improve performance for such reads by considering liveness when deserializing a cell, and treating / representing the cell as a tombstone (without data) if the data is expired. This could a) avoid using memory to hold the data, and b) implicitly drop the data on compaction, assuming the Cell deserialization path is used in compactions as well.
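
To make the idea concrete, here is a minimal sketch of a liveness check during deserialization. It does not use Cassandra's actual classes or on-disk format: `SimpleCell`, the wire layout (timestamp, expiration time, value length, value), and the `serialize` / `deserialize` helpers are all hypothetical stand-ins, and `isLive` here is a simplified assumption about what such a test would look like.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; none of these types are Cassandra's real classes.
class ExpiredCellSketch {

    /** Simplified cell: an expired cell is represented as a tombstone without a value. */
    static final class SimpleCell {
        public final long timestamp;
        public final int localExpirationSec;
        public final byte[] value;      // null when the cell is a tombstone
        public final boolean tombstone;

        SimpleCell(long timestamp, int localExpirationSec, byte[] value, boolean tombstone) {
            this.timestamp = timestamp;
            this.localExpirationSec = localExpirationSec;
            this.value = value;
            this.tombstone = tombstone;
        }
    }

    /** Assumed liveness rule: a cell is live strictly before its expiration time. */
    public static boolean isLive(int localExpirationSec, int nowInSec) {
        return nowInSec < localExpirationSec;
    }

    /** Writes the assumed layout: timestamp (8), expiration (4), length (4), value. */
    public static ByteBuffer serialize(long timestamp, int localExpirationSec, byte[] value) {
        ByteBuffer out = ByteBuffer.allocate(8 + 4 + 4 + value.length);
        out.putLong(timestamp);
        out.putInt(localExpirationSec);
        out.putInt(value.length);
        out.put(value);
        out.flip();
        return out;
    }

    /**
     * Reads one cell. If the cell has expired, the value bytes are skipped
     * rather than copied, so no memory is allocated for the dead payload.
     */
    public static SimpleCell deserialize(ByteBuffer in, int nowInSec) {
        long timestamp = in.getLong();
        int localExpirationSec = in.getInt();
        int length = in.getInt();

        if (!isLive(localExpirationSec, nowInSec)) {
            in.position(in.position() + length); // skip the expired value
            return new SimpleCell(timestamp, localExpirationSec, null, true);
        }

        byte[] value = new byte[length];
        in.get(value);
        return new SimpleCell(timestamp, localExpirationSec, value, false);
    }
}
```

In this sketch the header is always read, so the timestamp and expiration time survive on the tombstone; only the value allocation is avoided. Whether the same would hold in Cassandra depends on where the expiration time sits relative to the value in the real serialization format, which would need to be checked.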

While the priority for this kind of change will be lower after migration to the new schema, it might still be worth checking if a simple isLive() test in the read path would already be sufficient for this optimization.

Event Timeline

GWicke triaged this task as Medium priority. Aug 29 2017, 7:32 PM

@Eevans I believe we can decline this task as we probably will never work on this and the issue is not as pressing as it used to be with the new storage model?

SGTM