
Evaluate efficacy of DateTieredCompactionStrategy
Closed, ResolvedPublic

Description

While investigating whether it was safe to do a decommission (T119935: Upgrade restbase100[7-9] to match restbase100[1-6] hardware), a number of large (100GB+) sstables were found, mostly belonging to local_group_wikipedia_T_parsoid_html and local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ:

eqiad

restbase1001.eqiad.wmnet: 127495153070 Jan 26 11:15 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ/data-f4d92a60c2cb11e4ab6181ba0e170b9f/local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ-data-ka-502803-Data.db
restbase1001.eqiad.wmnet: 388938516642 Jan 23 06:49 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-1032092-Data.db
restbase1001.eqiad.wmnet: 185769479012 Jan 27 17:10 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-1038636-Data.db
restbase1001.eqiad.wmnet: 111310087737 Jan 9 10:54 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-1022273-Data.db
restbase1002.eqiad.wmnet: 162380600851 Feb 1 08:57 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-1050123-Data.db
restbase1002.eqiad.wmnet: 409211605648 Feb 1 11:18 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-1048802-Data.db
restbase1002.eqiad.wmnet: 186547550850 Jan 9 19:05 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-1026199-Data.db
restbase1002.eqiad.wmnet: 151159969823 Jan 22 04:14 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ/data-f4d92a60c2cb11e4ab6181ba0e170b9f/local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ-data-ka-469143-Data.db
restbase1003.eqiad.wmnet: 334065628159 Feb 2 01:15 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-994255-Data.db
restbase1003.eqiad.wmnet: 135789307038 Jan 9 10:18 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-971574-Data.db
restbase1003.eqiad.wmnet: 124779613448 Feb 1 01:48 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ/data-f4d92a60c2cb11e4ab6181ba0e170b9f/local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ-data-ka-508414-Data.db
restbase1004.eqiad.wmnet: 210490091937 Jan 10 00:17 /srv/cassandra-a/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-12926-Data.db
restbase1004.eqiad.wmnet: 320535565084 Feb 2 00:38 /srv/cassandra-a/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-36231-Data.db
restbase1004.eqiad.wmnet: 177687145087 Jan 21 18:56 /srv/cassandra-a/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-23630-Data.db
restbase1004.eqiad.wmnet: 111817799238 Jan 21 03:28 /srv/cassandra-a/data/local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ/data-f4d92a60c2cb11e4ab6181ba0e170b9f/local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ-data-ka-9851-Data.db
restbase1005.eqiad.wmnet: 162924636480 Jan 28 09:46 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-1069227-Data.db
restbase1005.eqiad.wmnet: 182203048061 Jan 9 17:07 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-1051364-Data.db
restbase1005.eqiad.wmnet: 402386944808 Feb 2 16:48 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-1074055-Data.db
restbase1005.eqiad.wmnet: 136095741714 Jan 26 10:28 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ/data-f4d92a60c2cb11e4ab6181ba0e170b9f/local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ-data-ka-570519-Data.db
restbase1006.eqiad.wmnet: 172080229432 Jan 17 19:38 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-995725-Data.db
restbase1006.eqiad.wmnet: 145445796927 Jan 27 23:40 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-1005632-Data.db
restbase1006.eqiad.wmnet: 354055854836 Jan 31 23:43 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-1008838-Data.db
restbase1006.eqiad.wmnet: 122708401314 Jan 31 23:28 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ/data-f4d92a60c2cb11e4ab6181ba0e170b9f/local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ-data-ka-447000-Data.db
restbase1007.eqiad.wmnet: 108110540027 Feb 6 12:26 /srv/cassandra-a/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-20091-Data.db
restbase1007.eqiad.wmnet: 158879625023 Jan 31 05:08 /srv/cassandra-a/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-17301-Data.db
restbase1008.eqiad.wmnet: 109558289101 Feb 7 05:28 /srv/cassandra-a/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-18706-Data.db
restbase1009.eqiad.wmnet: 208762968302 Jan 21 04:52 /srv/cassandra-a/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-18634-Data.db

codfw

restbase2002.codfw.wmnet: 112537008879 Jan 20 10:27 /srv/cassandra-b/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-21051-Data.db
restbase2003.codfw.wmnet: 218131112901 Jan 9 22:39 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-182511-Data.db
restbase2003.codfw.wmnet: 229355616392 Feb 1 05:23 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-208453-Data.db
restbase2003.codfw.wmnet: 210988335257 Jan 21 20:31 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-194773-Data.db
restbase2004.codfw.wmnet: 184258575748 Jan 21 17:37 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-196682-Data.db
restbase2004.codfw.wmnet: 256760687053 Feb 1 07:02 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-211174-Data.db
restbase2004.codfw.wmnet: 231210786676 Jan 9 21:46 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-184135-Data.db
restbase2004.codfw.wmnet: 111108207450 Jan 21 04:03 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ/data-f4d92a60c2cb11e4ab6181ba0e170b9f/local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ-data-ka-64034-Data.db
restbase2005.codfw.wmnet: 226636152901 Jan 9 22:51 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-190601-Data.db
restbase2005.codfw.wmnet: 248658084363 Feb 1 00:28 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-217328-Data.db
restbase2005.codfw.wmnet: 209888122903 Jan 22 01:02 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-203336-Data.db
restbase2006.codfw.wmnet: 244311982871 Jan 30 04:48 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-215379-Data.db
restbase2006.codfw.wmnet: 121523042580 Jan 9 12:16 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-192351-Data.db
restbase2006.codfw.wmnet: 252504850245 Jan 22 04:50 /var/lib/cassandra/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-204320-Data.db

In particular, what prompted this investigation was that the ~400GB file on restbase1002 (/var/lib/cassandra/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/local_group_wikipedia_T_parsoid_html-data-ka-1048802-Data.db) resulted in a ~20% swing in disk space utilized during its compaction. The upper bound on the size of these files is determined by a time interval that is currently 818 days; since these very large files cover intervals of less than 1 year, they are on track to become considerably larger. Given that we must maintain at least as much free space as the largest sstable, the current trajectory is intractable, even in light of upcoming smaller instance sizes.

Additionally, an evaluation of the files (as they looked on Feb. 2) would seem to indicate that compaction is failing to produce the expected date-based tiering.

Event Timeline

fgiunchedi raised the priority of this task to Needs Triage.
fgiunchedi updated the task description.
fgiunchedi added a subscriber: fgiunchedi.
Restricted Application added subscribers: StudiesWorld, Aklapper. · View Herald Transcript · Feb 8 2016, 2:50 PM

Large SSTables are normal with STCS and DTCS. As expected, the amplitude of these swings is smallest on the multi-instance nodes 2001 and 2002, and will be even smaller with a higher number of instances.

there are a few open questions even in the multi instance case, for example:

  1. what determines the size of the biggest sstable? that has a direct impact on capacity planning since it is the amount of disk space we've committed to keep free for compactions
    1. is it something that we control (e.g. DTCS knobs) or something that we don't (e.g. edit traffic)?
  2. will a big sstable get only bigger over time? if so, by how much given the above?
  3. how often should we be expecting a big sstable to be compacted?

the restbase1002 case is what prompted me to look into this; it swung from 76% used on 30/1 at ~13.00 to 91% used on 2/1 at ~7.00 before dropping back to 70% (see below). We should look closely into what the cause is and when it is going to happen again.

[1] https://ganglia.wikimedia.org/latest/graph.php?r=month&z=xlarge&c=Restbase+eqiad&h=restbase1002.eqiad.wmnet&jr=&js=&v=67.2&m=part_max_used&vl=%25&ti=Maximum+Disk+Space+Used


I was toying around with a dashboard to correlate CF writes and used disk space, https://grafana.wikimedia.org/dashboard/db/filippo-cassandra-disk-size-vs-compactions

Eevans added a subscriber: Eevans. · Edited · Feb 9 2016, 3:35 PM

there are a few open questions even in the multi instance case, for example:

  1. what determines the size of the biggest sstable? that has a direct impact on capacity planning since it is the amount of disk space we've committed to keep free for compactions
    1. is it something that we control (e.g. DTCS knobs) or something that we don't (e.g. edit traffic)?

A bit of both.

Once an SSTable's time span exceeds max_window_size_seconds (the knob), it is no longer considered for compaction. Obviously, though, that means the size is also a function of the volume of data written in that time period.

  1. will a big sstable get only bigger over time? if so, by how much given the above?

You can expect it to get bigger until the interval between its min and max timestamps exceeds max_window_size_seconds, currently set to 70736000(!) (at least on WP parsoid_html).
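For concreteness, that setting lives in the table's compaction options. Expressed as the CQL that would (re)apply the current value, it looks roughly like the sketch below, here via the DataStax Python driver (an assumption about tooling; cqlsh works just as well). The keyspace/table names are inferred from the data file paths in the description, and this is only an illustration of where the knob sits, not a proposal to change anything:

```
from cassandra.cluster import Cluster  # DataStax Python driver; an assumption about tooling

session = Cluster(["restbase1001.eqiad.wmnet"]).connect()

# DTCS option value as quoted above: 70736000 seconds is roughly 818 days.
# Other DTCS options (base_time_seconds, min_threshold, ...) keep their defaults here.
session.execute("""
    ALTER TABLE "local_group_wikipedia_T_parsoid_html".data
    WITH compaction = {
        'class': 'DateTieredCompactionStrategy',
        'max_window_size_seconds': '70736000'
    }
""")
```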

  1. how often should we be expecting a big sstable to be compacted?

That's tricky to say, I think. I believe the answer is "when min_threshold files of the same bucket exist", but that probably doesn't help answer your question of how often in a meaningful way.
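To make the "same bucket" part a bit more concrete, here is a simplified sketch of the size-tiered bucketing idea that DTCS applies within a window (it mirrors the documented STCS behaviour with its default bucket_low/bucket_high/min_threshold values, not Cassandra's exact implementation):

```
def stcs_buckets(sstable_sizes, bucket_low=0.5, bucket_high=1.5, min_threshold=4):
    """Group SSTables of similar size; a bucket only becomes a compaction
    candidate once it holds at least min_threshold tables."""
    buckets = []  # each bucket is a list of sizes, compared against its running average
    for size in sorted(sstable_sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return [b for b in buckets if len(b) >= min_threshold]

# Illustrative sizes in GB: four ~0.2GB flushes form a candidate bucket, while the
# lone 400GB table just sits there until peers of comparable size accumulate.
print(stcs_buckets([0.2, 0.21, 0.19, 0.22, 5, 400]))
```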

the restbase1002 case is what prompted me to look into this; it swung from 76% used on 30/1 at ~13.00 to 91% used on 2/1 at ~7.00 before dropping back to 70% (see below). We should look closely into what the cause is and when it is going to happen again.

I captured some data about the SSTables for WP parsoid_html on restbase1002 on Friday, and cobbled together this JSFiddle to try and visualize the time periods covered and the file sizes.

I'm still trying to get a handle on precisely how DTCS is supposed to work. There hasn't been a lot written on the subject, and what does exist should be taken with a grain of salt, since there have been some significant changes to DTCS since it was written, but...

Absent in this visualization is the "tiering" aspect of DateTieredCompactionStrategy. I wonder if this isn't the result of out-of-order writes making new files candidates for compaction with older ones.

Another thing worth pointing out is that the largest file (presumably the one that resulted in the 76% -> 91% -> 70% swing) covers a range spanning 10 months 25 days (or 331 days, with the most recent timestamp being from Jan. 30). That file is 381GB now, and won't become ineligible for compaction until June of 2017. By that time it will be... very large.
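For what it's worth, the June 2017 estimate falls out of the numbers above: a 331-day span ending Jan. 30 puts the oldest timestamp in early March 2015, and (assuming the file keeps absorbing fresh writes) its span exceeds the 70736000-second max window roughly 818 days after that oldest timestamp. A quick back-of-the-envelope check:

```
from datetime import date, timedelta

max_window = timedelta(seconds=70736000)   # the max_window_size_seconds value quoted above, ~818 days
newest = date(2016, 1, 30)                 # most recent timestamp in the file
oldest = newest - timedelta(days=331)      # -> 2015-03-05, early March 2015

print(max_window.days)                     # 818
print(oldest + max_window)                 # -> 2017-05-31, i.e. roughly June 2017
```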

Eevans claimed this task. · Feb 9 2016, 3:41 PM
Eevans set Security to None.
mark added a subscriber: mark. · Feb 11 2016, 4:56 PM
Eevans renamed this task from impact of large sstables on cassandra to Efficacy of DateTieredCompactionStrategy. · Feb 15 2016, 5:44 PM
Eevans updated the task description.
Eevans triaged this task as High priority. · Feb 15 2016, 5:48 PM

We have a number of cluster changes coming down the pipe (T119935, T125842, and T95253), and the compaction related implications could prove problematic (particularly if we succeed in increasing stream throughput).

Bumping priority to high

GWicke added a comment. · Edited · Feb 15 2016, 5:50 PM

the compaction related implications could prove problematic

This seems rather vague. Are you aiming to compare DTCS (which uses STCS) to plain STCS? What is your hypothesis?

the compaction related implications could prove problematic

This is rather vague. Are you aiming to compare DTCS (which uses STCS) to plain STCS?

I aim to understand what is going on, to gather the relevant facts; I don't want to have to speculate.

What is your hypothesis?

That out-of-order writes are causing newer/smaller sstables to prematurely become candidates for compaction with older/larger sstables.

GWicke added a comment. · Edited · Feb 15 2016, 6:38 PM

That out-of-order writes are causing newer/smaller sstables to prematurely become candidates for compaction with older/larger sstables.

We know that read repairs cause some out-of-order writes, and that a fraction of sstables is moved to older DTCS windows for that reason. Per-window compaction is managed by STCS, which is well understood and prevents unexpected write amplification or compaction activity. The main issue I see with a small percentage of out-of-order writes is a slower compaction of those out-of-order sstables, which can cause a slightly higher read latency.

Plain STCS would avoid this special treatment of out-of-order writes, but would also lose

  • compaction pacing for most sstables, and
  • maximum time window limits (and thus some limit on the maximum size of sstables).

In practice, our current setup is delivering decent read latencies at moderate compaction load. We have not seen any operational issues from bootstraps or (higher-speed) rebuilds. While it is likely that further improvements are possible, it is not clear to me that compaction strategies are the most profitable area for performance improvements at this point.

Eevans added a comment. · Edited · Feb 15 2016, 7:20 PM

That out-of-order writes are causing newer/smaller sstables to prematurely become candidates for compaction with older/larger sstables.

We know that read repairs cause some out-of-order writes, and that a fraction of sstables is moved to older DTCS windows for that reason. Per-window compaction is managed by STCS, which is well understood and prevents unexpected write amplification or compaction activity.

Is it only read-repairs? Is it only a "fraction"? How much is a "fraction"? How much of a fraction does it take to create problems? All I'm interested in is hard answers; if you have these, then we might be able to close this issue.

The main issue I see with a small percentage of out-of-order writes is a slower compaction of those out-of-order sstables, which can cause a slightly higher read latency.

And a higher number of SSTables-per-read, as the data is spread across a larger number of SSTables. For the sample provided in that JSFiddle, the 99p is equal to two-thirds of all the tables. The way to counter this (what has been done to counter it so far) is to widen the max window to create fewer, bigger tables. That "strategy" isn't going to work indefinitely.

Plain STCS would avoid this special treatment of out-of-order writes, but would also lose

  • compaction pacing for most sstables, and
  • maximum time window limits (and thus some limit on the maximum size of sstables).

I'm not advocating for STCS, I'm advocating for a clearer understanding of how things are actually working, and why.

In practice, our current setup is delivering decent read latencies at moderate compaction load. We have not seen any operational issues from bootstraps or (higher-speed) rebuilds. While it is likely that further improvements are possible, it is not clear to me that compaction strategies are the most profitable area for performance improvements at this point.

I bumped the priority because we have been running DTCS for a while now, and ought to be able to speak to how it is working. No one should be finding themselves surprised (for example) about the size of SSTables, or wondering how big those are going to get. Additionally, bootstrapping and decommissioning generate a lot of compaction and anti-compaction activity; if we manage to get stream throughput up (I consider T126619 a higher priority than this, fwiw), then a storm of compaction activity could become the next bottleneck. Finally, we're in the midst of a push (T119935, T125842, and T95253) to put an end to our growing pains once and for all and establish a sane way forward, including an understanding of our limits, what will drive the need to expand, etc. It seemed prudent to me that we start getting out in front of these issues.

No one should be finding themselves surprised (for example) about the size of SSTables, or wondering how big those are going to get.

As a basic rule, both DTCS and STCS can compact all data into a single sstable. While this is rare in practice (especially with DTCS with limited time windows), it is prudent to leave enough storage headroom to accommodate large compactions. With multiple instances and tables, this does not mean that we need to keep 50% of storage free at all times. It is extremely unlikely that all instances perform a major compaction of all the largest tables at exactly the same time. While it is impossible to provide a hard guarantee for how far we can push this, I think we can draw some conservative conclusions from metrics and some worst-case reasoning, taking into account the expected data set sizes. I would personally be hesitant to go beyond 70% utilization with a five instance setup.
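To illustrate the headroom argument with some arithmetic (the numbers below are purely hypothetical, not measurements from this cluster): if a node hosts several instances and we assume at most a couple of them ever run a worst-case compaction of all their data at the same time, the required free space is a fraction of the node rather than 50%, and the resulting "safe" utilization lands in the neighbourhood of the ~70% figure mentioned above.

```
# Purely illustrative numbers -- not measurements from this cluster.
node_capacity_tb = 3.0
instances_per_node = 5
data_per_instance_tb = 0.4              # ~2 TB of live data per node in total

# Worst case assumed here: an instance compacts all of its data into one SSTable,
# temporarily needing about as much free space as it currently occupies; we further
# assume at most two instances on a node do this simultaneously.
concurrent_major_compactions = 2

headroom_tb = concurrent_major_compactions * data_per_instance_tb
live_tb = instances_per_node * data_per_instance_tb

print("peak utilization: %.0f%%" % (100 * (live_tb + headroom_tb) / node_capacity_tb))                     # 93%
print("max safe live utilization: %.0f%%" % (100 * (node_capacity_tb - headroom_tb) / node_capacity_tb))   # 73%
```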

a storm of compaction activity could become the next bottleneck

Compaction of streamed data is predominantly handled using STCS, and from what I have seen has been working as expected. You have alluded to possible issues. Do you have anything specific in mind, or are you worried about surprises?

Eevans added a comment. · Apr 6 2016, 8:12 PM

I conducted an audit of compactions on restbase1007-a.eqiad.wmnet over the weekend (from April 1-4), the result of which can be seen, visualized as a directed graph, here: https://people.wikimedia.org/~eevans/20140401_to_20140404.svg

Some explanation

The auditing tool iterates over existing SSTables in a specified directory at startup, running sstablemetadata on each and storing the results in an SQLite database. It then uses inotify to monitor the directory for newly generated SSTables, and likewise stores metadata for each newly created file. A separate script is then used to generate the visualization.

In the visualization, blue nodes are referenced ancestors that weren't available during the lifetime of the audit (they were compacted away sometime before it began). Green nodes are smallish files, yellow somewhat bigger, and red larger still (for arbitrary (and unimportant) definitions of smallish, somewhat bigger, and larger still).
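For the curious, a minimal sketch of the audit approach described above (this is not the actual tooling: it simply polls the data directory instead of using inotify, and the sstablemetadata output labels it parses are an assumption that may vary between Cassandra versions):

```
# Simplified sketch of the audit idea, not the actual tooling.
import glob
import os
import re
import sqlite3
import subprocess
import time

DATA_DIR = "/srv/cassandra-a/data/local_group_wikipedia_T_parsoid_html"  # example path from above
db = sqlite3.connect("sstable_audit.db")
db.execute("""CREATE TABLE IF NOT EXISTS sstables
              (path TEXT PRIMARY KEY, bytes INTEGER, min_ts INTEGER, max_ts INTEGER)""")

def record(path):
    # sstablemetadata ships with Cassandra; the "Minimum/Maximum timestamp" labels are assumed.
    out = subprocess.check_output(["sstablemetadata", path]).decode()
    def grab(label):
        m = re.search(r"%s:\s*(\d+)" % label, out)
        return int(m.group(1)) if m else None
    db.execute("INSERT OR IGNORE INTO sstables VALUES (?, ?, ?, ?)",
               (path, os.path.getsize(path), grab("Minimum timestamp"), grab("Maximum timestamp")))
    db.commit()

seen = set()
while True:  # the first pass doubles as the startup scan; later passes pick up new files
    for path in glob.glob(DATA_DIR + "/*/*-Data.db"):
        if path not in seen:
            record(path)
            seen.add(path)
    time.sleep(30)
```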

Some conclusions

If you look at the upper-most green nodes, the ones with no ancestors, these are brand new tables flushed to disk while the audit was running. Many of these are created with minimum timestamps that are on the order of a year in the past. Obviously this propagates to every file these are merged into, leaving us with exactly one date tier in practice. Since DTCS does STCS within tiers, we are, for all intents and purposes, running STCS.

I don't believe that we are overwriting with timestamps anymore, so I have to assume this is the result of read repair.

And, as an aside, it's somewhat surprising that this happens so regularly; I was under the impression that we had effectively purged all data prior to about December 2015. It would seem we still have records going all the way back to when this cluster was first stood up.

NOTE: I will open a separate issue to investigate this older data

Some possible courses of action

It's probably worth verifying the source of these out-of-order writes, and investigating what it would take to actually utilize the date-based tiering optimization. If it's entirely read-repair, disabling read-repair is one option (and is in fact recommended for DTCS), but I wouldn't be comfortable doing so until we are running regular anti-entropy repairs. Running regular repairs might be problematic to do incrementally though, since it will split the compaction pool and double the SSTables/read (which are already higher than we'd like as a result of our de facto STCS use).
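For reference, disabling read-repair on a table is just a schema change; a hedged sketch via the Python driver is below (keyspace/table inferred from the data paths above), with the same caveat as stated: not something to apply before regular anti-entropy repairs are in place.

```
from cassandra.cluster import Cluster  # DataStax Python driver; an assumption about tooling

session = Cluster(["restbase1001.eqiad.wmnet"]).connect()

# Turn off both foreground read-repair knobs for the Parsoid HTML table.
# Per the caveat above, this should wait until regular anti-entropy repairs are running.
session.execute("""
    ALTER TABLE "local_group_wikipedia_T_parsoid_html".data
    WITH read_repair_chance = 0 AND dclocal_read_repair_chance = 0
""")
```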

One option might be to accept the status quo. If instance sizes are kept low, then compaction could be made reasonably aggressive, resulting in fewer tables (lower SSTables/read) that aren't excessively large. If we do this, I think I'd rather see us switch to STCS explicitly.

Another option would be to explore TWCS (CASSANDRA-9666). From what I can gather, quite a few users of DTCS have been switching to TWCS of late. It would be reasonably easy to test this by spinning up a new instance temporarily in write survey mode, and overriding the compaction strategy locally.
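For reference, on Cassandra versions where TWCS ships in-tree (3.0.8+/3.8+) the table-level switch looks roughly like the sketch below; on the 2.1 series in use here it would instead require the external TWCS jar (and its class name), and a write-survey test would override the strategy locally on that instance rather than via a cluster-wide ALTER. The window unit/size are placeholders; choosing them would be part of the evaluation.

```
from cassandra.cluster import Cluster  # DataStax Python driver; an assumption about tooling

session = Cluster(["restbase1001.eqiad.wmnet"]).connect()

# Placeholder window settings -- picking an appropriate window is part of the evaluation.
session.execute("""
    ALTER TABLE "local_group_wikipedia_T_parsoid_html".data
    WITH compaction = {
        'class': 'TimeWindowCompactionStrategy',
        'compaction_window_unit': 'DAYS',
        'compaction_window_size': '7'
    }
""")
```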

NOTE: I will make a point of cleaning up the audit tooling I used above and posting it somewhere; it will be useful in future testing.
Eevans added a comment. · Apr 6 2016, 8:17 PM

/cc'ing @JAllemandou and @elukey, as AQS uses DTCS too if I'm not mistaken; it wouldn't hurt to have a look at how compaction is working on the AQS cluster.

See also T140008: High RESTBase storage utilization, in particular the comments starting here.

TL;DR: In addition to all of the issues outlined above, the layout created by DTCS is resulting in a high degree of overlapping data, which in turn is making it difficult to GC obsolete records (deletes, overwrites, TTL-expired columns).

Eevans renamed this task from Efficacy of DateTieredCompactionStrategy to Evaluate efficacy of DateTieredCompactionStrategy. · Sep 20 2016, 7:58 PM
Eevans lowered the priority of this task from High to Medium.
GWicke closed this task as Resolved. · Jul 11 2017, 8:29 PM