DateTieredCompactionStrategy (DTCS) is not working as expected: its optimizations are being defeated in our environment(s) by out-of-order writes (see {T126221} for background). An alternative to DTCS has emerged in the form of [[ https://github.com/jeffjirsa/twcs | TimeWindowCompactionStrategy ]] (TWCS), which eschews tiering in favor of fixed windows of time.
Since time-ordered data models are common in our environment(s), I believe TWCS warrants an investigation.
== Status ==
Tables that have been converted to TWCS to date:
| Conversion date | Tables | Graphs |
|---------------------|---------|--|
| 2016-10-12 | local_group_wiktionary_T_parsoid_html.data | [[ https://grafana.wikimedia.org/dashboard/snapshot/3RQD7qLJ6ZDNCD6Urj2Lv5etFrMDmKIr | SSTables/read ]], [[ https://grafana.wikimedia.org/dashboard/snapshot/EwiQVBstxOPw14tQLNeWuplRR3VQ5nyb | SSTable count ]] (large spikes are the result of repair testing) |
| 2016-10-13 | local_group_wikimedia_T_parsoid_html.data | [[ https://grafana.wikimedia.org/dashboard/snapshot/3x7zFqITnyJOkn2ezZB4GLObRmMca1x5 | SSTables/read ]], [[ https://grafana.wikimedia.org/dashboard/snapshot/dwiuDOR1iLfDnu2jf8u03P5O3TdgUBYr | SSTable count ]] |
| 2016-10-19 | local_group_wikipedia_T_parsoid_html.data | [[ https://grafana.wikimedia.org/dashboard/snapshot/kNFqFBZu4PksvvUu1TGV1toGkozxTD5L | SSTables/read ]], [[ https://grafana.wikimedia.org/dashboard/snapshot/7eI9PnS4yDNO7TRjUB3KcuvG7c8yaBLs | SSTable count ]] |
| 2016-10-27 | local_group_*_T_mobileapps_{lead,remaining}.data | |
| 2016-11-07 | local_group_*_T_title__revisions.{data,idx_by_rev_ever} | |
== Tombstone GC ==
One of the primary hopes for TWCS was that, when combined with [[ https://phabricator.wikimedia.org/T113805 | repair ]], out-of-order writes could be confined to the STCS-compacted current window. For repairs completed within the `compaction_window_size`, overlap could be largely eliminated before the outgoing window's major compaction, and droppable tombstones thus kept to a minimum. That may remain an option for some subset of tables, but testing conducted as part of T113805 suggests we are unlikely to have the capacity to complete frequent repairs of more than a subset of our data.
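To make the notion of overlap concrete, here is a minimal sketch (not taken from the TWCS source) that flags SSTables whose data reaches back into an earlier window than the one they are bucketed in. It assumes each SSTable is bucketed by its maximum timestamp, and that the minimum and maximum timestamps are available (for example from SSTable metadata). The `SSTable` class, the names, and the use of seconds as the unit are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for per-SSTable metadata."""
    name: str
    min_ts: int  # oldest write timestamp in the file (seconds, by assumption)
    max_ts: int  # newest write timestamp in the file (seconds, by assumption)


def window_start(ts: int, window_size: int) -> int:
    """Floor a timestamp to the start of its fixed-size window."""
    return ts - (ts % window_size)


def spans_older_window(sstables: List[SSTable], window_size: int) -> List[str]:
    """Names of SSTables holding data older than the window they are bucketed in.

    Assumption: an SSTable is bucketed by max_ts, so a min_ts that falls in an
    earlier window means out-of-order data overlapping older windows.
    """
    return [
        s.name
        for s in sstables
        if window_start(s.min_ts, window_size) < window_start(s.max_ts, window_size)
    ]


DAY = 86400
example = [
    SSTable("a", 0, 100),                  # entirely inside window 0
    SSTable("b", DAY + 10, DAY + 500),     # entirely inside window 1
    SSTable("c", 50, 2 * DAY + 5),         # bucketed in window 2, reaches back to window 0
]
overlapping = spans_older_window(example, DAY)
```

In this toy input only `c` is flagged, since it is the only file whose oldest data predates its own window.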
However, the comments starting [[ https://phabricator.wikimedia.org/T133395#2776817 | here ]] contain the results of user-defined compaction tests that collapse the N oldest windows, resulting in significant collection of tombstones. User-defined compactions of this kind would be straightforward to script, are low-impact, and have the added benefit of bounding the number of SSTables (and, as a result, the SSTables/read).
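As a rough illustration of how such a script might pick its inputs, the sketch below groups SSTable data files into windows by maximum timestamp and returns the files belonging to the N oldest windows as a comma-separated string. This is a sketch under assumptions, not the procedure used in the tests linked above: the function name, the file names, and the choice to always skip the newest window are hypothetical, and gathering the timestamps and invoking the compaction (for instance through the `forceUserDefinedCompaction` operation on the CompactionManager MBean) are left out.

```python
from collections import defaultdict
from typing import Dict


def oldest_window_files(max_ts_by_file: Dict[str, int], window_size: int, n: int) -> str:
    """Return a comma-separated list of the data files in the n oldest windows.

    max_ts_by_file maps a data file name to its maximum timestamp, in the same
    unit as window_size. The newest window is never selected (an assumption
    made here so the window still receiving writes is left alone).
    """
    windows = defaultdict(list)
    for path, max_ts in max_ts_by_file.items():
        # Floor the timestamp to the start of its fixed-size window.
        windows[max_ts - (max_ts % window_size)].append(path)

    candidates = sorted(windows)[:-1]  # drop the newest window
    selected = candidates[:n]          # keep the n oldest of the rest
    return ",".join(sorted(f for w in selected for f in windows[w]))


DAY = 86400
files = {
    "a-1-Data.db": 10,
    "a-2-Data.db": 20,
    "a-3-Data.db": DAY + 5,
    "a-4-Data.db": 2 * DAY + 1,
    "a-5-Data.db": 3 * DAY + 1,
}
two_oldest = oldest_window_files(files, DAY, 2)
```

With the toy input above, `two_oldest` covers the two files in the first window plus the single file in the second, which is the set one would hand to a single user-defined compaction.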
----
* http://www.slideshare.net/JeffJirsa1/cassandra-summit-2015-real-world-dtcs-for-operators
* {T126221}
* [[ https://issues.apache.org/jira/browse/CASSANDRA-9666 | CASSANDRA-9666 ]]
* https://github.com/jeffjirsa/twcs
* {T113805} (closely related)