> Let's revisit Scylla once DTCS and encryption have landed. http://www.scylladb.com/technology/status/ is a good place to track progress.
== Questions ==
=== Would Scylla allow us to return to leveled compaction? ===
Leveled compaction gave us the best latency by tightly bounding SSTables read per query. However, its high compaction throughput was a problem, as was an issue that manifested only under LCS and prevented bootstrapping (stream timeouts). Since Scylla partitions by core, each compaction instance operates on a smaller subset of data. Would this, combined with the promise of more efficient CPU and IO utilization, make LCS tractable for us again?
NOTE: This also ultimately depends on the efficacy of tombstone GC under LCS.
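Under LCS, each level is (by default) roughly 10x the size of the one before it, so the number of levels, and hence the worst-case SSTables consulted per read, grows logarithmically with data volume. A back-of-envelope sketch of why per-core sharding could shrink the level count (assumed defaults of 160 MB SSTables and a fanout of 10; this is a simplified model, not either engine's actual accounting):

```python
def lcs_levels(total_bytes, sstable_bytes=160 * 1024**2, fanout=10):
    """Rough count of LCS levels needed to hold `total_bytes`.

    L1 holds ~fanout sstables, L2 ~fanout^2, and so on; L0 (the
    landing area for flushes) is ignored. Back-of-envelope model only.
    """
    levels = 0
    capacity_sstables = 0
    while capacity_sstables * sstable_bytes < total_bytes:
        levels += 1
        capacity_sstables += fanout ** levels
    return levels

# Per-core sharding divides the data each compactor manages; fewer
# levels means fewer candidate sstables per read in the worst case.
print(lcs_levels(1 * 1024**4))        # 1 TiB on one unsharded node -> 4
print(lcs_levels(1 * 1024**4 // 32))  # the same data split across 32 cores -> 3
```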
=== Can Scylla offer any respite from our troubles with tombstone GC? ===
Our data models make use of partitions of unbounded width. Additionally, our revision retention policies create significant numbers of tombstones distributed evenly throughout. Over time these partitions become distributed across all SSTables, creating an overlap that prevents compactions from garbage-collecting tombstoned data. Out-of-order writes aggravate this problem by preventing the per-file maximum purgeable tombstone from being used to rule out such overlap, so having the capacity to repair (for example) might be one way in which Scylla could improve the situation. Having compaction applied to a smaller subset of data may also contribute to less overlap, and permit more aggressive compaction (to fewer files).
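The overlap rule described above can be sketched as follows: a tombstone may only be dropped during compaction once gc_grace has elapsed *and* no SSTable outside the compaction holds the same partition with data older than the tombstone. A simplified model of that rule (not either engine's actual code):

```python
def tombstone_purgeable(tombstone_write_time, now, gc_grace_seconds,
                        overlapping_min_timestamps):
    """Return True if a tombstone may be dropped by a compaction.

    `overlapping_min_timestamps` are the minimum data timestamps of
    SSTables *outside* the compaction that also contain the tombstone's
    partition. If any could hold data older than the tombstone,
    dropping it would resurrect that data, so it must be kept.
    Simplified illustration of the rule described above.
    """
    if now - tombstone_write_time < gc_grace_seconds:
        return False  # still within gc_grace; must survive until repaired
    return all(min_ts > tombstone_write_time
               for min_ts in overlapping_min_timestamps)

# A partition smeared across every SSTable (the unbounded-width case)
# almost always has some overlapping file with older data:
print(tombstone_purgeable(100, 1_000_000, 864_000, [50, 200]))   # False
print(tombstone_purgeable(100, 1_000_000, 864_000, [200, 300]))  # True
```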
=== Would repairs of our entire dataset be tractable under Scylla? ===
Complete anti-entropy repairs are expensive. They require significant CPU and disk IO for validation and anti-compaction. Inefficiencies become compounded as the amount of data under repair increases (write amplification, over-streaming, etc).
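One common mitigation is to repair in small token subranges rather than whole nodes, bounding the validation and anti-compaction work done per run. A sketch of slicing the Murmur3 token space (assumed bounds -2^63..2^63-1) into contiguous subranges that could drive `nodetool repair -st ... -et ...` one slice at a time; real tooling should split along the node's actual owned ranges instead:

```python
def split_token_range(splits, lo=-2**63, hi=2**63 - 1):
    """Divide the full token range into `splits` contiguous
    (start, end) subranges for incremental, bounded repair runs.
    Sketch only: assumes the Murmur3 partitioner's token bounds.
    """
    span = hi - lo
    bounds = [lo + span * i // splits for i in range(splits + 1)]
    bounds[-1] = hi  # avoid losing the last token to integer division
    return list(zip(bounds[:-1], bounds[1:]))

ranges = split_token_range(4)
print(len(ranges))    # 4
print(ranges[0][0])   # -9223372036854775808
print(ranges[-1][1])  # 9223372036854775807
```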
=== Would Scylla's handling of very wide partitions be an improvement over Cassandra's? ===
Our data models make use of partitions of unbounded width. We have seen high-frequency edits of documents create 30G+ of history in a relatively short period (the cluster has been up less than 2 years, with several rounds of culling and targeted partition removals). These wide partitions create a number of problems, not least of which occur when queries attempt to collate too many results in memory and the allocations result in OOM exceptions.
Data models like this are considered an anti-pattern in Cassandra, but Scylla's marketing would seem to claim that the improved performance can be enough to brute-force your way past this:
> Raw speed isn’t just a nice benchmark or a low cloud bill. Scylla’s order of magnitude improvement in performance opens a wide range of possibilities. Instead of designing a complex data model to achieve adequate performance, use a straightforward data model, eliminate complexity, and finish your NoSQL project in less time with fewer bugs.
> Performance improvements enable not just reduction in hardware resources, but also a reduction in architecture, debugging, and devops labor. Scylla combines the simple and familiar Cassandra architecture with new power to build a data model that fits the application, not tweak the model to fit the server.
If true, this could save us the work of remodeling RESTBase storage.
== Missing in ScyllaDB (as of v1.4) ==
* Incremental repair
* Time-window compaction
* Debian support (https://bugs.debian.org/824509)
NOTE: The above list is not meant to be comprehensive; it contains only those items which might be relevant to our environment.
== Testing ==
=== Environment / Requirements ===
* 16GB RAM, or 2GB per logical core (whichever is higher)
* SSDs (RAID0)
* 10 Gbps networking preferred
(NOTE) See http://www.scylladb.com/doc/admin/#system-requirements
=== Experiments ===
The individual experiments that follow are meant to answer the questions above, and vet any theories and assumptions we may have. Unless stated otherwise, each holds one thing constant: a workload that matches that of the RESTBase production environment. That is, both the test data and the rate and disposition of requests should be proportional to what we see in production. This will require some work, since prior test methodologies haven't included much beyond [[ https://github.com/wikimedia/htmldumper | performing html dumps ]].
==== Leveled compaction ====
==== Tombstone GC ====
==== Repair ====
==== Wide partitions ====
# Import the complete history of some very wide partitions
# Query for latest N results
# Page over entire partitions
NOTE: Ingestion should incorporate an appropriate revision retention algorithm so that TTL-updated records are interleaved into the test data, in the same manner that they would be in production.
(WARNING) **QUESTION:** Are we able to reproduce (re)render history?
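The "page over entire partitions" step above amounts to driver-level paging with a bounded fetch size, so that no component ever collates the full partition in memory. A standalone sketch of the client-side loop, with a hypothetical `fetch_page` function standing in for a driver call (e.g. a SELECT executed with a paging state):

```python
def page_partition(fetch_page, fetch_size=500):
    """Iterate an entire wide partition one page at a time.

    `fetch_page(paging_state, fetch_size)` stands in for a driver call;
    it returns (rows, next_paging_state), where next_paging_state is
    None once the partition is exhausted. Only one page of rows is
    held in memory at a time.
    """
    state = None
    while True:
        rows, state = fetch_page(state, fetch_size)
        yield from rows
        if state is None:
            return

# Simulated backend: a 1,237-row "partition" served in slices.
data = list(range(1237))

def fake_fetch(state, n):
    start = state or 0
    end = min(start + n, len(data))
    return data[start:end], (end if end < len(data) else None)

total = sum(1 for _ in page_partition(fake_fetch, fetch_size=500))
print(total)  # 1237
```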
===== Metrics =====
* Estimated droppable tombstone ratio
* Memory consumption
* Query latency
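The "estimated droppable tombstone ratio" metric can be sketched as the fraction of an SSTable's cells that are tombstones already past gc_grace. This is an assumed, simplified definition of the statistic tools like `sstablemetadata` report, for instrumenting the experiment, not the engines' exact formula:

```python
def droppable_tombstone_ratio(cells, gc_before):
    """Estimate the droppable-tombstone ratio of an SSTable.

    `cells` is a list of (is_tombstone, local_deletion_time) pairs; a
    tombstone counts as droppable when its deletion time falls before
    `gc_before` (i.e. now - gc_grace). Assumed, simplified definition.
    """
    if not cells:
        return 0.0
    droppable = sum(1 for is_tomb, dtime in cells
                    if is_tomb and dtime < gc_before)
    return droppable / len(cells)

cells = [(True, 100), (True, 900), (False, 0), (False, 0)]
print(droppable_tombstone_ratio(cells, gc_before=500))  # 0.25
```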
Examples of exceptionally wide partitions:
* `local_group_wikipedia_T_parsoid_html/data:it.wikipedia.org:Utente\:Biobot/log` (observed as high as 30G)