Cassandra compaction throughput rate limiting
Open, LowPublic

Description

The metrics suggest that the configured compaction throughput limits are not being honored (configured by default for 20MBps, tested as low as 4MBps, with observed rates as high as 400MBps).

Event Timeline

Eevans created this task. Oct 6 2017, 6:44 PM
Restricted Application added a subscriber: Aklapper. Oct 6 2017, 6:44 PM
Eevans triaged this task as Medium priority. Oct 6 2017, 6:45 PM
Eevans moved this task from Backlog to Next on the User-Eevans board.
Eevans added a subscriber: User-Eevans.
Eevans removed a subscriber: User-Eevans.

An additional data point (one that makes matters somewhat worse) is that there are two metrics that one would assume should agree:

org.apache.cassandra.metrics<type=Compaction, name=BytesCompacted><>Count

org.apache.cassandra.metrics<type=Table, keyspace={keyspace}, scope={table}, name=CompactionBytesWritten><>Count

My assumption would be that the latter, summed for all tables, would equal the former, but this is not the case.

Should this discrepancy be followed up with upstream?

If it turns out to be a bug (as opposed to some confusion about what these metrics are), then yes.

Concretely, here are graphs of both metrics; the plot on the left is for Compaction#BytesCompacted, and the one on the right is for Table#CompactionBytesWritten:

You can see some similarity in the pattern of the spikes, but the CompactionBytesWritten values are consistently lower than BytesCompacted. This is because BytesCompacted is the number of bytes that were run through the compactor, or in other words, the input size. The CompactionBytesWritten table metrics represent the size of the output file(s), which (in our use case at least) is expected to be somewhat less than the input.

TL;DR: they are different because they are (somewhat confusingly) measuring different things.
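To make the input-versus-output distinction concrete, here is a minimal sketch. The table names and byte counts are made up for illustration and do not come from the cluster; the only point is that summing the per-table written bytes will not reproduce the global compacted (input) bytes whenever compaction shrinks the data.

```python
# Hypothetical per-compaction figures: (table, bytes_read, bytes_written).
# These numbers are invented purely to illustrate the two metrics.
compactions = [
    ("table_a", 1000, 700),
    ("table_b", 500, 450),
    ("table_a", 2000, 1200),
]

# Analogue of Compaction#BytesCompacted: total bytes fed INTO the compactor.
bytes_compacted = sum(read for _, read, _ in compactions)

# Analogue of Table#CompactionBytesWritten: bytes written OUT, tracked per table.
bytes_written_per_table = {}
for table, _, written in compactions:
    bytes_written_per_table[table] = bytes_written_per_table.get(table, 0) + written

total_written = sum(bytes_written_per_table.values())

print(bytes_compacted, total_written)
```

With these made-up figures the summed per-table value comes out lower than the global one, which matches the shape of the discrepancy seen in the graphs.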


As for the fact that the throughput seems to exceed what has been configured, I think I may know why that is the case as well.

We are calculating a rate from a total count of bytes written using Prometheus' rate() function. The counter in question is not updated until the entire compaction has completed. A compaction can take quite some time to complete (ironically, the more it has been throttled, the longer it takes). While the compaction is ongoing, its bytes do not contribute to the calculated rate. When the compaction completes, the entire byte count lands in the next sample at once, which shows up as a spike well above the configured limit.
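A rough simulation of that effect, as a sketch only: the 20 MB/s throttle is the configured default mentioned in the description, while the 60-second scrape interval and the 12000 MB compaction size are assumptions chosen for illustration. The per-interval delta used here is a simplified stand-in for what rate() computes, not a reimplementation of it.

```python
THROTTLE_MBPS = 20        # configured limit from the task description
SCRAPE_INTERVAL_S = 60    # assumed scrape interval (hypothetical)


def simulate_counter(compaction_mb, n_scrapes):
    """Counter value (MB) seen at each scrape, assuming the counter only
    advances once the whole compaction has finished."""
    duration_s = compaction_mb / THROTTLE_MBPS
    samples = []
    for i in range(n_scrapes):
        t = i * SCRAPE_INTERVAL_S
        samples.append(compaction_mb if t >= duration_s else 0)
    return samples


def per_interval_rate(samples):
    """Simplified rate: counter delta between adjacent scrapes, in MB/s."""
    return [(b - a) / SCRAPE_INTERVAL_S for a, b in zip(samples, samples[1:])]


samples = simulate_counter(compaction_mb=12000, n_scrapes=12)
rates = per_interval_rate(samples)

true_rate = 12000 / (12000 / THROTTLE_MBPS)  # what the throttle actually allowed
print(max(rates), true_rate)
```

In this model the compaction really did run at the throttled rate for ten minutes, yet the computed rate is zero for every interval except one, where it is ten times the limit.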

We could probably ameliorate this by plotting with an aggressive moving average, but it's not clear to me how we do that using Prometheus. @fgiunchedi, any advice here?

Eevans lowered the priority of this task from Medium to Low. Dec 15 2017, 9:13 PM

I'd try tweaking the interval for rate(), using irate(), and averaging over a longer period (after rate()).
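As a sketch of why a longer window helps, the snippet below compares a one-interval rate with a ten-interval rate over a counter that jumps once. The sample values and the 60-second interval are assumptions for illustration. In PromQL terms this roughly corresponds to widening the range selector, or to wrapping rate() in avg_over_time() with a subquery on Prometheus versions that support subqueries. Note that irate() only looks at the last two samples in the range, so it behaves like the short window here.

```python
def windowed_rate(samples, interval_s, window):
    """Rate over the trailing `window` scrape intervals:
    (newest sample - oldest sample) / elapsed seconds."""
    return [
        (samples[i] - samples[i - window]) / (window * interval_s)
        for i in range(window, len(samples))
    ]


# Hypothetical counter in MB: flat, then one jump when a compaction finishes.
samples = [0] * 10 + [12000] * 10
INTERVAL_S = 60  # assumed scrape interval

short_window = windowed_rate(samples, INTERVAL_S, window=1)
long_window = windowed_rate(samples, INTERVAL_S, window=10)

print(max(short_window), max(long_window))
```

The longer window spreads the jump over the time the compaction plausibly took, at the cost of lagging behind and flattening genuine short-term changes.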

Aklapper removed Eevans as the assignee of this task. Jun 19 2020, 4:27 PM

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the record, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)