Investigate lack of recency bias in Cassandra histogram metrics
Closed, Resolved · Public

Assigned To
Authored By
Eevans
Jun 9 2016, 7:42 PM
Referenced Files
F4400926: Screenshot from 2016-08-24 19-46-38.png
Aug 25 2016, 9:39 PM
F4400929: Screenshot from 2016-08-24 19-43-37.png
Aug 25 2016, 9:39 PM
F4400927: Screenshot from 2016-08-24 19-45-02.png
Aug 25 2016, 9:39 PM
F4400930: Screenshot from 2016-08-24 19-46-15.png
Aug 25 2016, 9:39 PM
F4400928: Screenshot from 2016-08-24 19-44-25.png
Aug 25 2016, 9:39 PM
F4161746: Screenshot from 2016-06-13 13-30-07.png
Jun 13 2016, 6:31 PM
F4150582: Screenshot from 2016-06-10 17-29-06.png
Jun 10 2016, 3:32 PM
F4147640: Screenshot from 2016-06-09 21-40-31.png
Jun 9 2016, 7:42 PM

Description

Histogram metrics are lacking the recency bias in 2.2.6 that they had in 2.1.13, resulting in strangely consistent values.

Screenshot from 2016-06-09 21-40-31.png (886×2 px, 174 KB)

See also: https://issues.apache.org/jira/browse/CASSANDRA-11752

Event Timeline

Mentioned in SAL [2016-06-09T19:59:15Z] <urandom> Restarting Cassandra on xenon.eqiad.wmnet (removing patched test build; restoring state) : T137474

Mentioned in SAL [2016-06-10T13:13:00Z] <urandom> Testing patched Cassandra (dpkg -i ...; service cassandra-a restart) on xenon : T137474

Mentioned in SAL [2016-06-10T13:15:03Z] <urandom> Starting html dump(s) in RESTBase staging : T137474

Mentioned in SAL [2016-06-10T13:58:27Z] <urandom> Testing patched Cassandra (dpkg -i ...; service cassandra-a restart) on cerium : T137474

Mentioned in SAL [2016-06-10T13:59:54Z] <urandom> Testing patched Cassandra (dpkg -i ...; service cassandra-a restart) on praseodymim : T137474

Mentioned in SAL [2016-06-10T14:06:51Z] <urandom> Testing patched Cassandra (dpkg -i ...; service cassandra-a restart) on restbase-test2001 : T137474

Mentioned in SAL [2016-06-10T14:17:43Z] <urandom> Testing patched Cassandra (dpkg -i ...; service cassandra-{a,b} restart) on restbase-test200[1-2] : T137474

The root cause here is a deliberate change to the histogram implementation in order to address concerns some had over the lossy nature of the forward-decaying priority reservoir sampling used prior to Cassandra 2.2. Options are still being discussed on CASSANDRA-11752, but consensus seems to be that at a minimum, percentile accessors should be recency biased without requiring a reset-on-read.

Until a resolution to CASSANDRA-11752 is available, I propose we patch our build to make use of the Dropwizard ExponentiallyDecayingReservoir (the implementation used in 2.1). The patch for this is very simple:

diff --git a/src/java/org/apache/cassandra/metrics/CassandraMetricsRegistry.java b/src/java/org/apache/cassandra/metrics/CassandraMetricsRegistry.java
index 6fdb2ff..308a65b 100644
--- a/src/java/org/apache/cassandra/metrics/CassandraMetricsRegistry.java
+++ b/src/java/org/apache/cassandra/metrics/CassandraMetricsRegistry.java
@@ -60,7 +60,7 @@ public class CassandraMetricsRegistry extends MetricRegistry
 
     public Histogram histogram(MetricName name, boolean considerZeroes)
     {
-        Histogram histogram = register(name, new ClearableHistogram(new EstimatedHistogramReservoir(considerZeroes)));
+        Histogram histogram = register(name, new Histogram(new ExponentiallyDecayingReservoir()));
         registerMBean(histogram, name.getMBeanName());
 
         return histogram;
@@ -68,7 +68,7 @@ public class CassandraMetricsRegistry extends MetricRegistry
 
     public Timer timer(MetricName name)
     {
-        Timer timer = register(name, new Timer(new EstimatedHistogramReservoir(false)));
+        Timer timer = register(name, new Timer(new ExponentiallyDecayingReservoir()));
         registerMBean(timer, name.getMBeanName());
 
         return timer;

I have built Debian packages that apply this patch as part of the package build; they can be found here. I've manually installed these packages on the staging cluster and have a couple of dump processes running.

Screenshot from 2016-06-10 17-29-06.png (710×1 px, 106 KB)

I propose that we keep the dumps running over the weekend, and upgrade restbase1007.eqiad.wmnet on Monday if everything continues to look OK.

I have built Debian packages that apply this patch as part of the package build; they can be found here. I've manually installed these packages on the staging cluster and have a couple of dump processes running.

Screenshot from 2016-06-10 17-29-06.png (710×1 px, 106 KB)

This looks awesome! Thank you, @Eevans for investigating and coming up with a working solution so quickly!

I propose that we keep the dumps running over the weekend, and upgrade restbase1007.eqiad.wmnet on Monday if everything continues to look OK.

Hmm, we have around 10% free space left on eqiad-staging nodes (~40 GB). Since the idea is to run the dumps, perhaps it'd be worth truncating the most significant CFs to be in the clear?

It's going to be difficult to move the needle by a whole lot without truncating the wikipedia parsoid tables (which we've been reluctant to do so far). Truncating local_group_wikipedia_T_mobileapps_remaining would free up ~23G, and local_group_wikipedia_T_mobileapps_lead another ~8G (combined, that's about 10% of the current data). Are we OK truncating the mobileapps tables?

[ ... ]
43165683	data/local_group_wikisource_T_parsoid_html
45114701	data/local_group_phase0_T_parsoid_html
2296759180	data/local_group_wikipedia_T_summary
8347359051	data/local_group_wikipedia_T_title__revisions
8465207914	data/local_group_wikipedia_T_mobileapps_lead
9147278915	data/local_group_wikipedia_T_parsoid_section_offsets
24520085710	data/local_group_wikipedia_T_mobileapps_remaining
76365898022	data/local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ
188561917420	data/local_group_wikipedia_T_parsoid_html
318081577290	total

It's going to be difficult to move the needle by a whole lot without truncating the wikipedia parsoid tables (which we've been reluctant to do so far).

I know, but we are coming to a point where we have to decide what to do next; otherwise we won't be able to store anything any more (but let's not discuss this here, it's kind of OT for this task).

For the time being, keep the dumps running so that we collect as much data as possible. I will monitor the nodes over the weekend, and if we come dangerously close to filling the disk I'll stop the dump. Where is it running from?

Truncating local_group_wikipedia_T_mobileapps_remaining would free up ~23G, and local_group_wikipedia_T_mobileapps_lead another ~8G (combined, that's about 10% of the current data). Are we OK truncating the mobileapps tables?

Sure, go ahead and do that.

It's going to be difficult to move the needle by a whole lot without truncating the wikipedia parsoid tables (which we've been reluctant to do so far).

I know, but we are coming to a point where we have to decide what to do next; otherwise we won't be able to store anything any more (but let's not discuss this here, it's kind of OT for this task).

For the time being, keep the dumps running so that we collect as much data as possible. I will monitor the nodes over the weekend, and if we come dangerously close to filling the disk I'll stop the dump. Where is it running from?

Thanks, @mobrovac! The dumps are running on xenon and cerium.

eevans@xenon:~$ screen -ls
There is a screen on:
	15544.dump	(06/09/2016 10:20:00 AM)	(Detached)
1 Socket in /var/run/screen/S-eevans.
eevans@cerium:~$ screen -ls
There is a screen on:
	18952.dump	(06/09/2016 10:20:09 AM)	(Detached)
1 Socket in /var/run/screen/S-eevans.

Truncating local_group_wikipedia_T_mobileapps_remaining would free up ~23G, and local_group_wikipedia_T_mobileapps_lead another ~8G (combined, that's about 10% of the current data). Are we OK truncating the mobileapps tables?

Sure, go ahead and do that.

Done.

$ df -h
Filesystem                 Size  Used Avail Use% Mounted on
udev                        10M     0   10M   0% /dev
tmpfs                      3.2G  331M  2.8G  11% /run
/dev/md0                    28G  6.3G   20G  24% /
tmpfs                      7.8G     0  7.8G   0% /dev/shm
tmpfs                      5.0M     0  5.0M   0% /run/lock
tmpfs                      7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/mapper/xenon--vg-srv  355G  271G   66G  81% /srv

Dumps ran continuously over the weekend in staging, and the metrics appear reasonable. I'm going to proceed with the upgrade of restbase1007 (the only production node currently running 2.2.6).

Mentioned in SAL [2016-06-13T17:38:00Z] <urandom> Restarting restbase1007-a.eqiad.wmnet : T137474

Mentioned in SAL [2016-06-13T17:52:42Z] <urandom> Restarting restbase1007-b.eqiad.wmnet : T137474

Mentioned in SAL [2016-06-13T17:55:20Z] <urandom> Restarting restbase1007-c.eqiad.wmnet : T137474

This is now deployed on 1007-{a,b,c}, and the metrics are lively once more.

Screenshot from 2016-06-13 13-30-07.png (528×1 px, 259 KB)

Thanks, @Eevans! Is there anything left on this task (upstreaming?), or should we resolve it?

Thanks, @Eevans! Is there anything left on this task (upstreaming?), or should we resolve it?

I was planning to leave it open to keep track of the upstream issue, yes.

A patch for this has been submitted upstream. We should test this out, and provide feedback if necessary, before it becomes a part of the 2.2.8 release.

Mentioned in SAL [2016-08-24T20:03:54Z] <urandom> T137474 Starting htmldumper in RESTBase Staging

Mentioned in SAL [2016-08-24T20:59:02Z] <urandom> T137474: Upgrading xenon.eqiad.wmnet to cassandra_2.2.6-wmf2

Mentioned in SAL [2016-08-25T00:51:14Z] <urandom> T137474: Stopping dumps in RESTBase staging, and reverting xenon.eqiad.wmnet to Cassandra 2.2.6-wmf1

Some explanation:

When the test begins at ~20:00, all 3 nodes are running a version of Cassandra 2.2.6 patched to reinstate the Dropwizard ExponentiallyDecayingReservoir that was used prior to Cassandra 2.2:

#! /bin/sh /usr/share/dpatch/dpatch-run
## 100reinstate_exp_decaying_resv.dpatch by Eric Evans <eevans@wikimedia.org>
##
## All lines beginning with `## DP:' are a description of the patch.
## DP: No description.

@DPATCH@
diff --git a/src/java/org/apache/cassandra/metrics/CassandraMetricsRegistry.java b/src/java/org/apache/cassandra/metrics/CassandraMetricsRegistry.java
index 6fdb2ff..308a65b 100644
--- a/src/java/org/apache/cassandra/metrics/CassandraMetricsRegistry.java
+++ b/src/java/org/apache/cassandra/metrics/CassandraMetricsRegistry.java
@@ -60,7 +60,7 @@ public class CassandraMetricsRegistry extends MetricRegistry
 
     public Histogram histogram(MetricName name, boolean considerZeroes)
     {
-        Histogram histogram = register(name, new ClearableHistogram(new EstimatedHistogramReservoir(considerZeroes)));
+        Histogram histogram = register(name, new Histogram(new ExponentiallyDecayingReservoir()));
         registerMBean(histogram, name.getMBeanName());
 
         return histogram;
@@ -68,7 +68,7 @@ public class CassandraMetricsRegistry extends MetricRegistry
 
     public Timer timer(MetricName name)
     {
-        Timer timer = register(name, new Timer(new EstimatedHistogramReservoir(false)));
+        Timer timer = register(name, new Timer(new ExponentiallyDecayingReservoir()));
         registerMBean(timer, name.getMBeanName());
 
         return timer;

At ~21:00, traffic generation was stopped long enough to upgrade xenon-a to a version of Cassandra 2.2.6 patched to include what was merged as a part of CASSANDRA-11752.

#! /bin/sh /usr/share/dpatch/dpatch-run
## 101cassandra-11752_2.2.dpatch by Eric Evans <eevans@wikimedia.org>
##
## All lines beginning with `## DP:' are a description of the patch.
## DP: No description.

@DPATCH@
diff --git a/src/java/org/apache/cassandra/metrics/CassandraMetricsRegistry.java b/src/java/org/apache/cassandra/metrics/CassandraMetricsRegistry.java
index 6fdb2ff..8e5671b 100644
--- a/src/java/org/apache/cassandra/metrics/CassandraMetricsRegistry.java
+++ b/src/java/org/apache/cassandra/metrics/CassandraMetricsRegistry.java
@@ -60,7 +60,7 @@ public class CassandraMetricsRegistry extends MetricRegistry
 
     public Histogram histogram(MetricName name, boolean considerZeroes)
     {
-        Histogram histogram = register(name, new ClearableHistogram(new EstimatedHistogramReservoir(considerZeroes)));
+        Histogram histogram = register(name, new ClearableHistogram(new DecayingEstimatedHistogramReservoir(considerZeroes)));
         registerMBean(histogram, name.getMBeanName());
 
         return histogram;
@@ -68,7 +68,7 @@ public class CassandraMetricsRegistry extends MetricRegistry
 
     public Timer timer(MetricName name)
     {
-        Timer timer = register(name, new Timer(new EstimatedHistogramReservoir(false)));
+        Timer timer = register(name, new Timer(new DecayingEstimatedHistogramReservoir()));
         registerMBean(timer, name.getMBeanName());
 
         return timer;
diff --git a/src/java/org/apache/cassandra/metrics/ClearableHistogram.java b/src/java/org/apache/cassandra/metrics/ClearableHistogram.java
index 85f2fa9..4a081d8 100644
--- a/src/java/org/apache/cassandra/metrics/ClearableHistogram.java
+++ b/src/java/org/apache/cassandra/metrics/ClearableHistogram.java
@@ -26,14 +26,14 @@ import com.codahale.metrics.Histogram;
  */
 public class ClearableHistogram extends Histogram
 {
-    private final EstimatedHistogramReservoir reservoirRef;
+    private final DecayingEstimatedHistogramReservoir reservoirRef;
 
     /**
      * Creates a new {@link com.codahale.metrics.Histogram} with the given reservoir.
      *
      * @param reservoir the reservoir to create a histogram from
      */
-    public ClearableHistogram(EstimatedHistogramReservoir reservoir)
+    public ClearableHistogram(DecayingEstimatedHistogramReservoir reservoir)
     {
         super(reservoir);
 
diff --git a/src/java/org/apache/cassandra/metrics/DecayingEstimatedHistogramReservoir.java b/src/java/org/apache/cassandra/metrics/DecayingEstimatedHistogramReservoir.java
new file mode 100644
index 0000000..14a4366
--- /dev/null
+++ b/src/java/org/apache/cassandra/metrics/DecayingEstimatedHistogramReservoir.java
@@ -0,0 +1,549 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.metrics;
+
+import java.io.OutputStream;
+import java.io.OutputStreamWriter;
+import java.io.PrintWriter;
+import java.nio.charset.Charset;
+import java.util.Arrays;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.atomic.AtomicLong;
+import java.util.concurrent.atomic.AtomicLongArray;
+import java.util.concurrent.locks.ReentrantReadWriteLock;
+
+import com.google.common.annotations.VisibleForTesting;
+
+import com.codahale.metrics.Clock;
+import com.codahale.metrics.Reservoir;
+import com.codahale.metrics.Snapshot;
+import org.apache.cassandra.utils.EstimatedHistogram;
+
+/**
+ * A decaying histogram reservoir where values collected during each minute will be twice as significant as the values
+ * collected in the previous minute. Measured values are collected in variable sized buckets, using small buckets in the
+ * lower range and larger buckets in the upper range. Use this histogram when you want to know if the distribution of
+ * the underlying data stream has changed recently and you want high resolution on values in the lower range.
+ *
+ * The histogram uses forward decay [1] to make recent values more significant. The forward decay factor will be doubled
+ * every minute (half-life time set to 60 seconds) [2]. The forward decay landmark is reset every 30 minutes (or at
+ * first read/update after 30 minutes). During landmark reset, updates and reads in the reservoir will be blocked in a
+ * fashion similar to the one used in the metrics library [3]. The 30 minute rescale interval is used based on the
+ * assumption that in an extreme case we would have to collect a metric 1M times for a single bucket each second. By the
+ * end of the 30th minute all collected values will roughly add up to 1,000,000 * 60 * pow(2, 30) which can be
+ * represented with 56 bits giving us some head room in a signed 64 bit long.
+ *
+ * Internally two reservoirs are maintained, one with decay and one without decay. All public getters in a {@link Snapshot}
+ * will expose the decay functionality with the exception of {@link Snapshot#getValues()}, which will return values
+ * from the reservoir without decay. This makes it possible for the caller to maintain precise deltas in an interval of
+ * its choice.
+ *
+ * The bucket size starts at 1 and grows by 1.2 each time (rounding and removing duplicates). It goes from 1 to around
+ * 18T by default (creating 164+1 buckets), which will give a timing resolution from microseconds to roughly 210 days,
+ * with less precision as the numbers get larger.
+ *
+ * The series of values to which the counts in `decayingBuckets` correspond:
+ * 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 17, 20, 24, 29, 35, 42, 50, 60, 72 etc.
+ * Thus, a `decayingBuckets` of [0, 0, 1, 10] would mean we had seen 1 value of 3 and 10 values of 4.
+ *
+ * Each bucket represents values from (previous bucket offset, current offset].
+ *
+ * [1]: http://dimacs.rutgers.edu/~graham/pubs/papers/fwddecay.pdf
+ * [2]: https://en.wikipedia.org/wiki/Half-life
+ * [3]: https://github.com/dropwizard/metrics/blob/v3.1.2/metrics-core/src/main/java/com/codahale/metrics/ExponentiallyDecayingReservoir.java
+ */
+public class DecayingEstimatedHistogramReservoir implements Reservoir
+{
+    /**
+     * The default number of decayingBuckets. Use this bucket count to reduce memory allocation for bucket offsets.
+     */
+    public static final int DEFAULT_BUCKET_COUNT = 164;
+    public static final boolean DEFAULT_ZERO_CONSIDERATION = false;
+
+    // The offsets used with a default sized bucket array without a separate bucket for zero values.
+    public static final long[] DEFAULT_WITHOUT_ZERO_BUCKET_OFFSETS = EstimatedHistogram.newOffsets(DEFAULT_BUCKET_COUNT, false);
+
+    // The offsets used with a default sized bucket array with a separate bucket for zero values.
+    public static final long[] DEFAULT_WITH_ZERO_BUCKET_OFFSETS = EstimatedHistogram.newOffsets(DEFAULT_BUCKET_COUNT, true);
+
+    // Represents the bucket offset as created by {@link EstimatedHistogram#newOffsets()}
+    private final long[] bucketOffsets;
+
+    // decayingBuckets and buckets are one element longer than bucketOffsets -- the last element is values greater than the last offset
+    private final AtomicLongArray decayingBuckets;
+    private final AtomicLongArray buckets;
+
+    public static final long HALF_TIME_IN_S = 60L;
+    public static final double MEAN_LIFETIME_IN_S = HALF_TIME_IN_S / Math.log(2.0);
+    public static final long LANDMARK_RESET_INTERVAL_IN_MS = 30L * 60L * 1000L;
+
+    private final AtomicBoolean rescaling = new AtomicBoolean(false);
+    private volatile long decayLandmark;
+
+    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
+
+    // Wrapper around System.nanoTime() to simplify unit testing.
+    private final Clock clock;
+
+
+    /**
+     * Construct a decaying histogram with default number of buckets and without considering zeroes.
+     */
+    public DecayingEstimatedHistogramReservoir()
+    {
+        this(DEFAULT_ZERO_CONSIDERATION, DEFAULT_BUCKET_COUNT, Clock.defaultClock());
+    }
+
+    /**
+     * Construct a decaying histogram with default number of buckets.
+     *
+     * @param considerZeroes when true, 0-value measurements are collected in a separate bucket, otherwise they will be
+     *                       collected in the same bucket as 1-value measurements
+     */
+    public DecayingEstimatedHistogramReservoir(boolean considerZeroes)
+    {
+        this(considerZeroes, DEFAULT_BUCKET_COUNT, Clock.defaultClock());
+    }
+
+    /**
+     * Construct a decaying histogram.
+     *
+     * @param considerZeroes when true, 0-value measurements are collected in a separate bucket, otherwise they will be
+     *                       collected in the same bucket as 1-value measurements
+     * @param bucketCount number of buckets used to collect measured values
+     */
+    public DecayingEstimatedHistogramReservoir(boolean considerZeroes, int bucketCount)
+    {
+        this(considerZeroes, bucketCount, Clock.defaultClock());
+    }
+
+    @VisibleForTesting
+    DecayingEstimatedHistogramReservoir(boolean considerZeroes, int bucketCount, Clock clock)
+    {
+        if (bucketCount == DEFAULT_BUCKET_COUNT)
+        {
+            if (considerZeroes == true)
+            {
+                bucketOffsets = DEFAULT_WITH_ZERO_BUCKET_OFFSETS;
+            }
+            else
+            {
+                bucketOffsets = DEFAULT_WITHOUT_ZERO_BUCKET_OFFSETS;
+            }
+        }
+        else
+        {
+            bucketOffsets = EstimatedHistogram.newOffsets(bucketCount, considerZeroes);
+        }
+        decayingBuckets = new AtomicLongArray(bucketOffsets.length + 1);
+        buckets = new AtomicLongArray(bucketOffsets.length + 1);
+        this.clock = clock;
+        decayLandmark = clock.getTime();
+    }
+
+    /**
+     * Increments the count of the bucket closest to n, rounding UP.
+     *
+     * @param value the data point to add to the histogram
+     */
+    public void update(long value)
+    {
+        long now = clock.getTime();
+        rescaleIfNeeded(now);
+
+        int index = Arrays.binarySearch(bucketOffsets, value);
+        if (index < 0)
+        {
+            // inexact match, take the first bucket higher than n
+            index = -index - 1;
+        }
+        // else exact match; we're good
+
+        lockForRegularUsage();
+
+        try
+        {
+            decayingBuckets.getAndAdd(index, forwardDecayWeight(now));
+        }
+        finally
+        {
+            unlockForRegularUsage();
+        }
+
+        buckets.getAndIncrement(index);
+    }
+
+    private long forwardDecayWeight(long now)
+    {
+        return Math.round(Math.exp(((now - decayLandmark) / 1000L) / MEAN_LIFETIME_IN_S));
+    }
+
+    /**
+     * Return the number of buckets where recorded values are stored.
+     *
+     * This method does not return the number of recorded values as suggested by the {@link Reservoir} interface.
+     *
+     * @return the number of buckets
+     */
+    public int size()
+    {
+        return decayingBuckets.length();
+    }
+
+    /**
+     * Returns a snapshot of the decaying values in this reservoir.
+     *
+     * The non-decaying reservoir will not be included in the snapshot.
+     *
+     * @return the snapshot
+     */
+    public Snapshot getSnapshot()
+    {
+        rescaleIfNeeded();
+
+        lockForRegularUsage();
+
+        try
+        {
+            return new EstimatedHistogramReservoirSnapshot(this);
+        }
+        finally
+        {
+            unlockForRegularUsage();
+        }
+    }
+
+    /**
+     * @return true if this histogram has overflowed -- that is, a value larger than our largest bucket could bound was added
+     */
+    @VisibleForTesting
+    boolean isOverflowed()
+    {
+        return decayingBuckets.get(decayingBuckets.length() - 1) > 0;
+    }
+
+    private void rescaleIfNeeded()
+    {
+        rescaleIfNeeded(clock.getTime());
+    }
+
+    private void rescaleIfNeeded(long now)
+    {
+        if (needRescale(now))
+        {
+            if (rescaling.compareAndSet(false, true))
+            {
+                try
+                {
+                    rescale(now);
+                }
+                finally
+                {
+                    rescaling.set(false);
+                }
+            }
+        }
+    }
+
+    private void rescale(long now)
+    {
+        // Check again to make sure that another thread didn't complete rescale already
+        if (needRescale(now))
+        {
+            lockForRescale();
+
+            try
+            {
+                final long rescaleFactor = forwardDecayWeight(now);
+                decayLandmark = now;
+
+                final int bucketCount = decayingBuckets.length();
+                for (int i = 0; i < bucketCount; i++)
+                {
+                    long newValue = Math.round((decayingBuckets.get(i) / rescaleFactor));
+                    decayingBuckets.set(i, newValue);
+                }
+            }
+            finally
+            {
+                unlockForRescale();
+            }
+        }
+    }
+
+    private boolean needRescale(long now)
+    {
+        return (now - decayLandmark) > LANDMARK_RESET_INTERVAL_IN_MS;
+    }
+
+    @VisibleForTesting
+    public void clear()
+    {
+        lockForRescale();
+
+        try
+        {
+            final int bucketCount = decayingBuckets.length();
+            for (int i = 0; i < bucketCount; i++)
+            {
+                decayingBuckets.set(i, 0L);
+                buckets.set(i, 0L);
+            }
+        }
+        finally
+        {
+            unlockForRescale();
+        }
+    }
+
+    private void lockForRegularUsage()
+    {
+        this.lock.readLock().lock();
+    }
+
+    private void unlockForRegularUsage()
+    {
+        this.lock.readLock().unlock();
+    }
+
+    private void lockForRescale()
+    {
+        this.lock.writeLock().lock();
+    }
+
+    private void unlockForRescale()
+    {
+        this.lock.writeLock().unlock();
+    }
+
+
+    private static final Charset UTF_8 = Charset.forName("UTF-8");
+
+    /**
+     * Represents a snapshot of the decaying histogram.
+     *
+     * The decaying buckets are copied into a snapshot array to give a consistent view for all getters. However, the
+     * copy is made without a write-lock and so other threads may change the buckets while the array is copied,
+     * probably causing a slight skew up in the quantiles and mean values.
+     *
+     * The decaying buckets will be used for quantile calculations and mean values, but the non decaying buckets will be
+     * exposed for calls to {@link Snapshot#getValues()}.
+     */
+    private class EstimatedHistogramReservoirSnapshot extends Snapshot
+    {
+        private final long[] decayingBuckets;
+
+        public EstimatedHistogramReservoirSnapshot(DecayingEstimatedHistogramReservoir reservoir)
+        {
+            final int length = reservoir.decayingBuckets.length();
+
+            this.decayingBuckets = new long[length];
+
+            for (int i = 0; i < length; i++)
+                this.decayingBuckets[i] = reservoir.decayingBuckets.get(i);
+        }
+
+        /**
+         * Get the estimated value at the specified quantile in the distribution.
+         *
+         * @param quantile the quantile specified as a value between 0.0 (zero) and 1.0 (one)
+         * @return estimated value at given quantile
+         * @throws IllegalStateException in case the histogram overflowed
+         */
+        public double getValue(double quantile)
+        {
+            assert quantile >= 0 && quantile <= 1.0;
+
+            final int lastBucket = decayingBuckets.length - 1;
+
+            if (decayingBuckets[lastBucket] > 0)
+                throw new IllegalStateException("Unable to compute when histogram overflowed");
+
+            final long qcount = (long) Math.ceil(count() * quantile);
+            if (qcount == 0)
+                return 0;
+
+            long elements = 0;
+            for (int i = 0; i < lastBucket; i++)
+            {
+                elements += decayingBuckets[i];
+                if (elements >= qcount)
+                    return bucketOffsets[i];
+            }
+            return 0;
+        }
+
+        /**
+         * Will return a snapshot of the non-decaying buckets.
+         *
+         * The values returned will not be consistent with the quantile and mean values. The caller must be aware of the
+         * offsets created by {@link EstimatedHistogram#getBucketOffsets()} to make use of the values returned.
+         *
+         * @return a snapshot of the non-decaying buckets.
+         */
+        public long[] getValues()
+        {
+            final int length = buckets.length();
+
+            long[] values = new long[length];
+
+            for (int i = 0; i < length; i++)
+                values[i] = buckets.get(i);
+
+            return values;
+        }
+
+        /**
+         * Return the number of buckets where recorded values are stored.
+         *
+         * This method does not return the number of recorded values as suggested by the {@link Snapshot} interface.
+         *
+         * @return the number of buckets
+         */
+        public int size()
+        {
+            return decayingBuckets.length;
+        }
+
+        /**
+         * Return the number of registered values taking forward decay into account.
+         *
+         * @return the sum of all bucket values
+         */
+        private long count()
+        {
+            long sum = 0L;
+            for (int i = 0; i < decayingBuckets.length; i++)
+                sum += decayingBuckets[i];
+            return sum;
+        }
+
+        /**
+         * Get the estimated max-value that could have been added to this reservoir.
+         *
+         * As values are collected in variable sized buckets, the actual max value recorded in the reservoir may be less
+         * than the value returned.
+         *
+         * @return the largest value that could have been added to this reservoir, or Long.MAX_VALUE if the reservoir
+         *         overflowed
+         */
+        public long getMax()
+        {
+            final int lastBucket = decayingBuckets.length - 1;
+
+            if (decayingBuckets[lastBucket] > 0)
+                return Long.MAX_VALUE;
+
+            for (int i = lastBucket - 1; i >= 0; i--)
+            {
+                if (decayingBuckets[i] > 0)
+                    return bucketOffsets[i];
+            }
+            return 0;
+        }
+
+        /**
+         * Get the estimated mean value in the distribution.
+         *
+         * @return the mean histogram value (average of bucket offsets, weighted by count)
+         * @throws IllegalStateException if any values were greater than the largest bucket threshold
+         */
+        public double getMean()
+        {
+            final int lastBucket = decayingBuckets.length - 1;
+
+            if (decayingBuckets[lastBucket] > 0)
+                throw new IllegalStateException("Unable to compute when histogram overflowed");
+
+            long elements = 0;
+            long sum = 0;
+            for (int i = 0; i < lastBucket; i++)
+            {
+                long bCount = decayingBuckets[i];
+                elements += bCount;
+                sum += bCount * bucketOffsets[i];
+            }
+
+            return (double) sum / elements;
+        }
+
+        /**
+         * Get the estimated min-value that could have been added to this reservoir.
+         *
+         * As values are collected in variable sized buckets, the actual min value recorded in the reservoir may be
+         * higher than the value returned.
+         *
+         * @return the smallest value that could have been added to this reservoir
+         */
+        public long getMin()
+        {
+            for (int i = 0; i < decayingBuckets.length; i++)
+            {
+                if (decayingBuckets[i] > 0)
+                    return i == 0 ? 0 : 1 + bucketOffsets[i - 1];
+            }
+            return 0;
+        }
+
+        /**
+         * Get the estimated standard deviation of the values added to this reservoir.
+         *
+         * As values are collected in variable sized buckets, the actual deviation may be more or less than the value
+         * returned.
+         *
+         * @return an estimate of the standard deviation
+         */
+        public double getStdDev()
+        {
+            final int lastBucket = decayingBuckets.length - 1;
+
+            if (decayingBuckets[lastBucket] > 0)
+                throw new IllegalStateException("Unable to compute when histogram overflowed");
+
+            final long count = count();
+
+            if (count <= 1) {
+                return 0.0D;
+            } else {
+                double mean = this.getMean();
+                double sum = 0.0D;
+
+                for (int i = 0; i < lastBucket; ++i) {
+                    long value = bucketOffsets[i];
+                    double diff = (double) value - mean;
+                    sum += diff * diff * decayingBuckets[i];
+                }
+
+                return Math.sqrt(sum / (double) (count - 1));
+            }
+        }
+
+        public void dump(OutputStream output)
+        {
+            try (PrintWriter out = new PrintWriter(new OutputStreamWriter(output, UTF_8)))
+            {
+                int length = decayingBuckets.length;
+
+                for (int i = 0; i < length; ++i) {
+                    out.printf("%d%n", decayingBuckets[i]);
+                }
+            }
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/metrics/EstimatedHistogramReservoir.java b/src/java/org/apache/cassandra/metrics/EstimatedHistogramReservoir.java
deleted file mode 100644
index 29baad8..0000000
--- a/src/java/org/apache/cassandra/metrics/EstimatedHistogramReservoir.java
+++ /dev/null
@@ -1,111 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.cassandra.metrics;
-
-import com.google.common.annotations.VisibleForTesting;
-
-import com.codahale.metrics.Reservoir;
-import com.codahale.metrics.Snapshot;
-import com.codahale.metrics.UniformSnapshot;
-import org.apache.cassandra.utils.EstimatedHistogram;
-
-/**
- * Allows our Histogram implementation to be used by the metrics library.
- *
- * Default buckets allows nanosecond timings.
- */
-public class EstimatedHistogramReservoir implements Reservoir
-{
-    EstimatedHistogram histogram;
-
-    // Default to >4 hours of in nanoseconds of buckets
-    public EstimatedHistogramReservoir(boolean considerZeroes)
-    {
-        this(164, considerZeroes);
-    }
-
-    public EstimatedHistogramReservoir(int numBuckets, boolean considerZeroes)
-    {
-        histogram = new EstimatedHistogram(numBuckets, considerZeroes);
-    }
-
-    @Override
-    public int size()
-    {
-        return histogram.getBucketOffsets().length + 1;
-    }
-
-    @Override
-    public void update(long value)
-    {
-        histogram.add(value);
-    }
-
-    @Override
-    public Snapshot getSnapshot()
-    {
-        return new HistogramSnapshot(histogram);
-    }
-
-    @VisibleForTesting
-    public void clear()
-    {
-        histogram.getBuckets(true);
-    }
-
-    static class HistogramSnapshot extends UniformSnapshot
-    {
-        EstimatedHistogram histogram;
-
-        public HistogramSnapshot(EstimatedHistogram histogram)
-        {
-            super(histogram.getBuckets(false));
-
-            this.histogram = histogram;
-        }
-
-        @Override
-        public double getValue(double quantile)
-        {
-            return histogram.percentile(quantile);
-        }
-
-        @Override
-        public long getMax()
-        {
-            return histogram.max();
-        }
-
-        @Override
-        public long getMin()
-        {
-            return histogram.min();
-        }
-
-        @Override
-        public double getMean()
-        {
-            return histogram.rawMean();
-        }
-
-        @Override
-        public long[] getValues() {
-            return histogram.getBuckets(false);
-        }
-    }
-}
diff --git a/src/java/org/apache/cassandra/utils/EstimatedHistogram.java b/src/java/org/apache/cassandra/utils/EstimatedHistogram.java
index 36048fb..1a48039 100644
--- a/src/java/org/apache/cassandra/utils/EstimatedHistogram.java
+++ b/src/java/org/apache/cassandra/utils/EstimatedHistogram.java
@@ -85,7 +85,7 @@ public class EstimatedHistogram
         buckets = new AtomicLongArray(bucketData);
     }
 
-    private static long[] newOffsets(int size, boolean considerZeroes)
+    public static long[] newOffsets(int size, boolean considerZeroes)
     {
         long[] result = new long[size + (considerZeroes ? 1 : 0)];
         int i = 0;
diff --git a/test/unit/org/apache/cassandra/metrics/DecayingEstimatedHistogramReservoirTest.java b/test/unit/org/apache/cassandra/metrics/DecayingEstimatedHistogramReservoirTest.java
new file mode 100644
index 0000000..f2d817f
--- /dev/null
+++ b/test/unit/org/apache/cassandra/metrics/DecayingEstimatedHistogramReservoirTest.java
@@ -0,0 +1,381 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.metrics;
+
+import org.junit.Test;
+
+import com.codahale.metrics.Clock;
+import com.codahale.metrics.Snapshot;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+
+
+public class DecayingEstimatedHistogramReservoirTest
+{
+    private static final double DOUBLE_ASSERT_DELTA = 0;
+
+    @Test
+    public void testSimple()
+    {
+        {
+            // 0 and 1 map to the same, first bucket
+            DecayingEstimatedHistogramReservoir histogram = new DecayingEstimatedHistogramReservoir();
+            histogram.update(0);
+            assertEquals(1, histogram.getSnapshot().getValues()[0]);
+            histogram.update(1);
+            assertEquals(2, histogram.getSnapshot().getValues()[0]);
+        }
+        {
+            // 0 and 1 map to different buckets
+            DecayingEstimatedHistogramReservoir histogram = new DecayingEstimatedHistogramReservoir(true, DecayingEstimatedHistogramReservoir.DEFAULT_BUCKET_COUNT);
+            histogram.update(0);
+            assertEquals(1, histogram.getSnapshot().getValues()[0]);
+            histogram.update(1);
+            Snapshot snapshot = histogram.getSnapshot();
+            assertEquals(1, snapshot.getValues()[0]);
+            assertEquals(1, snapshot.getValues()[1]);
+        }
+    }
+
+    @Test
+    public void testOverflow()
+    {
+        DecayingEstimatedHistogramReservoir histogram = new DecayingEstimatedHistogramReservoir(DecayingEstimatedHistogramReservoir.DEFAULT_ZERO_CONSIDERATION, 1);
+        histogram.update(100);
+        assert histogram.isOverflowed();
+        assertEquals(Long.MAX_VALUE, histogram.getSnapshot().getMax());
+    }
+
+    @Test
+    public void testMinMax()
+    {
+        DecayingEstimatedHistogramReservoir histogram = new DecayingEstimatedHistogramReservoir();
+        histogram.update(16);
+        Snapshot snapshot = histogram.getSnapshot();
+        assertEquals(15, snapshot.getMin());
+        assertEquals(17, snapshot.getMax());
+    }
+
+    @Test
+    public void testMean()
+    {
+        {
+            TestClock clock = new TestClock();
+
+            DecayingEstimatedHistogramReservoir histogram = new DecayingEstimatedHistogramReservoir(DecayingEstimatedHistogramReservoir.DEFAULT_ZERO_CONSIDERATION, DecayingEstimatedHistogramReservoir.DEFAULT_BUCKET_COUNT, clock);
+            for (int i = 0; i < 40; i++)
+                histogram.update(0);
+            for (int i = 0; i < 20; i++)
+                histogram.update(1);
+            for (int i = 0; i < 10; i++)
+                histogram.update(2);
+            assertEquals(1.14D, histogram.getSnapshot().getMean(), 0.1D);
+        }
+        {
+            TestClock clock = new TestClock();
+
+            DecayingEstimatedHistogramReservoir histogram = new DecayingEstimatedHistogramReservoir(true, DecayingEstimatedHistogramReservoir.DEFAULT_BUCKET_COUNT, clock);
+            for (int i = 0; i < 40; i++)
+                histogram.update(0);
+            for (int i = 0; i < 20; i++)
+                histogram.update(1);
+            for (int i = 0; i < 10; i++)
+                histogram.update(2);
+            assertEquals(0.57D, histogram.getSnapshot().getMean(), 0.1D);
+        }
+    }
+
+    @Test
+    public void testStdDev()
+    {
+        {
+            TestClock clock = new TestClock();
+
+            DecayingEstimatedHistogramReservoir histogram = new DecayingEstimatedHistogramReservoir(DecayingEstimatedHistogramReservoir.DEFAULT_ZERO_CONSIDERATION, DecayingEstimatedHistogramReservoir.DEFAULT_BUCKET_COUNT, clock);
+            for (int i = 0; i < 20; i++)
+                histogram.update(10);
+            for (int i = 0; i < 40; i++)
+                histogram.update(20);
+            for (int i = 0; i < 20; i++)
+                histogram.update(30);
+
+            Snapshot snapshot = histogram.getSnapshot();
+            assertEquals(20.0D, snapshot.getMean(), 2.0D);
+            assertEquals(7.07D, snapshot.getStdDev(), 2.0D);
+        }
+    }
+
+    @Test
+    public void testFindingCorrectBuckets()
+    {
+        TestClock clock = new TestClock();
+
+        DecayingEstimatedHistogramReservoir histogram = new DecayingEstimatedHistogramReservoir(DecayingEstimatedHistogramReservoir.DEFAULT_ZERO_CONSIDERATION, 90, clock);
+        histogram.update(23282687);
+        assertFalse(histogram.isOverflowed());
+        assertEquals(1, histogram.getSnapshot().getValues()[89]);
+
+        histogram.update(9);
+        assertEquals(1, histogram.getSnapshot().getValues()[8]);
+
+        histogram.update(21);
+        histogram.update(22);
+        Snapshot snapshot = histogram.getSnapshot();
+        assertEquals(2, snapshot.getValues()[13]);
+        assertEquals(6277304.5D, snapshot.getMean(), DOUBLE_ASSERT_DELTA);
+    }
+
+    @Test
+    public void testPercentile()
+    {
+        {
+            TestClock clock = new TestClock();
+
+            DecayingEstimatedHistogramReservoir histogram = new DecayingEstimatedHistogramReservoir(DecayingEstimatedHistogramReservoir.DEFAULT_ZERO_CONSIDERATION, DecayingEstimatedHistogramReservoir.DEFAULT_BUCKET_COUNT, clock);
+            // percentile of empty histogram is 0
+            assertEquals(0D, histogram.getSnapshot().getValue(0.99), DOUBLE_ASSERT_DELTA);
+
+            histogram.update(1);
+            // percentile of a histogram with one element should be that element
+            assertEquals(1D, histogram.getSnapshot().getValue(0.99), DOUBLE_ASSERT_DELTA);
+
+            histogram.update(10);
+            assertEquals(10D, histogram.getSnapshot().getValue(0.99), DOUBLE_ASSERT_DELTA);
+        }
+
+        {
+            TestClock clock = new TestClock();
+
+            DecayingEstimatedHistogramReservoir histogram = new DecayingEstimatedHistogramReservoir(DecayingEstimatedHistogramReservoir.DEFAULT_ZERO_CONSIDERATION, DecayingEstimatedHistogramReservoir.DEFAULT_BUCKET_COUNT, clock);
+
+            histogram.update(1);
+            histogram.update(2);
+            histogram.update(3);
+            histogram.update(4);
+            histogram.update(5);
+
+            Snapshot snapshot = histogram.getSnapshot();
+            assertEquals(0, snapshot.getValue(0.00), DOUBLE_ASSERT_DELTA);
+            assertEquals(3, snapshot.getValue(0.50), DOUBLE_ASSERT_DELTA);
+            assertEquals(3, snapshot.getValue(0.60), DOUBLE_ASSERT_DELTA);
+            assertEquals(5, snapshot.getValue(1.00), DOUBLE_ASSERT_DELTA);
+        }
+
+        {
+            TestClock clock = new TestClock();
+
+            DecayingEstimatedHistogramReservoir histogram = new DecayingEstimatedHistogramReservoir(DecayingEstimatedHistogramReservoir.DEFAULT_ZERO_CONSIDERATION, DecayingEstimatedHistogramReservoir.DEFAULT_BUCKET_COUNT, clock);
+
+            for (int i = 11; i <= 20; i++)
+                histogram.update(i);
+
+            // Right now the histogram looks like:
+            //   10   12   14   17   20
+            //    0    2    2    3    3
+            // %: 0   20   40   70  100
+            Snapshot snapshot = histogram.getSnapshot();
+            assertEquals(12, snapshot.getValue(0.01), DOUBLE_ASSERT_DELTA);
+            assertEquals(14, snapshot.getValue(0.30), DOUBLE_ASSERT_DELTA);
+            assertEquals(17, snapshot.getValue(0.50), DOUBLE_ASSERT_DELTA);
+            assertEquals(17, snapshot.getValue(0.60), DOUBLE_ASSERT_DELTA);
+            assertEquals(20, snapshot.getValue(0.80), DOUBLE_ASSERT_DELTA);
+        }
+        {
+            TestClock clock = new TestClock();
+
+            DecayingEstimatedHistogramReservoir histogram = new DecayingEstimatedHistogramReservoir(true, DecayingEstimatedHistogramReservoir.DEFAULT_BUCKET_COUNT, clock);
+            histogram.update(0);
+            histogram.update(0);
+            histogram.update(1);
+
+            Snapshot snapshot = histogram.getSnapshot();
+            assertEquals(0, snapshot.getValue(0.5), DOUBLE_ASSERT_DELTA);
+            assertEquals(1, snapshot.getValue(0.99), DOUBLE_ASSERT_DELTA);
+        }
+    }
+
+
+    @Test
+    public void testDecayingPercentile()
+    {
+        {
+            TestClock clock = new TestClock();
+
+            DecayingEstimatedHistogramReservoir histogram = new DecayingEstimatedHistogramReservoir(DecayingEstimatedHistogramReservoir.DEFAULT_ZERO_CONSIDERATION, DecayingEstimatedHistogramReservoir.DEFAULT_BUCKET_COUNT, clock);
+            // percentile of empty histogram is 0
+            assertEquals(0, histogram.getSnapshot().getValue(1.0), DOUBLE_ASSERT_DELTA);
+
+            for (int v = 1; v <= 100; v++)
+            {
+                for (int i = 0; i < 10_000; i++)
+                {
+                    histogram.update(v);
+                }
+            }
+
+            Snapshot snapshot = histogram.getSnapshot();
+            assertEstimatedQuantile(05, snapshot.getValue(0.05));
+            assertEstimatedQuantile(20, snapshot.getValue(0.20));
+            assertEstimatedQuantile(40, snapshot.getValue(0.40));
+            assertEstimatedQuantile(99, snapshot.getValue(0.99));
+
+            clock.addSeconds(DecayingEstimatedHistogramReservoir.HALF_TIME_IN_S);
+            snapshot = histogram.getSnapshot();
+            assertEstimatedQuantile(05, snapshot.getValue(0.05));
+            assertEstimatedQuantile(20, snapshot.getValue(0.20));
+            assertEstimatedQuantile(40, snapshot.getValue(0.40));
+            assertEstimatedQuantile(99, snapshot.getValue(0.99));
+
+            for (int v = 1; v <= 50; v++)
+            {
+                for (int i = 0; i < 10_000; i++)
+                {
+                    histogram.update(v);
+                }
+            }
+
+            snapshot = histogram.getSnapshot();
+            assertEstimatedQuantile(04, snapshot.getValue(0.05));
+            assertEstimatedQuantile(14, snapshot.getValue(0.20));
+            assertEstimatedQuantile(27, snapshot.getValue(0.40));
+            assertEstimatedQuantile(98, snapshot.getValue(0.99));
+
+            clock.addSeconds(DecayingEstimatedHistogramReservoir.HALF_TIME_IN_S);
+            snapshot = histogram.getSnapshot();
+            assertEstimatedQuantile(04, snapshot.getValue(0.05));
+            assertEstimatedQuantile(14, snapshot.getValue(0.20));
+            assertEstimatedQuantile(27, snapshot.getValue(0.40));
+            assertEstimatedQuantile(98, snapshot.getValue(0.99));
+
+            for (int v = 1; v <= 50; v++)
+            {
+                for (int i = 0; i < 10_000; i++)
+                {
+                    histogram.update(v);
+                }
+            }
+
+            snapshot = histogram.getSnapshot();
+            assertEstimatedQuantile(03, snapshot.getValue(0.05));
+            assertEstimatedQuantile(12, snapshot.getValue(0.20));
+            assertEstimatedQuantile(23, snapshot.getValue(0.40));
+            assertEstimatedQuantile(96, snapshot.getValue(0.99));
+
+            clock.addSeconds(DecayingEstimatedHistogramReservoir.HALF_TIME_IN_S);
+            snapshot = histogram.getSnapshot();
+            assertEstimatedQuantile(03, snapshot.getValue(0.05));
+            assertEstimatedQuantile(12, snapshot.getValue(0.20));
+            assertEstimatedQuantile(23, snapshot.getValue(0.40));
+            assertEstimatedQuantile(96, snapshot.getValue(0.99));
+
+            for (int v = 11; v <= 20; v++)
+            {
+                for (int i = 0; i < 5_000; i++)
+                {
+                    histogram.update(v);
+                }
+            }
+
+            snapshot = histogram.getSnapshot();
+            assertEstimatedQuantile(04, snapshot.getValue(0.05));
+            assertEstimatedQuantile(12, snapshot.getValue(0.20));
+            assertEstimatedQuantile(20, snapshot.getValue(0.40));
+            assertEstimatedQuantile(95, snapshot.getValue(0.99));
+
+            clock.addSeconds(DecayingEstimatedHistogramReservoir.HALF_TIME_IN_S);
+            snapshot = histogram.getSnapshot();
+            assertEstimatedQuantile(04, snapshot.getValue(0.05));
+            assertEstimatedQuantile(12, snapshot.getValue(0.20));
+            assertEstimatedQuantile(20, snapshot.getValue(0.40));
+            assertEstimatedQuantile(95, snapshot.getValue(0.99));
+
+        }
+
+        {
+            TestClock clock = new TestClock();
+
+            DecayingEstimatedHistogramReservoir histogram = new DecayingEstimatedHistogramReservoir(DecayingEstimatedHistogramReservoir.DEFAULT_ZERO_CONSIDERATION, DecayingEstimatedHistogramReservoir.DEFAULT_BUCKET_COUNT, clock);
+            // percentile of empty histogram is 0
+            assertEquals(0, histogram.getSnapshot().getValue(0.99), DOUBLE_ASSERT_DELTA);
+
+            for (int m = 0; m < 40; m++)
+            {
+                for (int i = 0; i < 1_000_000; i++)
+                {
+                    histogram.update(2);
+                }
+                // percentile of a histogram with one element should be that element
+                clock.addSeconds(DecayingEstimatedHistogramReservoir.HALF_TIME_IN_S);
+                assertEquals(2, histogram.getSnapshot().getValue(0.99), DOUBLE_ASSERT_DELTA);
+            }
+
+            clock.addSeconds(DecayingEstimatedHistogramReservoir.HALF_TIME_IN_S * 100);
+            assertEquals(0, histogram.getSnapshot().getValue(0.99), DOUBLE_ASSERT_DELTA);
+        }
+
+        {
+            TestClock clock = new TestClock();
+
+            DecayingEstimatedHistogramReservoir histogram = new DecayingEstimatedHistogramReservoir(DecayingEstimatedHistogramReservoir.DEFAULT_ZERO_CONSIDERATION, DecayingEstimatedHistogramReservoir.DEFAULT_BUCKET_COUNT, clock);
+
+            histogram.update(20);
+            histogram.update(21);
+            histogram.update(22);
+            Snapshot snapshot = histogram.getSnapshot();
+            assertEquals(1, snapshot.getValues()[12]);
+            assertEquals(2, snapshot.getValues()[13]);
+
+            clock.addSeconds(DecayingEstimatedHistogramReservoir.HALF_TIME_IN_S);
+
+            histogram.update(20);
+            histogram.update(21);
+            histogram.update(22);
+            snapshot = histogram.getSnapshot();
+            assertEquals(2, snapshot.getValues()[12]);
+            assertEquals(4, snapshot.getValues()[13]);
+        }
+    }
+
+    private void assertEstimatedQuantile(long expectedValue, double actualValue)
+    {
+        assertTrue("Expected at least [" + expectedValue + "] but actual is [" + actualValue + "]", actualValue >= expectedValue);
+        assertTrue("Expected less than [" + Math.round(expectedValue * 1.2) + "] but actual is [" + actualValue + "]", actualValue < Math.round(expectedValue * 1.2));
+    }
+
+    public class TestClock extends Clock {
+        private long tick = 0;
+
+        public void addSeconds(long seconds)
+        {
+            tick += seconds * 1_000_000_000L;
+        }
+
+        public long getTick()
+        {
+            return tick;
+        }
+
+        public long getTime()
+        {
+            return tick / 1_000_000L;
+        };
+    }
+}
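
For intuition about the landmark/rescale numbers in the javadoc above, here is a small standalone sketch of the same weight arithmetic (the constants and the formula are copied from the patch; the demo class itself is made up). The increment applied to a bucket doubles every minute past the decay landmark, which is why the landmark is rescaled every 30 minutes, before weights of up to 2^30 eat into the 64-bit headroom:

public class ForwardDecayWeightDemo
{
    // Constants as defined in DecayingEstimatedHistogramReservoir above.
    static final long HALF_TIME_IN_S = 60L;
    static final double MEAN_LIFETIME_IN_S = HALF_TIME_IN_S / Math.log(2.0);

    // Same arithmetic as forwardDecayWeight(); elapsed time is in milliseconds.
    static long weight(long elapsedMillis)
    {
        return Math.round(Math.exp((elapsedMillis / 1000L) / MEAN_LIFETIME_IN_S));
    }

    public static void main(String[] args)
    {
        // weight(0) = 1, weight(60s) = 2, weight(120s) = 4, ..., weight(30min) = 2^30
        for (long s = 0; s <= 1800; s += 300)
            System.out.printf("t = %4ds, weight = %d%n", s, weight(s * 1000));
    }
}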

Overview

Screenshot from 2016-08-24 19-43-37.png (727×1 px, 167 KB)

A closer look at the rates

Write rate (org.apache.cassandra.metrics.ColumnFamily.all.WriteLatency.1MinuteRate).

Screenshot from 2016-08-24 19-45-02.png (696×1 px, 113 KB)

Read rate (org.apache.cassandra.metrics.ColumnFamily.all.ReadLatency.1MinuteRate).

Screenshot from 2016-08-24 19-44-25.png (692×1 px, 104 KB)

A closer look at latencies

Write latency (org.apache.cassandra.metrics.ColumnFamily.all.WriteLatency.99percentile)

Screenshot from 2016-08-24 19-46-38.png (696×1 px, 107 KB)

Read latency (org.apache.cassandra.metrics.ColumnFamily.all.ReadLatency.99percentile)

Screenshot from 2016-08-24 19-46-15.png (693×1 px, 119 KB)

Here you can spot a bit of a difference: xenon-a generally trends close to the other two nodes, but the larger spikes appear to be smoothed out somewhat.

I'm satisfied that this is Good Enough (and others have indicated the same), so I'm closing this issue.