1000+ keyspace metrics you didn't see coming
Closed, Resolved, Public

Description

Due to unexpected circumstances, I ended up deploying a new version of cassandra-metrics-collector prior to filtering out the new metrics it introduced. Sorry about that.

For all nodes that have been upgraded to Cassandra 2.2.6, we'll need to clean up the o.a.c.metrics.Keyspace.* Graphite metrics. At the time of this writing, that should be:

  • all RESTBase staging nodes
  • restbase1007-{a,b,c}.eqiad.wmnet

Event Timeline

No problem. I see some metrics are still being updated though, e.g.:

3436550197  304 -rw-r--r--   1 _graphite _graphite   309088 Jun 14 11:25 /var/lib/carbon/whisper/cassandra/restbase1007-a/org/apache/cassandra/metrics/Keyspace/local_group_wikiquote_T_parsoid_stash_html/AllMemtablesLiveDataSize/value.wsp

It doesn't seem to be a big problem in terms of disk space for now, though: ~10G or so in total for restbase1007 metrics.
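One way to check which whisper files are still being written is to look at modification times under the Keyspace directory. The following is only a minimal sketch: the function name `recently_updated` and the 10-minute threshold are assumptions made for illustration, not part of carbon or cassandra-metrics-collector.

```python
import os
import time


def recently_updated(root, max_age_seconds=600, now=None):
    """Yield (path, age_in_seconds) for .wsp files under root that were
    modified within the last max_age_seconds (hypothetical helper)."""
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".wsp"):
                continue  # only whisper files are of interest
            path = os.path.join(dirpath, name)
            age = now - os.path.getmtime(path)
            if age <= max_age_seconds:
                yield path, age


if __name__ == "__main__":
    # Directory taken from the listing above; adjust per instance.
    root = ("/var/lib/carbon/whisper/cassandra/restbase1007-a/"
            "org/apache/cassandra/metrics/Keyspace")
    for path, age in sorted(recently_updated(root), key=lambda item: item[1]):
        print(f"{age:6.0f}s  {path}")
```

If this prints nothing after the filter has been applied and the collector restarted, the Keyspace metrics are presumably no longer being sent.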

Mentioned in SAL [2016-06-14T15:54:38Z] <urandom> Restarting cassandra-metrics-collector on restbase1007 : T137304

Ok, that is unexpected. The filter.yaml on 1007 looks like this:

whitelist:
  - '.*\.metrics\.ColumnFamily\.local_group_.*\.meta\.CoordinatorReadLatency\..*$'
  - '.*\.metrics\.ColumnFamily\.local_group_.*\.meta\.CoordinatorScanLatency\..*$'
  - '.*\.metrics\.ColumnFamily\.local_group_.*\.meta\.EstimatedColumnCountHistogram\..*$'
  - '.*\.metrics\.ColumnFamily\.local_group_.*\.meta\.EstimatedRowCount\..*$'
  - '.*\.metrics\.ColumnFamily\.local_group_.*\.meta\.EstimatedRowSizeHistogram\..*$'
  - '.*\.metrics\.ColumnFamily\.local_group_.*\.meta\.LiveDiskSpaceUsed\..*$'
  - '.*\.metrics\.ColumnFamily\.local_group_.*\.meta\.LiveSSTableCount\..*$'
  - '.*\.metrics\.ColumnFamily\.local_group_.*\.meta\.LiveScannedHistogram\..*$'
  - '.*\.metrics\.ColumnFamily\.local_group_.*\.meta\.MaxRowSize\..*$'
  - '.*\.metrics\.ColumnFamily\.local_group_.*\.meta\.MeanRowSize\..*$'
  - '.*\.metrics\.ColumnFamily\.local_group_.*\.meta\.MinRowSize\..*$'
  - '.*\.metrics\.ColumnFamily\.local_group_.*\.meta\.PendingCompactions\..*$'
  - '.*\.metrics\.ColumnFamily\.local_group_.*\.meta\.PendingFlushes\..*$'
  - '.*\.metrics\.ColumnFamily\.local_group_.*\.meta\.RangeLatency\..*$'
  - '.*\.metrics\.ColumnFamily\.local_group_.*\.meta\.ReadLatency\..*$'
  - '.*\.metrics\.ColumnFamily\.local_group_.*\.meta\.SSTablesPerReadHistogram\..*$'
  - '.*\.metrics\.ColumnFamily\.local_group_.*\.meta\.TombstoneScannedHistogram\..*$'
  - '.*\.metrics\.ColumnFamily\.local_group_.*\.meta\.TotalDiskSpaceUsed\..*$'
  - '.*\.metrics\.ColumnFamily\.local_group_.*\.meta\.WriteLatency\..*$'

blacklist:
  - '.*\.15MinuteRate$'
  - '.*\.5MinuteRate$'
  - '.*\.98percentile$'
  - '.*\.999percentile$'
  - '.*\.mean$'
  - '.*\.meanRate$'
  - '.*\.min$'
  - '.*\.stddev$'
  - '.*\.metrics\.ColumnFamily\.local_group_.*\.meta\..*$'
  - '.*\.metrics\.Keyspace\..*$'
  - '.*\.metrics\.Client\..*$'

I'd expect that second-to-last entry to take care of this; it looks right to me.
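As a sanity check, the Keyspace blacklist pattern can be tried against the metric from the whisper path quoted earlier. This is a rough sketch under stated assumptions: it uses Python's `re` rather than whatever regex engine the collector uses, it assumes the usual Graphite mapping of whisper paths to dotted metric names, and `whisper_path_to_metric` is a hypothetical helper. The exact name the collector filters on may differ (for example, it may lack the `cassandra.restbase1007-a` prefix), though the leading `.*` should make that irrelevant.

```python
import re

# Pattern copied verbatim from the blacklist in filter.yaml.
KEYSPACE_BLACKLIST = r'.*\.metrics\.Keyspace\..*$'


def whisper_path_to_metric(path, root="/var/lib/carbon/whisper/"):
    """Turn a whisper file path into a dotted metric name.

    Hypothetical helper; assumes directories map to dot-separated
    components and the .wsp suffix is dropped.
    """
    rel = path[len(root):] if path.startswith(root) else path
    if rel.endswith(".wsp"):
        rel = rel[:-len(".wsp")]
    return rel.replace("/", ".")


path = ("/var/lib/carbon/whisper/cassandra/restbase1007-a/org/apache/"
        "cassandra/metrics/Keyspace/"
        "local_group_wikiquote_T_parsoid_stash_html/"
        "AllMemtablesLiveDataSize/value.wsp")
metric = whisper_path_to_metric(path)

# True means the pattern covers this metric, so the filter entry itself
# is not the reason the file kept being updated.
print(metric)
print(re.match(KEYSPACE_BLACKLIST, metric) is not None)
```

Under these assumptions the pattern does match, which is consistent with the filter being correct and the collector simply not having picked it up yet.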

I restarted cmc on 1007 just in case that didn't happen after the filter update. Let me know if they're still being updated.

Indeed, those metrics have stopped updating now. I've removed Keyspace and Client:

$ du -hcs */org/apache/cassandra/metrics/{Keyspace,Client}
3.3G	restbase1007-a/org/apache/cassandra/metrics/Keyspace
3.3G	restbase1007-b/org/apache/cassandra/metrics/Keyspace
3.3G	restbase1007-c/org/apache/cassandra/metrics/Keyspace
304K	restbase1007-a/org/apache/cassandra/metrics/Client
304K	restbase1007-b/org/apache/cassandra/metrics/Client
304K	restbase1007-c/org/apache/cassandra/metrics/Client
9.8G	total