
Increased cassandra-metrics-collector utilization w/ Cassandra 3.x
Closed, Resolved (Public)

Description

In Cassandra 3.x, metrics with type=ColumnFamily were renamed to type=Table, and an alias for ColumnFamily was created (and subsequently deprecated). By Cassandra version 3.7, some of the deprecated metrics have already been removed (latency, for example), so [[ https://github.com/wikimedia/cassandra-metrics-collector/commit/6a52eff6a0562feff174c32a714850351e62e176 | support for type=Table was added to cassandra-metrics-collector ]]. However, since most of the type=ColumnFamily metrics are still present, this amounts to collecting the same data twice, and since this is where the bulk of metrics come from, the resulting increase in utilization is significant.

Screenshot from 2017-04-28 11-06-43.png

Note that this only affects nodes running Cassandra 3.x (2.2.6 clusters do not have the type=Table metrics). Also, since we are traditionally not CPU bound, the added utilization might not be a high priority (certainly worth fixing, but not necessarily worth blocking deployment of 3.x).
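
To illustrate the duplication, here is a minimal Python sketch (not the collector's actual code, which is Java) that drops a type=ColumnFamily metric whenever a type=Table metric with the same keyspace, scope, and name exists. The function names are hypothetical, and the object-name layout (`org.apache.cassandra.metrics:type=...,keyspace=...,scope=...,name=...`) is assumed here. On a 2.2.x node, where no type=Table names exist, everything is kept.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed JMX domain for Cassandra metrics.
DOMAIN = "org.apache.cassandra.metrics"


def parse_object_name(name: str) -> Tuple[str, Dict[str, str]]:
    """Split 'domain:k1=v1,k2=v2' into (domain, {k1: v1, k2: v2})."""
    domain, _, props = name.partition(":")
    pairs = dict(p.split("=", 1) for p in props.split(",") if "=" in p)
    return domain, pairs


def drop_deprecated_aliases(names: Iterable[str]) -> List[str]:
    """Hypothetical helper: skip ColumnFamily names that have a Table twin."""
    parsed = [(n,) + parse_object_name(n) for n in names]

    def key(props: Dict[str, str]) -> Tuple:
        return (props.get("keyspace"), props.get("scope"), props.get("name"))

    # Every (keyspace, scope, name) that is available under type=Table.
    table_keys = {
        key(props)
        for _, domain, props in parsed
        if domain == DOMAIN and props.get("type") == "Table"
    }

    kept = []
    for name, domain, props in parsed:
        is_alias = (
            domain == DOMAIN
            and props.get("type") == "ColumnFamily"
            and key(props) in table_keys
        )
        if not is_alias:
            kept.append(name)
    return kept
```

With a filter like this, a ColumnFamily metric that has no Table counterpart is still collected, so nothing is lost on clusters that predate the rename.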

Possible approaches

Event Timeline

FWIW I think it makes sense to collect Table instead of ColumnFamily. To deal with the rename on the graphite side we can rename ColumnFamily to Table and symlink the former to point to the latter, and later adjust dashboards accordingly.
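
A rough sketch of the rename-and-symlink idea, assuming the metrics live as directories on the graphite host's filesystem. The function name and the directory layout are hypothetical; the real paths would need checking before running anything like this.

```python
import os


def rename_with_symlink(parent: str, old: str = "ColumnFamily", new: str = "Table") -> None:
    """Rename parent/old to parent/new and leave parent/old as a symlink.

    `parent` is an assumed directory holding the per-type metric tree.
    """
    old_path = os.path.join(parent, old)
    new_path = os.path.join(parent, new)

    if os.path.islink(old_path):
        return  # already migrated, nothing to do
    if os.path.exists(new_path):
        raise FileExistsError(f"{new_path} already exists; merge manually")

    os.rename(old_path, new_path)
    # Relative link target, so the tree stays valid if `parent` is moved.
    os.symlink(new, old_path)
```

Existing dashboards that still reference ColumnFamily would keep resolving through the symlink until they are updated.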

Table isn't there on 2.2.x nodes, so we'll have to continue collecting ColumnFamily there. Since there are currently no upgrades to 3.x planned (just new clusters), this might be a good time for a clean break.

https://github.com/wikimedia/cassandra-metrics-collector/pull/20

Change 360654 had a related patch set uploaded (by Eevans; owner: Eevans):
[operations/software/cassandra-metrics-collector@master] Update collector version for 4.0.0 (Cassandra >= 3 only)

https://gerrit.wikimedia.org/r/360654

Change 360657 had a related patch set uploaded (by Eevans; owner: Eevans):
[operations/puppet@production] cmcd: link-in 4.0.0 jar on Cassandra 3.x nodes

https://gerrit.wikimedia.org/r/360657

Change 360654 merged by Mobrovac:
[operations/software/cassandra-metrics-collector@master] Update collector version for 4.0.0 (Cassandra >= 3 only)

https://gerrit.wikimedia.org/r/360654

Change 360657 merged by Filippo Giunchedi:
[operations/puppet@production] cmcd: link-in 4.0.0 jar on Cassandra 3.x nodes

https://gerrit.wikimedia.org/r/360657