
Enable Prometheus metrics export for Cassandra
Closed, Resolved · Public

Description

Prometheus would provide a number of benefits for us over Graphite; most significantly, a more straightforward way to share dashboards across clusters (in the Prometheus data model, the cluster is a label on each metric, which can easily be templated in Grafana).

Plan A

The easiest way to set this up would seem to be jmx_exporter, a JVM agent that spins up its own in-process HTTP server to export metrics. I have (lightly) tested this in deployment-prep, and it seems to work well. One added benefit of this approach would be that we could eliminate cassandra-metrics-collector (one less application to maintain, and one less moving part on each host).

Next steps:

  • Fork https://github.com/prometheus/jmx_exporter to the Wikimedia account, and tag a release
  • Upload a build to Archiva
  • Get a deployment repository set up
  • Puppetize the loading of the agent, and Ferm rules
  • Push to Staging and deployment-prep for further evaluation
  • Evaluate the impact of cmcd (cassandra-metrics-collector) collection in isolation

See also: https://github.com/prometheus/jmx_exporter/issues/113
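For reference, loading the agent amounts to appending a -javaagent flag to Cassandra's JVM options. A minimal sketch, with hypothetical paths (the actual jar and config locations would come from the deploy repository and puppetization described above):

```shell
# Hypothetical paths; the real locations are determined by the deploy
# repository layout and the puppetization above.
JAR=/srv/deployment/prometheus/jmx_exporter/jmx_prometheus_javaagent.jar
CONFIG=/etc/cassandra/prometheus_jmx_exporter.yaml
PORT=7800  # the port scraped later in this task

# The agent starts its own HTTP server inside the Cassandra JVM,
# serving /metrics on the given port.
JVM_OPTS="$JVM_OPTS -javaagent:${JAR}=${PORT}:${CONFIG}"
echo "$JVM_OPTS"
```

The agent argument format is port:config, where the config YAML controls which MBeans are exported and how they are renamed.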

Plan B

Another approach would be to add Prometheus support to cassandra-metrics-collector, allowing it to simultaneously ship metrics to Graphite and export Prometheus metrics via HTTP. This would provide two main benefits: a) it would run outside of the Cassandra process (potentially saving some GC pressure), and b) it could export a copy of the metrics cached from the last Graphite collection, sparing Cassandra the load of any additional polling.
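Concretely, the HTTP endpoint in either plan serves the Prometheus text exposition format. The metric name and labels below are invented for illustration, not actual Cassandra metric names:

```shell
# Illustrative only: the text exposition format such an endpoint serves.
# The metric name and labels here are made up for the example.
METRICS=$(cat <<'EOF'
# HELP cassandra_read_latency_seconds Read latency (hypothetical example)
# TYPE cassandra_read_latency_seconds gauge
cassandra_read_latency_seconds{cluster="restbase",instance="restbase1007-a"} 0.0042
EOF
)
echo "$METRICS"
```

Note the cluster label: this is what makes per-cluster dashboard templating in Grafana straightforward, as described above.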

While the amount of development effort needed to add Prometheus support would be quite small, this further entrenches a piece of software that we are required to maintain; it should therefore be pursued only in the event that Plan A is unsuccessful.

Event Timeline

Eevans added a subscriber: fgiunchedi.

Change 331911 had a related patch set uploaded (by Filippo Giunchedi):
cassandra: add jmx_exporter to Cassandra in deployment-prep

https://gerrit.wikimedia.org/r/331911

A request for a deployment repository (operations/software/prometheus_jmx_exporter) has been submitted: https://www.mediawiki.org/wiki/Git/New_repositories/Requests

Change 331911 merged by Filippo Giunchedi:
cassandra: add jmx_exporter to Cassandra in deployment-prep

https://gerrit.wikimedia.org/r/331911

Change 332535 had a related patch set uploaded (by Eevans):
WIP: Enable Prometheus JMX exporter on Cassandra nodes

https://gerrit.wikimedia.org/r/332535

Change 332542 had a related patch set uploaded (by Eevans):
Prometheus JMX exporter deploy repository

https://gerrit.wikimedia.org/r/332542

Change 332682 had a related patch set uploaded (by Eevans):
fix incorrect port in ferm rule

https://gerrit.wikimedia.org/r/332682

Change 332682 merged by Filippo Giunchedi:
fix incorrect port in ferm rule

https://gerrit.wikimedia.org/r/332682

Change 332542 merged by Eevans:
Prometheus JMX exporter deploy repository

https://gerrit.wikimedia.org/r/332542

Change 332535 merged by Filippo Giunchedi:
Enable Prometheus JMX exporter on Cassandra nodes

https://gerrit.wikimedia.org/r/332535

Change 335826 had a related patch set uploaded (by Eevans):
Enable JMX exporter on RESTBase Staging nodes in eqiad

https://gerrit.wikimedia.org/r/335826

Change 336831 had a related patch set uploaded (by Eevans):
Update path of exporter jar to currently deployed version

https://gerrit.wikimedia.org/r/336831

Change 336831 merged by Filippo Giunchedi:
Update path of exporter jar to currently deployed version

https://gerrit.wikimedia.org/r/336831

Change 335826 merged by Filippo Giunchedi:
Enable JMX exporter on RESTBase Staging nodes in eqiad

https://gerrit.wikimedia.org/r/335826

Change 337034 had a related patch set uploaded (by Eevans):
Fix broken path to Prometheus exporter config

https://gerrit.wikimedia.org/r/337034

Change 337034 merged by Filippo Giunchedi:
Fix broken path to Prometheus exporter config

https://gerrit.wikimedia.org/r/337034

Change 337493 had a related patch set uploaded (by Eevans):
Enable Prometheus exporter on restbase1007 (canary)

https://gerrit.wikimedia.org/r/337493

Change 337493 merged by Filippo Giunchedi:
Enable Prometheus exporter on restbase1007 (canary)

https://gerrit.wikimedia.org/r/337493

Mentioned in SAL (#wikimedia-operations) [2017-02-15T16:16:35Z] <urandom> T155120: restarting Cassandra on restbase1007-a to enable Prometheus exporter (canary)

This is now deployed to restbase1007, and the 1007-a instance has been restarted to serve as a canary. I have the following running from two screen sessions (to approximate having two Prometheus collectors polling the agent):

while true; do
  curl http://10.64.0.202:7800/metrics 2>/dev/null \
    && (echo; sleep "$(shuf -i 45-60 -n 1)"; echo 'times up!!')
done

Unfortunately, this seems to result in a non-trivial increase in GC collection time: https://grafana.wikimedia.org/dashboard/snapshot/7HJrqkejCweP2x5WE76JFL0cKj1footD

Change 338010 had a related patch set uploaded (by Eevans):
Revert "Enable Prometheus exporter on restbase1007 (canary)"

https://gerrit.wikimedia.org/r/338010

Change 338010 merged by Filippo Giunchedi:
Revert "Enable Prometheus exporter on restbase1007 (canary)"

https://gerrit.wikimedia.org/r/338010

Mentioned in SAL (#wikimedia-operations) [2017-02-17T15:26:08Z] <urandom> T155120: Restarting Cassandra on restbase1007-a.eqiad.wmnet to disable Prometheus exporter agent

The plan here was to enable the exporter alongside the existing metrics collection, and slowly (incrementally) transition to it. Once we had production-ready scraping and storage, and dashboards in place, we could consider deprecating our graphite metrics. On the single node tested, I did not observe any noticeable increase in utilization or latency, but I'm not sure I'm comfortable moving forward like this knowing that it's adding GC pressure.

One thing worth considering is that what we're seeing here may simply be the cost of iterating over all of these MBeans and serializing the results. Before abandoning this approach, it might be worth testing the impact of our existing JMX metrics collection in isolation. If the impact is similar, then we could weigh the option of moving forward, perhaps with a better coordinated and more aggressive timeline for the deprecation of cmcd collection. In other words, if replacing cmcd with the Prometheus exporter is a net-zero change, then perhaps we could live with the higher GC pressure for the duration of the migration (and we could always consider lowering the scrape frequency during this period).
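Lowering the scrape frequency is a per-job setting on the Prometheus server side. A hypothetical fragment (the job name, interval, and target are illustrative, not our actual configuration):

```shell
# Hypothetical Prometheus scrape config illustrating a lowered scrape
# frequency during a migration window; values are examples only.
SCRAPE_CONFIG=$(cat <<'EOF'
scrape_configs:
  - job_name: cassandra
    scrape_interval: 120s   # lowered to reduce the per-scrape MBean-walk cost
    static_configs:
      - targets: ['restbase1007.eqiad.wmnet:7800']
EOF
)
echo "$SCRAPE_CONFIG"
```

Each scrape walks the full MBean tree, so halving the scrape frequency roughly halves the exporter's amortized cost on the Cassandra JVM.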

Change 342825 had a related patch set uploaded (by Elukey):
[operations/puppet] Update prometheus jmx_exporter path in deployment-prep

https://gerrit.wikimedia.org/r/342825

Change 342825 merged by Elukey:
[operations/puppet] Update prometheus jmx_exporter path in deployment-prep

https://gerrit.wikimedia.org/r/342825

Change 342829 had a related patch set uploaded (by Elukey):
[operations/puppet] Update Cassandra jmx_exporter config path in deployment-prep

https://gerrit.wikimedia.org/r/342829

Change 342829 merged by Elukey:
[operations/puppet] Update Cassandra jmx_exporter config path in deployment-prep

https://gerrit.wikimedia.org/r/342829

Based on the additional overhead observed in T164093 (there, the result of collecting against both the Table and ColumnFamily MBeans), I'm reasonably convinced that there isn't anything out of the ordinary about the overhead observed here; I think this is simply the cost associated with collecting and serializing all of these metrics.

If we are concerned about incurring this cost twice (once for graphite, and again for prometheus), we could consider piggy-backing on the plans to deploy Cassandra 3.x and redesigned storage to a separate cluster, and make the transition to prometheus as part of that rollout. This would allow Ops the opportunity to gradually expand prometheus capacity as well.

GWicke edited projects, added Services (doing); removed Services.

@Eevans, is there anything left to do here?

I think we can call it done.