
Create a cookbook to restart the jvms on a Cassandra cluster
Closed, Resolved, Public

Description

We have several clusters of Cassandra in production, and once in a while we need to roll restart all their jvms for security upgrades. Ideally this could be done by a cookbook rather than manually.

For the AQS cluster (6 nodes, two Cassandra instances per node), this is what I usually do:

  1. select one host
  2. check nodetool-a status and nodetool-b status: each should list 12 IPs, all in UN state, with no errors (for example an instance that is bootstrapping or down)
  3. run nodetool-a drain + systemctl restart cassandra-a, and nodetool-b drain + systemctl restart cassandra-b
  4. wait until nodetool-a status and nodetool-b status again list 12 IPs in UN state
  5. proceed with the next host
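The five steps above map fairly directly onto a loop. What follows is a minimal Python sketch, not the actual cookbook. `rolling_restart`, `restart_commands` and the injected `run`, `check_healthy` and `wait_healthy` callables are hypothetical names. How commands actually reach each host (SSH, a cookbook framework, etc.) is deliberately left to the caller.

```python
from typing import Callable, Sequence

# Hypothetical: run(host, argv) executes argv on host and raises on failure.
Runner = Callable[[str, list[str]], None]


def restart_commands(instance: str) -> list[list[str]]:
    """Drain then restart one instance, e.g. 'a' -> nodetool-a / cassandra-a."""
    return [
        [f"nodetool-{instance}", "drain"],
        ["systemctl", "restart", f"cassandra-{instance}"],
    ]


def rolling_restart(
    hosts: Sequence[str],
    instances: Sequence[str],
    run: Runner,
    check_healthy: Callable[[str], bool],
    wait_healthy: Callable[[str], bool],
) -> None:
    """Restart every instance on every host, one host at a time."""
    for host in hosts:  # steps 1 and 5: handle one host, then move on
        # Step 2: do not touch a host unless the whole ring is already UN.
        if not check_healthy(host):
            raise RuntimeError(f"cluster not healthy, refusing to restart {host}")
        # Step 3: drain and restart each instance on this host.
        for instance in instances:
            for argv in restart_commands(instance):
                run(host, argv)
        # Step 4: block until all instances report UN again, or give up.
        if not wait_healthy(host):
            raise RuntimeError(f"cluster did not recover after restarting {host}")
```

Raising on the first failure mirrors the manual procedure, where an operator would stop rather than move on to the next host. In a real run, `run` might wrap something like `subprocess.run(["ssh", host, *argv], check=True)`, but that is an assumption about host access, not how any existing cookbook does it.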

A couple of notes:

  • nodetool drain is probably not needed, but it seems a good step to add anyway.
  • Step 4 could in theory be simplified to something like "wait 5 minutes, run nodetool-a status | egrep '^UN' | wc -l and check that the result is 12, fail otherwise" (and the same for nodetool-b). The right wait time of course depends on the cluster's data, so it should be configurable (with a sane default).

Suggestions are welcome!

Event Timeline

Really nice! AQS isn't supported yet, and I wasn't aware of that :P

It supports single-instance Cassandra clusters as well (for maps), so all it should take is adding "aqs" to the list of clusters.
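The "12" used in the health checks is just hosts × instances per host (6 × 2 for AQS). A cookbook that covers both single-instance clusters such as maps and multi-instance ones such as AQS might describe each cluster along these lines. This is a hypothetical sketch. `ClusterSpec` and the placeholder host names are illustrative. The assumption that a single-instance host uses plain `nodetool` and a `cassandra` unit is also illustrative and is not taken from the real cookbook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical per-cluster description used to drive a rolling restart."""

    name: str
    hosts: tuple[str, ...]
    # Instance suffixes per host; "" means a single unsuffixed instance (assumed).
    instances: tuple[str, ...] = ("",)

    def nodetool(self, instance: str) -> str:
        return f"nodetool-{instance}" if instance else "nodetool"

    def unit(self, instance: str) -> str:
        return f"cassandra-{instance}" if instance else "cassandra"

    @property
    def expected_up(self) -> int:
        # Each instance has its own IP, so the ring should show hosts x instances UN entries.
        return len(self.hosts) * len(self.instances)


# AQS as described in the task: 6 hosts (placeholder names), instances "a" and "b".
AQS = ClusterSpec("aqs", tuple(f"host{i}" for i in range(1, 7)), ("a", "b"))
```

Deriving the expected count from the cluster definition avoids hard-coding 12. In this sketch, it also means that supporting another cluster is just a matter of adding one more entry.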