MVP: Cassandra (multi-instance) management tools
Closed, Resolved (Public)

Description

Moving to a configuration where each host can run an arbitrary number of Cassandra processes has made some routine admin tasks more difficult and error-prone. For example, for many routine tasks, instead of simply iterating over hosts, you now need to iterate over the instances on each host (and assumptions about how many instances there are, or what they are named, are unreliable). This task will serve to track the requirements for basic multi-instance management, and the progress toward their implementation.

Since the development of such a tool-set will be an ongoing effort, this task will serve to define the minimum viable product.

Tools included in the MVP:

  • c-ls: Enumeration of instance IDs
  • c-foreach-nt: A foreach for nodetool; sequentially executes a nodetool command on local instances, in alternating colors
  • c-cqlsh: Connects to an instance by its name, using /etc/cassandra-{instance}/cqlshrc for credentials
  • c-any-nt: Run a nodetool command against a randomly chosen instance
  • c-foreach-restart: Sequentially, intelligently, restart instances (drain -> restart -> verify availability -> ...)
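
As a rough illustration of the enumerate-then-iterate pattern behind c-ls, c-foreach-nt and c-any-nt, here is a minimal Python sketch. It assumes instance IDs can be inferred from directories named /etc/cassandra-{instance} (the naming mentioned for c-cqlsh above); the actual tools may discover instances differently. The function names list_instances, for_each_instance and pick_any are hypothetical and exist only for this example.

```python
import glob
import os
import random


def list_instances(etc_root="/etc"):
    """Return sorted instance IDs inferred from cassandra-<id> directories.

    Assumption: one directory per instance under etc_root. This is a guess
    based on the /etc/cassandra-{instance}/cqlshrc path, not the real lookup.
    """
    prefix = "cassandra-"
    ids = []
    for path in glob.glob(os.path.join(etc_root, prefix + "*")):
        if os.path.isdir(path):
            ids.append(os.path.basename(path)[len(prefix):])
    return sorted(ids)


def for_each_instance(instance_ids, action):
    """Call action(instance_id) for each instance, one at a time, in order."""
    return [(instance_id, action(instance_id)) for instance_id in instance_ids]


def pick_any(instance_ids):
    """Return one randomly chosen instance ID (the idea behind c-any-nt)."""
    if not instance_ids:
        raise ValueError("no instances found")
    return random.choice(instance_ids)
```

Keeping enumeration in one function means the foreach and pick-any helpers never have to guess at the number or naming of instances themselves.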

For the most part, the above tools already exist and can be found here.

Items that remain:

  • c-foreach-nt (shell) should be rewritten in Python to improve handling of stdout vs. stderr
  • c-any-nt remains to be written (trivial)
  • c-foreach-restart should accept arguments for retries and timeouts
  • c-foreach-restart should accept an argument for a script/command to execute after shutdown, and prior to restart
  • Integration of c-foreach-restart into the Ansible scripts
  • SAL-based logging (implemented; pending firewall, ACLs, and testing)
  • Deployment
  • Documentation (https://wikitech.wikimedia.org/wiki/Cassandra/Tools)
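
For the stdout vs. stderr item above, the following Python sketch shows one way a rewritten c-foreach-nt could keep the two streams apart. It is not the actual implementation: run_captured and report are hypothetical names, and the per-instance "[id]" prefix simply mirrors the log style used by c-foreach-restart.

```python
import subprocess
import sys


def run_captured(argv, timeout=None):
    """Run argv and return (returncode, stdout, stderr) as separate strings."""
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode bytes to str
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr


def report(instance_id, argv):
    """Run a command for one instance, relaying each stream to its own fd."""
    code, out, err = run_captured(argv)
    for line in out.splitlines():
        print("[{}] {}".format(instance_id, line))
    for line in err.splitlines():
        print("[{}] {}".format(instance_id, line), file=sys.stderr)
    return code
```

Because the streams are captured separately, output piped to another program stays free of error messages, and the exit status of each invocation remains available to the caller.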

[Image: sample output of c-foreach-nt]

[Image: animated GIF of the cassandra-streams utility]

NOTE: Re: SAL logging, see /usr/local/bin/dologmsg on tin for an example of how this could be done (see also T141619: dologmsg doesn't work on terbium).

Event Timeline

Change 289264 had a related patch set uploaded (by Eevans):
add CQL interface and port to descriptors

https://gerrit.wikimedia.org/r/289264

Change 289264 merged by Alexandros Kosiaris:
add CQL interface and port to descriptors

https://gerrit.wikimedia.org/r/289264

@Eevans, as you know, we also have some cassandra related tools in https://github.com/wikimedia/ansible-deploy/tree/master/roles/cassandra. Do you intend this to replace the ansible tools, or do you think it would make sense to merge / integrate the functionality somehow?

I actually think they'd be complementary (insofar as they even intersect one another). For example, a tool that iteratively restarts instances could be called by an operator on the machine locally, or by a tool that is orchestrating restarts remotely. For ansible, this would simplify things quite a bit, not only by eliminating the janky hacks we put into place to deal with instances, but also because we wouldn't need to replicate per-instance configuration.

Eevans renamed this task from Cassandra (multi-instance) management tools to MVP: Cassandra (multi-instance) management tools. Aug 15 2016, 3:59 PM
Eevans triaged this task as Medium priority.
Eevans updated the task description.

c-foreach-restart now takes an --execute-post-shutdown option for a command to run after Cassandra has been shut down, and before it is restarted. You can also template the command with {id} and have it replaced with the instance ID. For example:

eevans@restbase-test2001:~$ sudo c-commands/c-foreach-restart --execute-post-shutdown="echo \"Doing the needful for instance {id}\""
INFO:root:[a] Disabling client ports...
INFO:root:[a] Draining...
INFO:root:[a] Stopping service cassandra-a
INFO:root:[a] Executing post-shutdown command: echo "Doing the needful for instance {id}"
INFO:root:[a] "Doing the needful for instance a"
INFO:root:[a] Starting service cassandra-a
WARNING:root:[a] CQL (10.192.16.154:9042) not listening (will retry)...
WARNING:root:[a] CQL (10.192.16.154:9042) not listening (will retry)...
INFO:root:[a] CQL (10.192.16.154:9042) is UP
INFO:root:[b] Disabling client ports...
INFO:root:[b] Draining...
INFO:root:[b] Stopping service cassandra-b
INFO:root:[b] Executing post-shutdown command: echo "Doing the needful for instance {id}"
INFO:root:[b] "Doing the needful for instance b"
INFO:root:[b] Starting service cassandra-b
WARNING:root:[b] CQL (10.192.16.155:9042) not listening (will retry)...
WARNING:root:[b] CQL (10.192.16.155:9042) not listening (will retry)...
INFO:root:[b] CQL (10.192.16.155:9042) is UP
eevans@restbase-test2001:~$
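
The two behaviours visible in the log above, {id} substitution and retrying until the CQL port is listening, could be sketched in Python roughly as follows. This is an illustration under stated assumptions rather than the code in c-foreach-restart: the function names are hypothetical, and the default retries and delay values are placeholders, not the tool's real defaults.

```python
import socket
import time


def render_command(template, instance_id):
    """Replace every literal {id} in the command template with the instance ID.

    Plain string replacement is used instead of str.format() so that other
    braces in a shell command (awk programs, ${VAR}) are left untouched.
    """
    return template.replace("{id}", instance_id)


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host, port, retries=10, delay=6.0):
    """Poll host:port up to `retries` times, sleeping `delay` seconds between
    attempts. Returns True once the port accepts connections, else False.
    The defaults here are hypothetical.
    """
    for attempt in range(retries):
        if is_listening(host, port):
            return True
        if attempt < retries - 1:
            time.sleep(delay)
    return False
```

A restart loop built on this would only move on to the next instance once wait_for_port returns True for the current one, which is what keeps the restart sequential.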

A Debian package of the tools (1.0.0) has been installed on all production and staging machines.