Moving to a configuration where each host can run an arbitrary number of Cassandra processes has made some routine admin tasks more difficult and error-prone. For example, many routine tasks that used to iterate only over hosts must now also iterate over the instances on each host, and the number and naming of those instances cannot be relied upon. This task tracks the requirements for basic multi-instance management and the progress toward implementing it.
Since developing such a tool set will be an ongoing effort, this task will serve to define the minimum viable product (MVP).
Tools included in the MVP:
* `c-ls`: Enumeration of instance IDs
* `c-foreach-nt`: A //foreach// for `nodetool`; sequentially executes a `nodetool` command on local instances, with output in alternating colors
* `c-cqlsh`: Connects to an instance by its name, using `/etc/cassandra-{instance}/cqlshrc` for credentials
* `c-any-nt`: Run a `nodetool` command against a randomly chosen instance
* `c-foreach-restart`: Sequentially and intelligently restarts instances (drain -> restart -> verify availability -> ...)
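The drain -> restart -> verify availability sequence of `c-foreach-restart` could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the repository: the `drain`, `restart`, and `is_available` callables and the `retries`, `delay`, and `sleep` parameters are hypothetical placeholders for whatever the real tool uses to talk to an instance.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    instances: Iterable[str],
    drain: Callable[[str], None],          # hypothetical: drains one instance
    restart: Callable[[str], None],        # hypothetical: restarts one instance
    is_available: Callable[[str], bool],   # hypothetical: availability check
    retries: int = 10,
    delay: float = 6.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Restart instances one at a time, in order.

    Stops (by raising) at the first instance that does not become
    available again, so the remaining instances are left untouched.
    """
    done = []
    for instance in instances:
        drain(instance)
        restart(instance)
        # Poll for availability, waiting `delay` seconds between checks.
        for _ in range(retries):
            if is_available(instance):
                break
            sleep(delay)
        else:
            # The loop ended without a successful check.
            raise RuntimeError(
                f"{instance} did not become available after {retries} checks"
            )
        done.append(instance)
    return done
```

Passing the per-instance actions in as callables keeps the sequencing logic separate from how an instance is actually drained or restarted, and makes the retry and timeout behaviour easy to exercise without a running cluster.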
For the most part, the above tools already exist and can be found [[https://github.com/eevans/cassandra-tools-wmf|here]].
Some items that remain:
* `c-foreach-nt` (shell) should be rewritten in Python to improve handling of stdout vs. stderr
* ~~`c-any-nt` remains to be written (trivial)~~
* ~~`c-foreach-restart` should accept arguments for retries and timeouts~~
* Use of `c-foreach-restart` should be integrated into the ansible scripts
* SAL-based logging
* Deployment
* Documentation
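As a rough idea of what a Python rewrite of `c-foreach-nt` could gain, the sketch below runs one command per instance and keeps stdout and stderr apart using the standard `subprocess` module. The `build_command` callable and the `Result` type are hypothetical; how a real `nodetool` invocation is built for a given instance is deliberately left out.

```python
import subprocess
import sys
from typing import Callable, Iterable, List, NamedTuple, Sequence


class Result(NamedTuple):
    """Outcome of one command run against one instance (hypothetical type)."""
    instance: str
    returncode: int
    stdout: str
    stderr: str


def foreach_instance(
    instances: Iterable[str],
    build_command: Callable[[str], Sequence[str]],  # hypothetical: argv for an instance
) -> List[Result]:
    """Run a command for each instance in turn, capturing both streams separately."""
    results = []
    for instance in instances:
        proc = subprocess.run(
            list(build_command(instance)),
            capture_output=True,  # stdout and stderr are captured independently
            text=True,            # decode output to str
            check=False,          # keep going; the caller inspects returncode
        )
        results.append(Result(instance, proc.returncode, proc.stdout, proc.stderr))
    return results


def report(results: Iterable[Result]) -> None:
    """Write captured stdout to stdout and captured stderr to stderr."""
    for r in results:
        if r.stdout:
            sys.stdout.write(f"[{r.instance}]\n{r.stdout}")
        if r.stderr:
            sys.stderr.write(f"[{r.instance}]\n{r.stderr}")
```

Because each stream is captured on its own, error output can be labelled per instance and routed to stderr instead of being interleaved with normal output.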
Sample output of `c-foreach-nt`: {F3889970}
NOTE: Re: SAL logging, see `/usr/local/bin/dologmsg` on tin for an example of how this could be done (see also {T141619}).