Script to collect forensic data from Cassandra hosts
Open, Medium, Public

Description

When issues arise with a Cassandra node, it is often most expedient to simply restart it and restore normal operation. However, doing so could destroy valuable information needed to track down the root cause. Since it is not realistic to assume that everyone responding to an alert will know what to look for, we should create a script to automate collecting and archiving relevant data for later examination.

Some ideas:

  • Heap dumps
  • Stack dump (or capture)
  • Logs (debug)
  • nodetool (if possible)
    • status
    • gcstats
    • compactionthroughput
    • streamthroughput
    • gossipinfo
    • proxyhistograms
    • toppartitions
    • tpstats
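
As a starting point for discussion, here is a minimal sketch of what such a script could look like. It is not an agreed design. Assumptions that are not part of this task: the JDK tools `jstack` and `jmap` and `nodetool` are on the PATH, the script runs as a user allowed to attach to the Cassandra JVM, and debug logs match `debug.log*` in a directory the caller passes in. The function names (`run_to_file`, `collect`, `archive`) are hypothetical. The two throughput items are spelled with the `get` prefix that nodetool uses for them, and `toppartitions` is left out because, at least in older Cassandra releases, it needs keyspace, table, and duration arguments. The heap dump is opt-in since it can be very large and pauses the JVM while it is written.

```python
#!/usr/bin/env python3
"""Sketch: collect forensic data from a Cassandra host before restarting it."""
import shutil
import subprocess
import tarfile
from pathlib import Path

# nodetool subcommands from the list above (assumed spelling: "get" prefix on
# the two throughput commands; toppartitions omitted, see the note above).
NODETOOL_COMMANDS = [
    "status", "gcstats", "getcompactionthroughput", "getstreamthroughput",
    "gossipinfo", "proxyhistograms", "tpstats",
]


def run_to_file(argv, out_path, timeout=60):
    """Run argv and write its stdout and stderr to out_path.

    Returns True on exit status 0. A missing binary or a timeout is recorded
    in out_path instead of raising, so one failed step does not stop the rest.
    """
    try:
        proc = subprocess.run(argv, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        out_path.write_text(f"failed to run {argv!r}: {exc}\n")
        return False
    out_path.write_bytes(proc.stdout)
    return proc.returncode == 0


def collect(out_dir, pid=None, log_dir=None, heap_dump=False, runner=run_to_file):
    """Gather data into out_dir and return a dict of {step name: succeeded}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    results = {}

    # JVM state first: it is the data a restart destroys.
    if pid is not None:
        results["jstack"] = runner(["jstack", str(pid)], out_dir / "jstack.txt")
        if heap_dump:
            hprof = out_dir / "heap.hprof"
            results["jmap"] = runner(
                ["jmap", f"-dump:format=b,file={hprof}", str(pid)],
                out_dir / "jmap.txt")

    # nodetool may be unusable on a sick node; each failure is only recorded.
    for cmd in NODETOOL_COMMANDS:
        results[f"nodetool {cmd}"] = runner(
            ["nodetool", cmd], out_dir / f"nodetool-{cmd}.txt")

    # Copy debug logs, if the caller told us where they live.
    if log_dir is not None:
        for log in sorted(Path(log_dir).glob("debug.log*")):
            shutil.copy2(log, out_dir / log.name)
            results[f"log {log.name}"] = True
    return results


def archive(out_dir):
    """Pack out_dir into a sibling .tar.gz file and return the archive path."""
    out_dir = Path(out_dir)
    tar_path = out_dir.parent / (out_dir.name + ".tar.gz")
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(out_dir, arcname=out_dir.name)
    return tar_path
```

A responder would call something like `collect(target_dir, pid=<cassandra pid>, log_dir=<log directory>)` followed by `archive(target_dir)` before restarting the node. Passing the command runner in as a parameter keeps the collection logic testable on machines without Cassandra installed.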

Event Timeline

Eevans triaged this task as Low priority. Apr 6 2018, 7:43 PM

Removing task assignee due to inactivity, as this open task has been assigned to the same person for more than two years (see the emails sent to the task assignee on Oct 27 and Nov 23). Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be welcome.
(See https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips on how to best manage your individual work in Phabricator.)

Eevans raised the priority of this task from Low to Medium. Jun 7 2021, 8:06 PM