When fully deployed, we'll likely have ~3-4x the number of Cassandra JVMs we run now. At the moment, every JVM is also a seed node. The impact of a large (how large?) list of seed nodes isn't clear: DataStax's documentation recommends against marking every node as a seed, but it isn't clear at what point doing so starts to cause problems.
Description
Status | Assigned | Task
---|---|---
Invalid | None | T93751 RFC: Next steps for long-term revision storage -- space needs, storage hierarchies
Resolved | RobH | T93790 Expand RESTBase cluster capacity
Resolved | fgiunchedi | T108306 better cassandra process checks
Resolved | Eevans | T106619 investigate G1GC pause times
Resolved | fgiunchedi | T95253 Finish conversion to multiple Cassandra instances per hardware node
Resolved | fgiunchedi | T113939 assess impact of many cassandra seed nodes with multi instance
Event Timeline
One seed instance per hardware node should do it, as AFAIK the seed nodes are used to initialise Cassandra's DHT.
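For illustration only, the "one seed instance per hardware node" idea could be sketched as below. The host names, addresses, and the `one_seed_per_host` helper are all hypothetical, and picking the first listed instance on each host is an assumption made for the sketch, not something the thread prescribes.

```python
def one_seed_per_host(instances):
    """Return one seed address per hardware host.

    `instances` is an iterable of (host, instance_address) pairs.
    Assumption: the first instance listed for a host becomes its seed.
    """
    seeds = {}
    for host, address in instances:
        # setdefault keeps the first address seen for each host
        seeds.setdefault(host, address)
    return list(seeds.values())


# Hypothetical multi-instance layout: two Cassandra instances per host.
instances = [
    ("host1", "10.0.0.1"),
    ("host1", "10.0.0.2"),
    ("host2", "10.0.0.3"),
    ("host2", "10.0.0.4"),
]

print(one_seed_per_host(instances))
```

With this layout, half of the instances never appear in the seed list, so anything else derived from that list would miss them.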
Hm, that will be a problem then if not all instances make it into the seed list.
When fully deployed, we'll likely have ~3-4x the number of Cassandra JVMs we run now. At the moment, every JVM is also a seed node. The impact of a large (how large?) list of seed nodes isn't clear...
I can't think of any reason this would be a problem, but I admit I have never tried such a large list (nor do I know anyone who has).
Also note that, at the moment, the firewall ACLs are derived from the list of seed nodes.
Conflating the two notions in Puppet ('all nodes' vs. 'seed nodes') seems like a recipe for pain at some point (and a recurring theme), doesn't it? :) I could see fixing this on principle either way.
I don't think the seed / non-seed distinction matters beyond bootstrap. In normal operation, all that should matter is the total number of instances in the cluster, which will be moderate in our cluster. Others are running clusters with hundreds of nodes, which is reported to work fine.
I did some due diligence on this, and it's not quite the case that seeds are only used for bootstrapping. There are a couple of minor exceptions, though I can't see that they'd have any bearing on us. There is also a small risk that, over time, the role of seeds will be expanded even further (e.g. CASSANDRA-9206), but again, it shouldn't be an issue with current versions.
IMHO, it couldn't hurt to separate the notion of "list of cluster nodes" from "list of seed nodes" in Puppet, but I see no harm in continuing with our current practice of having every node use every other node (but not itself) as a seed.
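A minimal sketch of that separation, assuming a flat list of instance addresses: the full node list stays available for things like firewall ACLs, while each instance's seed list is derived from it by excluding the instance itself. The function names and addresses are hypothetical; the only Cassandra-specific detail relied on is that the `seeds` parameter in `cassandra.yaml` is a comma-delimited string of addresses.

```python
def seeds_for(instance, cluster_nodes):
    """Seeds for one instance: every other cluster node, but not itself."""
    return [addr for addr in cluster_nodes if addr != instance]


def seeds_parameter(seeds):
    """Render a seed list as the comma-delimited string used for `seeds`."""
    return ",".join(seeds)


# Hypothetical "list of cluster nodes", kept separate from any seed list.
cluster_nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

for node in cluster_nodes:
    print(node, "->", seeds_parameter(seeds_for(node, cluster_nodes)))
```

Keeping `cluster_nodes` as the single source of truth means the seed policy can change later without touching whatever else depends on the full list.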
Sounds like there's no immediate need to do anything for multi-instance. Agreed on the Puppet work (to be tracked elsewhere).