
assess impact of many cassandra seed nodes with multi instance
Closed, ResolvedPublic

Description

When fully deployed we'll likely have ~3-4x the number of Cassandra JVMs running that we have now, and ATM every JVM is also a seed node. It isn't clear what the impact is of having a large (how large?) list of seed nodes; marking every node as a seed is not recommended by DataStax's documentation. However, it isn't clear at what point it can create problems.

Event Timeline

fgiunchedi claimed this task.
fgiunchedi raised the priority of this task to High.
fgiunchedi updated the task description.

also note that at the moment the firewall ACLs are based on seed nodes

One seed instance per hardware node should do it, as AFAIK the seed nodes are used to initialise Cassandra's DHT.
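As a rough illustration of the one-seed-per-hardware-node idea, the sketch below picks a single instance on each host as the seed. The host names, addresses, and the `pick_seeds` helper are all hypothetical and not taken from our Puppet manifests; it only shows the selection logic.

```python
# Hypothetical inventory: hardware node -> addresses of its Cassandra instances.
instances_by_host = {
    "host1": ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
    "host2": ["10.0.1.1", "10.0.1.2"],
}


def pick_seeds(instances_by_host):
    """Return one seed per hardware node: its first instance in sorted order."""
    return [
        sorted(addrs)[0]
        for _host, addrs in sorted(instances_by_host.items())
        if addrs  # skip hosts with no instances
    ]


# Comma-separated form, as typically used for a seed list setting.
print(",".join(pick_seeds(instances_by_host)))
```

With multi-instance this would keep the seed list at the size of the hardware fleet rather than growing 3-4x with the number of JVMs.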

also note that at the moment the firewall ACLs are based on seed nodes

Hm, that will be a problem then if not all instances make it into the seed list.

when fully deployed we'll likely have ~3-4x the number of cassandra jvms running we have now, ATM every jvm is also a seed node. It isn't clear what's the impact of having a large (how?) list of seed nodes...

I can't think of any reason this would be a problem, but I admit to never having tried such a large list (or knowing anyone who has).

also note that at the moment the firewall ACLs are based on seed nodes

Conflating the two notions in Puppet ('all nodes' vs 'seed nodes') seems like a recipe for pain at some point (and a recurring theme), doesn't it? :) I could see fixing this on principle either way.

I don't think the seed / non-seed distinction matters beyond bootstrap. In normal operation, all that should matter is the total number of instances in the cluster, which will be moderate in our cluster. Others are running clusters with hundreds of nodes, which is reported to work fine.

I did some due diligence on this, and it's not quite the case that seeds are only used for bootstrapping. There are a couple of minor exceptions, though I can't see that they'd have any bearing on us. There is also some small danger that with time, the context of seeds will be expanded upon even further (i.e. CASSANDRA-9206), but again, it shouldn't be an issue with current versions.

IMHO, it couldn't hurt to separate the notions of "list of cluster nodes" and "list of seed nodes" in Puppet, but I see no harm in continuing with our current practice of having every node use every other node (but not itself) as a seed.
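To make the separation concrete, here is a minimal sketch that keeps the two lists apart: the firewall ACL is derived from the full instance list, while each instance's seed list is derived separately (here, following current practice, every other instance). The function names and addresses are hypothetical; the real logic would live in Puppet.

```python
# Hypothetical full list of Cassandra instances in the cluster.
cluster_instances = ["10.0.0.1", "10.0.0.2", "10.0.1.1"]


def seeds_for(instance, cluster_instances):
    """Current practice: every other instance is a seed, never the instance itself."""
    return [addr for addr in cluster_instances if addr != instance]


def firewall_allowed(cluster_instances):
    """ACL sources come from the full instance list, independent of the seed list."""
    return sorted(set(cluster_instances))


print(seeds_for("10.0.0.1", cluster_instances))
print(firewall_allowed(cluster_instances))
```

With the ACLs no longer tied to the seed list, shrinking the seed list later (e.g. to one seed per hardware node) would not lock any instance out.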

Sounds like there's no immediate need to do anything for multi-instance; agreed on the Puppet work (to be tracked elsewhere).