The list of Cassandra seed nodes is generated from the entire list of Cassandra instances (excluding the current one), and additionally includes the hostname itself. Since the main hosts where the instances reside are not actually running Cassandra, this results in a very large number of Gossip-related connection failure log messages (which conspire to obscure valuable log data).
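To make the behaviour described above concrete, here is a rough sketch in Python (not the actual Puppet/ERB template). The function names, addresses, and hostname are hypothetical and only illustrate the shape of the problem and of a possible filter.

```python
def build_seeds(instances, current, hostname):
    """Seeds as described above: every instance except the current one,
    plus the hostname of the machine itself."""
    seeds = [addr for addr in instances if addr != current]
    # The host's main interface does not run Cassandra, so gossip
    # connections to this entry are the ones that would fail.
    seeds.append(hostname)
    return seeds


def build_seeds_filtered(instances, current, host_addresses):
    """Hypothetical variant that skips the current instance and any
    address belonging to a host's main interface."""
    return [a for a in instances if a != current and a not in host_addresses]


# Hypothetical example data (documentation address range).
instances = ["192.0.2.11", "192.0.2.12", "192.0.2.13"]
unfiltered = build_seeds(instances, "192.0.2.11", "host1.example.org")
filtered = build_seeds_filtered(
    instances + ["host1.example.org"], "192.0.2.11", {"host1.example.org"}
)
```

In this sketch `unfiltered` still contains `host1.example.org`, while `filtered` contains only addresses of actual Cassandra instances.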
Whoops, just noticed that this wiped out the seeds list in deployment-prep:
eevans@deployment-restbase01:~$ grep -B 9 -A 3 seeds: /etc/cassandra/cassandra.yaml
parameters:
# seeds is actually a comma-delimited list of addresses.
# Ex: "<ip1>,<ip2>,<ip3>"
# Omit own host name / IP in multi-node clusters (see
# https://phabricator.wikimedia.org/T91617).
# Also disregard the main DNS interfaces of each node when
# multiple instances are colocated on the same node (see
# https://phabricator.wikimedia.org/T172610)
- seeds:
# For workloads with more data than can fit in memory, Cassandra's
# bottleneck will be reads that need to fetch data from
# disk. "concurrent_reads" should be set to (16 * number_of_drives) in
eevans@deployment-restbase01:~$
I've looked at this, and it's not clear to me why this is happening. Assuming the cause is the new conditional added to filter seeds (wikimedia/puppet/.../cassandra.yaml-3.x.erb), I would expect that conditional to evaluate to true here by virtue of @instance_count == 1.
That said, this is all starting to look quite brittle to me (for example, we're now matching on IPv4 addresses and specific hostnames). With or without a fix for this particular issue, I'd be in favor of moving to a small statically configured list of seeds.
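As a rough illustration of the static approach (a Python sketch, not a proposed patch to the template): the seed addresses below are made up, and `render_seeds` is a hypothetical helper. The point is only that a short fixed list needs no host or IP matching, and that an empty result can be rejected outright rather than rendered as a bare `- seeds:`.

```python
# Hypothetical, statically configured seeds (documentation address range).
STATIC_SEEDS = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]


def render_seeds(seeds):
    """Render the seeds line for cassandra.yaml as a comma-delimited
    string, refusing to emit an empty list."""
    cleaned = [s.strip() for s in seeds if s.strip()]
    if not cleaned:
        raise ValueError("refusing to render an empty seeds list")
    return '- seeds: "{}"'.format(",".join(cleaned))


seeds_line = render_seeds(STATIC_SEEDS)
```

Failing loudly on an empty list would turn a silently wiped-out seeds entry into an error at rendering time.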