The list of Cassandra seed nodes is generated from the entire list of Cassandra instances (excluding the current one), and additionally includes the hostname itself. Since the main hosts where instances reside are not actually running Cassandra, this results in a very large number of Gossip-related connection failure log messages (which conspire to obscure valuable log data).
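To make the failure mode concrete, here is a minimal Python sketch of the behaviour described above. It is not the actual Puppet/ERB template logic: the inventory, the host names (`node1.example`, `node1-a.example`, ...) and both function names are hypothetical, and it assumes that the main host name of every node ends up in the list.

```python
# Hypothetical inventory: main host name -> Cassandra instance names on it.
# Assumption: the main host names do not run Cassandra themselves.
HOSTS = {
    "node1.example": ["node1-a.example", "node1-b.example"],
    "node2.example": ["node2-a.example", "node2-b.example"],
}


def seeds_as_described(hosts, current_instance):
    """Every instance except the current one, plus every main host name."""
    seeds = []
    for main_host, instances in hosts.items():
        seeds.append(main_host)  # nothing listens for gossip on this name
        seeds.extend(i for i in instances if i != current_instance)
    return seeds


def unreachable_seeds(hosts, seeds):
    """Seeds that are main host names rather than Cassandra instances."""
    return [s for s in seeds if s in hosts]


seeds = seeds_as_described(HOSTS, "node1-a.example")
print(",".join(seeds))
print("gossip will fail for:", unreachable_seeds(HOSTS, seeds))
```

Under these assumptions, every entry reported by `unreachable_seeds` is an address that gossip will repeatedly try and fail to reach, which is where the log noise comes from.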
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Eevans | T160570 Cassandra 3.x Tracking
Resolved | | mobrovac | T172610 Invalid Cassandra seeds list is spamming the debug logs
Event Timeline
Change 370554 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[operations/puppet@production] Cassandra: Do not include the main DNS in the list of seeds
Change 370554 merged by Filippo Giunchedi:
[operations/puppet@production] Cassandra: Do not include the main DNS in the list of seeds
This has been merged and Puppet has been run. The main IPs are no longer in the seeds lists, so resolving.
Whoops, just noticed that this wiped out the seeds list in deployment-prep:
eevans@deployment-restbase01:~$ grep -B 9 -A 3 seeds: /etc/cassandra/cassandra.yaml
parameters:
# seeds is actually a comma-delimited list of addresses.
# Ex: "<ip1>,<ip2>,<ip3>"
# Omit own host name / IP in multi-node clusters (see
# https://phabricator.wikimedia.org/T91617).
# Also disregard the main DNS interfaces of each node when
# multiple instances are colocated on the same node (see
# https://phabricator.wikimedia.org/T172610)
- seeds:
# For workloads with more data than can fit in memory, Cassandra's
# bottleneck will be reads that need to fetch data from
# disk. "concurrent_reads" should be set to (16 * number_of_drives) in
eevans@deployment-restbase01:~$
Reopening...
I've looked at this, and it's not clear to me why this is happening. Assuming the cause is the new conditional added to filter seeds (wikimedia/puppet/.../cassandra.yaml-3.x.erb), I would expect it to evaluate to true by virtue of @instance_count == 1.
That said, this is all starting to look quite brittle to me (for example, we're now matching on IPv4 addresses and specific hostnames). With or without a fix for this particular issue, I'd be in favor of moving to a small statically configured list of seeds.
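As a rough illustration of the statically configured alternative, the Python sketch below renders the comma-delimited `seeds` value from a fixed list, omits the instance's own address (as the comment in cassandra.yaml advises), and refuses to produce an empty value like the one seen in deployment-prep. The seed addresses, `STATIC_SEEDS` and `render_seeds` are hypothetical names for illustration only, not anything in the Puppet repository.

```python
# Hypothetical static seed list; in practice this would live in configuration.
STATIC_SEEDS = ["node1-a.example", "node2-a.example", "node3-a.example"]


def render_seeds(static_seeds, own_address):
    """Return the comma-delimited seeds string, omitting this instance."""
    seeds = [s for s in static_seeds if s != own_address]
    if not seeds:
        # An empty "- seeds:" value is the failure seen in deployment-prep;
        # refuse to render it rather than write it out silently.
        raise ValueError("seed list is empty after filtering")
    return ",".join(seeds)


print(render_seeds(STATIC_SEEDS, "node1-a.example"))
```

The point of the explicit check is that a filtering mistake surfaces at render time instead of producing a configuration file with no seeds.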
Change 377997 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[operations/puppet@production] Cassandra: Include only instance DNS' in the list of seeds
Change 377997 merged by Gehel:
[operations/puppet@production] Cassandra: Include only instance DNS' in the list of seeds
Ok, the above patch truly fixed the issue. There were problems in the seed list in both labs and staging, and they have now been remedied.