
Cassandra instance DNS records - are they needed?
Open, Low, Public

Description

Hi everybody,

I opened this task after a chat with Faidon about AQS and per-cassandra-instance DNS records. As far as I know, those records are needed for the Cassandra seed list, but I am not sure if we could replace them with ports, in the following way:

  • aqs1004-a.eqiad.wmnet, aqs1004-b.eqiad.wmnet => aqs1004.eqiad.wmnet:7776, aqs1004.eqiad.wmnet:7777 (totally random ports for this example of course)

Is there a reason why we don't do this? Getting rid of the per-instance records could reduce the corner cases our DNS-creation automation has to handle :)

One thing that I noticed from https://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html#seed-provider is that the example mentions a host:port combination, but it might be only for Cassandra 4.x.
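For concreteness, the proposed scheme would collapse the per-instance records into a single hostname with one port per instance. A sketch of what the seed list in cassandra.yaml might then look like (hostnames and ports are hypothetical, reusing the example above; as noted, host:port seed entries may only be supported on Cassandra 4.x):

```yaml
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # One host record, one port per instance. The ports are made up
      # for this example; host:port seeds may require Cassandra 4.x.
      - seeds: "aqs1004.eqiad.wmnet:7776,aqs1004.eqiad.wmnet:7777"
```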

Event Timeline

elukey added a subscriber: Volans.

IIRC at the time we set up multi-instance Cassandra, one or more of the ports could not be changed, and thus we went with per-instance addresses. This might no longer be the case, and it may now be possible to use a single address and multiple ports instead.

At the very least, getting rid of these names would create inconvenience. There are lots of maintenance and admin commands that run against an instance, or resolve IPs in their output, and having to map IP addresses to instances won't be much fun.

jbond triaged this task as Medium priority. Dec 9 2020, 12:08 PM
jbond subscribed.

At the very least, getting rid of these names would create inconvenience. There are lots of maintenance and admin commands that run against an instance, or resolve IPs in their output, and having to map IP addresses to instances won't be much fun.

I'd share this concern; I use the reverse names of the various nodes extensively. Without some replacement mechanism in our tooling wrappers to make it easy to map a host ID to an address+port combo, this would make things a bit more difficult. nodetool status, for example, doesn't even distinguish between instances by port.

Eevans lowered the priority of this task from Medium to Low. Jun 7 2021, 7:12 PM

Did we decide against this? Is this issue still valid?

I think we're probably not doing this for now - please reopen if you feel strongly!

ayounsi subscribed.

The topic came up again today, as host requests in T307641 got provisioned without the additional IPs, requiring heavy manual work to fix. And it's not the first time this has happened.

Even though this specific issue could be fixed with a better process, it raises questions about the complexity this "snowflake" adds to our provisioning automation, and to the network in general, as hosts with multiple IPs on their NIC are uncommon at best.

My understanding here is that this special case is not (or not anymore) due to a technical limitation, but for ease of management. In that case maybe the tooling can be adapted to not need multiple dedicated IPs?


Rereading the comment history, I think @fgiunchedi was just explaining the historical precedent, and noting that perhaps something had changed. My reply never addressed whether that might still be the case, and instead focused on what this would do to cluster management (because even if it were possible to use alternate ports, the impact to management would by far be the bigger issue).

So to address whether this is even possible: I did take a cursory look this time, and it doesn't seem like anything has changed. For example, when a new outbound socket is created it uses DatabaseDescriptor.getSSLStoragePort() or DatabaseDescriptor.getStoragePort(), for TLS and plaintext connections respectively.

@SuppressWarnings("resource") // Closing the socket will close the underlying channel.
public static Socket newSocket(InetAddress endpoint) throws IOException
{
    // zero means 'bind on any available port.'
    if (DatabaseDescriptor.getServerEncryptionOptions().shouldEncrypt(endpoint))
    {
        return SSLFactory.getSocket(DatabaseDescriptor.getServerEncryptionOptions(), endpoint, DatabaseDescriptor.getSSLStoragePort());
    }
    else
    {
        SocketChannel channel = SocketChannel.open();
        channel.connect(new InetSocketAddress(endpoint, DatabaseDescriptor.getStoragePort()));
        return channel.socket();
    }
}

DatabaseDescriptor is basically just an interface to configuration data and returns the value for these ports that the connecting node uses:

public static int getSSLStoragePort()
{
    // A single JVM-wide value: it can be overridden via the
    // cassandra.ssl_storage_port system property, but every peer this
    // node connects to is assumed to listen on that same port.
    return Integer.parseInt(System.getProperty(Config.PROPERTY_PREFIX + "ssl_storage_port", Integer.toString(conf.ssl_storage_port)));
}

So it is not possible to bind multiple instances to the same IP interface using different ports (and still have them communicate with one another). :(
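To illustrate the collision concretely: since every instance would have to both listen and dial out on the one configured storage port, two instances sharing an address would collide at bind time. This is a minimal standalone sketch using plain Java sockets (not Cassandra code; the class name and printed message are made up for illustration). Per-instance addresses sidestep this because each instance binds the same port on a different IP.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class PortConflictDemo {
    public static void main(String[] args) throws IOException {
        InetAddress addr = InetAddress.getLoopbackAddress();

        // First "instance" binds an OS-chosen free port; treat that port
        // as a stand-in for the single, fixed storage_port.
        try (ServerSocket first = new ServerSocket()) {
            first.bind(new InetSocketAddress(addr, 0));
            int storagePort = first.getLocalPort();

            // A second "instance" on the same address cannot bind the same
            // port; the bind fails with a BindException.
            try (ServerSocket second = new ServerSocket()) {
                second.bind(new InetSocketAddress(addr, storagePort));
            } catch (IOException e) {
                System.out.println("second bind failed: " + e.getClass().getSimpleName());
            }
        }
    }
}
```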

Also, via https://cassandra.apache.org/_/blog/Configurable-Storage-Ports-and-Why-We-Need-Them.html:

How Do My Other Cassandra Nodes Know About Different storage_port Settings?

If you need to change the storage_port setting for one of your Cassandra nodes, you will need to change the storage_port setting for all of the nodes in that ring. If you have multiple rings across different data centers, then you will need to set up some kind of port-forwarding solution at the firewall or router level if those data centers use different storage_port settings. You will need to check with your firewall or router documentation to learn how to configure port forwarding. There are many variations across vendors and models.

And to address @elukey's observation about what the docs say (the updated URL for that is now here): that would seem to be a bug; I will file a report upstream to have it fixed.

EDIT: https://issues.apache.org/jira/browse/CASSANDRA-17689

Ok, so I do need to walk some of this back. It is now possible in 4.x (the documentation is correct in that context), thanks to CASSANDRA-7544. We are probably quite some ways away from an upgrade to 4.x (it is a huge lift from 3.11.x), and it still strikes me as a great deal of work to adapt our tooling and management processes.
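For what it's worth, under 4.x (post CASSANDRA-7544) each instance on a shared address could in principle get its own storage_port, with the seed list carrying host:port pairs. A hypothetical sketch of one instance's cassandra.yaml (all values made up; a second instance would use the other port):

```yaml
# Hypothetical instance "a" on a shared address (all values made up)
listen_address: aqs1004.eqiad.wmnet
storage_port: 7776
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # 4.x seed entries can carry per-node ports (CASSANDRA-7544)
      - seeds: "aqs1004.eqiad.wmnet:7776,aqs1004.eqiad.wmnet:7777"
```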

Our multi-instance-per-host setup is already a source of pain; investing even more effort into tooling and/or incurring any additional ongoing pain would have to be weighed against the exceptional nature of having to add new nodes. But this is something we could definitely look into.

Thanks for the quick and thorough answer! Glad to see that there is progress upstream!

exceptional nature of having to add new nodes

It's not just this, but also the long-term cost of maintaining special cases in our automation and network for those servers.

But I agree it's not easy to weigh the pain on both sides. Documenting why we can/can't, as you've started here, is already valuable! Thanks

@Eevans reviving this years-old thread now that Cassandra was upgraded to 4.x a few months ago. Would it be possible to look into not using the extra IPs, at least on new/future/re-imaged nodes?


Short answer: Yes :)

Somewhat longer answer: I think this will be a fairly large undertaking, the kind that will probably span several quarters cradle-to-grave. In other words, I believe it is definitely worth doing, but we'll need to scope the work and suss out the priority relative to everything else. I've added this to DP's roadmap planning for discussion, and will update here when I know more!