
Figure out if nodes in different DCs can be bootstrapped in parallel
Closed, ResolvedPublic

Description

In order to speed up the conversion to the multi-instance setup & finish the cluster expansion, it would be useful to bootstrap nodes in different DCs in parallel. I asked the cassandra-users mailing list about this, and the reply suggests that this should indeed work:

http://mail-archives.apache.org/mod_mbox/cassandra-user/201603.mbox/%3CCA%2BVSrLoXb7m0Ww8x7zYdtqrnu%2B-fu4e0e1hbszHM7h0xwtAypg%40mail.gmail.com%3E

So, I am proposing to try the following on the staging cluster:

  1. start a bootstrap of another instance on one of the eqiad nodes, and
  2. while that is running, bootstrap another instance in codfw.

Event Timeline

> I asked the cassandra-users mailing list about this, and the reply suggests that this should indeed work:
>
> http://mail-archives.apache.org/mod_mbox/cassandra-user/201603.mbox/%3CCA%2BVSrLoXb7m0Ww8x7zYdtqrnu%2B-fu4e0e1hbszHM7h0xwtAypg%40mail.gmail.com%3E

This answer is wrong; the relevant Cassandra issue where this is discussed is: https://issues.apache.org/jira/browse/CASSANDRA-2434

This is also consistent with the DataStax docs (though oddly enough the doc suggests you can bootstrap more than one node in a rack, which I also think is wrong, and so do others; it has been reported as a bug).

TL;DR: This can be done, but it requires bypassing a safety check meant to preserve consistency guarantees. Given the liberties we've taken in the past, maybe there is precedent for doing that, but given the aspirations to use RESTBase as more than a durable cache, we're going to need to start taking this seriously eventually.

@Eevans: As discussed before on IRC, I don't see https://issues.apache.org/jira/browse/CASSANDRA-2434 spelling out specific reasons for not allowing bootstrapping in two DCs. Instead, it is noted that with NTS (NetworkTopologyStrategy), the node giving up token ranges will always be in the same DC: https://issues.apache.org/jira/browse/CASSANDRA-2434?focusedCommentId=13094846&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13094846

Which technical reasons do you see for bootstrapping in different DCs being an issue, given that the token ranges of those bootstraps won't overlap?

> Which technical reasons do you see for bootstrapping in different DCs being an issue, given that the token ranges of those bootstraps won't overlap?

I'm not sure what you mean when you say this. They do overlap. There is only one token space, and a bootstrapping node is going to (assuming perfect distribution) bisect 256 existing token ranges (which with high probability will include ranges from every node in the cluster).

Eevans moved this task from Backlog to Next on the Cassandra board.

After looking at this further, I believe it is the case that we can safely bootstrap two nodes in parallel, so long as a) each of them is in a distinct datacenter, and b) no writes are performed with a consistency level that would span these datacenters. It is the case in our environment (for the moment) that (b) always applies[1].

Relative safety aside, this will still not work without some intervention. When Cassandra starts up to perform a bootstrap, it checks gossip state to see if any other nodes are in a JOINING state. If any are, the node will refuse to bootstrap, regardless of the relationship between the two nodes (vis-à-vis NTS). This is the aforementioned "safety" designed to protect consistency during range movements, and the reasoning is sound, as consistency cannot be guaranteed under all supported consistency levels (in other words, I see nothing here that requires follow-up with upstream).
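The decision logic behaves roughly like the sketch below. This is illustrative pseudologic only, not Cassandra's actual implementation; the function and state names are assumptions:

```python
def may_bootstrap(peer_states, consistent_rangemovement=True):
    """Refuse to join if any peer is already JOINING, unless the operator
    has explicitly disabled consistent range movements
    (-Dcassandra.consistent.rangemovement=false)."""
    if not consistent_rangemovement:
        return True  # operator has accepted responsibility for safety
    # the check is cluster-wide: DC/rack placement of the peer is irrelevant
    return not any(state == "JOINING" for state in peer_states.values())

# a node bootstrapping in codfw while another is JOINING in eqiad:
states = {"eqiad-node": "JOINING", "codfw-node": "NORMAL"}
print(may_bootstrap(states))         # False: bootstrap refused
print(may_bootstrap(states, False))  # True: operator override
```

Note that the check is deliberately blind to topology: it does not (and cannot, in general) know whether the two pending range movements would ever serve the same quorum.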

We can choose to override the constraint using -Dcassandra.consistent.rangemovement=false at startup on a case-by-case basis, for those range movements that we know are safe (read: when conditions (a) and (b) above both apply).
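For a one-off bootstrap, one way to pass the flag is via the JVM options sourced at startup; this assumes a Debian-style install where /etc/cassandra/cassandra-env.sh is read by the init script (paths may differ in our setup):

```shell
# Append the override for this bootstrap only; remove it again afterwards
# so that subsequent (unvetted) range movements keep the safety check.
echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false"' \
  >> /etc/cassandra/cassandra-env.sh
service cassandra start
```

Reverting the change after the bootstrap completes is important, since leaving the override in place would silently disable the check for every future join.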

[1]: The only exception I could come up with is authentication operations that involve a write for the superuser, which are implicitly performed at QUORUM.

Eevans claimed this task.