
Cassandra3 migration plan proposal
Closed, ResolvedPublic

Event Timeline

Milimetric moved this task from Incoming to Operational Excellence on the Analytics board.

Before starting, there are some notes to keep in mind:

  • we currently have 6 nodes running Cassandra 2.2
  • 3 of them are due for refresh because of hardware warranty expiration
  • we plan to expand the cluster to 9/12 nodes to host more data (needs to be verified)
  • the goal is to upgrade to Cassandra 3.11 (already running for Restbase)

We have a couple of options, though we're not yet sure about their flexibility:

  • Upgrade the cluster in place

This is something that wasn't tested by the Services team at the time, since a new Restbase cluster was created instead. The idea would be to upgrade one node at a time and run nodetool upgradesstables (which usually takes a long time to complete).
We could take the current cluster, upgrade it in place, and then add/remove nodes later on.
The in-place upgrade looks appealing, but if we hit bugs halfway through we could end up in a weird state (recovering may not be trivial at that point).
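For reference, the per-node loop for the in-place option would look roughly like the following. This is only a sketch: the package/service names and the idea of pinning the 3.11 package are illustrative assumptions, not taken from our actual puppetization.

```shell
# Rough per-node sequence for an in-place upgrade (illustrative names only).
nodetool drain              # flush memtables and stop accepting writes on this node
systemctl stop cassandra
apt-get install cassandra   # assumes the repo is pinned to the 3.11 package
systemctl start cassandra
nodetool upgradesstables    # rewrite local sstables to the 3.11 format (slow)
nodetool status             # confirm the node is back Up/Normal before the next one
```

The whole cluster stays up throughout, since only one node is down at a time.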

  • Create a new cluster and stream sstables content to it

This could be doable with sstableloader, but there are some question marks about streaming v2.2 sstables to a 3.11 cluster (should we use sstableloader v3.11 on a 2.2 node? Is it sufficient to use sstableloader 2.2 on a 2.2 node, stream to a 3.11 node and then run nodetool upgradesstables on it? etc.)
The upgrade could be something like:

  • set up a 3.11 cluster on brand new 6/9 nodes
  • stop data loading (or part of it) to the 2.2 cluster
  • stream sstables from every node to the new cluster
  • test, etc.
  • switch the cluster and the data import over
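A sketch of what the streaming step could look like. All keyspace, table, and host names here are hypothetical placeholders; the staging copy is needed because sstableloader infers the keyspace and table from the last two components of the path it is given.

```shell
# Illustrative only: stream one table's sstables from a 2.2 node to the new cluster.
nodetool snapshot -t migration my_keyspace        # hard-link snapshot, near-instant

# Stage the snapshot into a <keyspace>/<table> directory layout for sstableloader.
mkdir -p /srv/staging/my_keyspace/my_table
cp /var/lib/cassandra/data/my_keyspace/my_table-*/snapshots/migration/* \
   /srv/staging/my_keyspace/my_table/

# -d takes one (or more) contact points in the destination 3.11 cluster.
sstableloader -d new-node1001.eqiad.wmnet /srv/staging/my_keyspace/my_table
```

This is per table and per node, so it would need to be scripted across the cluster.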

In both cases, we need some testing to make sure that what we want to do is feasible. @Eevans did I write something horribly incorrect or does it make sense?

IIUC, due to hardware budget changes, we have to replace all 6 nodes sometime in 2021, so the in-place upgrade option does not seem worth it (will need to confirm this; just sent an email to Faidon).

As far as timing goes, I see from https://cassandra.apache.org/download/ that Cassandra 2.2 will be EOLed when 4.0 is released (no release date yet). To avoid rushing, I'd start the work on Cassandra 3.11 sooner rather than later, since it will surely require time.

> This could be doable with sstableloader, but there are some question marks about streaming v2.2 sstables to a 3.11 cluster (should we use sstableloader v3.11 on a 2.2 node? Is it sufficient to use sstableloader 2.2 on a 2.2 node, stream to a 3.11 node and then run nodetool upgradesstables on it? etc.)

Probably the former (the v3.11 importer), but this is something you can test.

> The upgrade could be something like:
>
>   • set up a 3.11 cluster on brand new 6/9 nodes
>   • stop data loading (or part of it) to the 2.2 cluster
>   • stream sstables from every node to the new cluster
>   • test, etc.
>   • switch the cluster and the data import over

I think you want a step in here to take a snapshot, so that you're operating from a stable set of SSTables. The data wouldn't change (not until you re-enabled imports), but compaction would likely be shuffling things around in the meantime.

> In both cases, we need some testing to make sure that what we want to do is feasible. @Eevans did I write something horribly incorrect or does it make sense?

LGTM.

@Eevans please be patient, Joseph and I had a long chat about upgrading in place and we have some doubts. Overall, what we'd like to do is something like:

  1. stop ingestion of data
  2. for every Cassandra node, take a snapshot of the current sstable state. This should be a hard-link copy, so not a big issue space-wise
  3. restart the ingestion of data
  4. for every node, upgrade Cassandra to 3.11 and launch the sstable upgrade process.

The idea is that if for some reason the upgrade process fails halfway through, we'll have to roll back to the snapshot state and possibly reload the missing data (from the start of the upgrade to the start of the rollback), but in the meantime we'll speculatively assume that the upgrade will succeed and keep loading.

If the above makes sense, then:

  • would it be better to, in step 2), upgrade the whole cluster to 3.11 and then start the sstable upgrade on a per-node basis later on? The idea would be to check how the cluster runs with 3.11 (even if using the old sstable format).
  • what happens to the snapshot taken of the Cassandra 2.2 sstables on a node when we launch the sstable upgrade to 3.11? Since it is a hard link, we'd expect that the data is not removed but possibly duplicated (a sort of copy on write).

> @Eevans please be patient, Joseph and I had a long chat about upgrading in place and we have some doubts. Overall, what we'd like to do is something like:

No worries. :)

>   1. stop ingestion of data
>   2. for every Cassandra node, take a snapshot of the current sstable state. This should be a hard-link copy, so not a big issue space-wise
>   3. restart the ingestion of data
>   4. for every node, upgrade Cassandra to 3.11 and launch the sstable upgrade process.

FWIW, since the snapshot is just hard-linking files, it should happen almost instantly (maybe minutes at most?). With ingestion happening once daily, this could probably be effectively collapsed to two steps. :)
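For reference, the snapshot itself is only a couple of commands (the tag name here is just an example):

```shell
nodetool snapshot -t pre-3.11-upgrade       # hard-links every live sstable, near-instant
nodetool listsnapshots                      # shows each snapshot's size and "true size" on disk
nodetool clearsnapshot -t pre-3.11-upgrade  # drop it and reclaim space once the upgrade is confirmed good
```

The "true size" reported by listsnapshots is the extra disk space the snapshot actually consumes, which grows only as the live copies of the files are rewritten or compacted away.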

> The idea is that if for some reason the upgrade process fails halfway through, we'll have to roll back to the snapshot state and possibly reload the missing data (from the start of the upgrade to the start of the rollback), but in the meantime we'll speculatively assume that the upgrade will succeed and keep loading.

> If the above makes sense, then:
>
>   • would it be better to, in step 2), upgrade the whole cluster to 3.11 and then start the sstable upgrade on a per-node basis later on? The idea would be to check how the cluster runs with 3.11 (even if using the old sstable format).

You could, yeah. There is some merit to keeping the period during which you run a mixed-version cluster minimal. Streaming operations, for example, can be problematic, so if you had a failure halfway through that required you to rebuild a node (or decommission, bootstrap, etc.), that could become an issue.

You could also do one node and at least start an SSTable upgrade. Personally, I would be fairly confident after seeing it rewrite a few tens of files that it would be able to do the rest (more confident the more progress was made, of course). Or you could do one node and complete an SSTable rewrite entirely as your canary, and then upgrade the rest of the cluster before kicking off the rewrites.

>   • what happens to the snapshot taken of the Cassandra 2.2 sstables on a node when we launch the sstable upgrade to 3.11? Since it is a hard link, we'd expect that the data is not removed but possibly duplicated (a sort of copy on write).

Yeah, if you snapshot and then completely rewrite all of the SSTables, you'll have twice the data.
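The hard-link behaviour is easy to verify locally, without Cassandra at all. This sketch fakes one sstable, "snapshots" it with a hard link, then "upgrades" it by writing a replacement and deleting the original (the filenames only mimic sstable naming):

```shell
# No Cassandra involved: fake one sstable, snapshot it via hard link,
# then simulate a format upgrade by replacing it with a new file.
set -e
dir=$(mktemp -d)
echo "old 2.2-format data" > "$dir/la-1-big-Data.db"
mkdir "$dir/snapshot"
ln "$dir/la-1-big-Data.db" "$dir/snapshot/la-1-big-Data.db"  # instant, no data copied

stat -c %h "$dir/la-1-big-Data.db"    # prints 2: two names, one copy on disk

# The rewrite creates a new file; deleting the old name leaves the
# snapshot's copy intact, so only now does the data exist twice.
echo "new 3.11-format data" > "$dir/ma-1-big-Data.db"
rm "$dir/la-1-big-Data.db"
cat "$dir/snapshot/la-1-big-Data.db"  # prints: old 2.2-format data
rm -r "$dir"
```

So the snapshot is free until the rewrite happens, and afterwards it holds the full pre-upgrade copy, exactly the rollback property we want.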

@Eevans thanks a lot for all the answers. If you have time I have another doubt :)

Say that we upgrade a node to 3.11 and upgrade a small schema's sstables to the new format. All good; we decide to proceed with the rest of the cluster in a rolling-upgrade fashion. At this point we'll have nodes running 3.11 with sstables in the 2.x format, and I am not super clear about what happens to compaction and new sstables. The assumption is that new sstables will be created in the 3.11 format, and old 2.2 sstables will be upgraded as well when compaction happens (modulo the 2.x sstable snapshot, which will remain the same). Is that the right assumption, or are there any obscure details that I don't know? :)
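One way to observe this from the outside: sstable filenames begin with a format-version code (e.g. "la" for 2.2, "m*" codes for 3.x; worth double-checking the exact codes on our versions), so counting prefixes shows how much data is still in the old format, and the rewrite jobs themselves show up in compaction stats. The data path below is the Debian default and may differ on our hosts.

```shell
# Count sstables per format-version prefix across all keyspaces/tables.
ls /var/lib/cassandra/data/*/*/*-big-Data.db \
  | xargs -n1 basename | cut -d- -f1 | sort | uniq -c

nodetool compactionstats   # upgradesstables jobs appear here while running
```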

Within my limited understanding of Cassandra, the plan looks good to me. A few additional notes for the maps cluster:

  • all data in there can be re-generated. It's time consuming, but that's just computer time, not human time
  • we should replace all maps nodes next fiscal year, so it might be easier to just set up a new cluster in parallel

@Gehel yes, we are in the same position, namely we'll refresh our cluster during the next fiscal year, but that's basically a year from now (IIRC the same for maps), so it would make sense to start investing a bit of time in trying in-place upgrades, so that any future jump will be easier.

We agreed on upgrading Cassandra to 3.11 to start experimenting with rolling upgrades. The high-level plan is to test the in-place upgrade in cloud/labs first (to iron out the details of the procedure to follow), sync with Eric, and then do it on the AQS cluster.

The goal of this task is met in my opinion: we have all the info needed to move to the next step, namely testing an in-place upgrade somewhere. Going to open a task about it.

elukey added a project: Analytics-Kanban.
elukey moved this task from Next Up to Done on the Analytics-Kanban board.