Problem
We currently have no policy on Ceph upgrades, which makes it hard to find the time to do them.
Current Ceph releases have a lifespan of a bit more than 2 years, and there's a new release every year (see https://docs.ceph.com/en/latest/releases/index.html).
We currently get our "unofficial" packages from https://mirror.croit.io/ because there were some issues with the upstream ones (https://tracker.ceph.com/issues/53411). Lately, upstream download.ceph.com seems to have caught up on building the packages, so we should use those instead.
So two things have to be decided here: how frequent the upgrades should be, and which version to upgrade to each time.
Note that I'm considering only N.2.* versions, as the others are only for development or test clusters. From the docs:
x.0.z - development versions
x.1.z - release candidates (for test clusters, brave users)
x.2.z - stable/bugfix releases (for users)
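The numbering scheme quoted above can be checked mechanically. The following is a minimal sketch, assuming plain `x.y.z` version strings; the function name `release_kind` is hypothetical and not part of any Ceph tooling.

```python
def release_kind(version: str) -> str:
    """Classify a Ceph version string by its middle number (x.y.z)."""
    # Assumes exactly three dot-separated integers, e.g. "17.2.0".
    _major, minor, _patch = (int(part) for part in version.split("."))
    kinds = {
        0: "development",
        1: "release candidate",
        2: "stable",
    }
    # Anything outside the documented scheme is reported as unknown.
    return kinds.get(minor, "unknown")
```

Only versions for which this returns "stable" are candidates for the options below.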
Constraints and risks
- We risk running an unsupported version of Ceph, not getting any new bugfixes or security patches.
- Debian packages are very delayed with respect to upstream, so we might consider using other sources for them.
Decision record
Decided on Option 3.
Options
Option 1
Do nothing
Pros:
- No changes to the current workflow
Cons:
- This means upgrades will be done "whenever we find some time", which is usually when a security patch or blocking bug comes around.
- We potentially end up running EOL (end of life) versions
- We keep depending on a 3rd party repository
Option 2
Frequency: once a year
Version to upgrade to: (N-1).2.*
For example, if we have 16.2.15 and there is a new 18.2.0, we upgrade to 17.2.*; otherwise we upgrade to the latest 16.2.*
Pros:
- We get a very stable, world-tested version of Ceph
Cons:
- We might spend some months running an EOL version
- We don't get fixes for a whole year
- We have to allocate time for it once a year (happy path: 1 week of work; challenging path: 1 month of work)
Option 3
Frequency: once a year
Version to upgrade to: N.2.*
For example, if we have 16.2.15 and there is a new 17.2.0, we upgrade to 17.2.*; otherwise we upgrade to the latest 16.2.*
Pros:
- We get a very stable, world-tested version of Ceph
- We don't get periods running an EOL version
Cons:
- We don't get fixes for a whole year
- We have to allocate time for it once a year (happy path: 1 week of work; challenging path: 1 month of work)
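The version-selection rule used in Options 2 and 3 can be sketched as a small function. This is only an illustration under stated assumptions: `pick_target` and its `lag` parameter are hypothetical names (lag=0 corresponds to the N.2.* rule, lag=1 to the (N-1).2.* rule), the list of available versions is supplied by the caller, and the version numbers in the usage lines are example inputs.

```python
from typing import Iterable, Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Turn 'x.y.z' into a tuple of ints so versions compare numerically."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def pick_target(current: str, available: Iterable[str], lag: int = 0) -> str:
    """Pick the version to upgrade to.

    lag=0: newest stable major (N.2.*); lag=1: one major behind ((N-1).2.*).
    Falls back to the latest stable release of the current major.
    """
    cur = parse(current)
    # Only x.2.z releases are considered stable.
    stable = [v for v in map(parse, available) if v[1] == 2]
    stable.append(cur)  # guarantees a fallback and prevents downgrades
    newest_major = max(v[0] for v in stable)
    target_major = max(cur[0], newest_major - lag)
    candidates = [v for v in stable if v[0] == target_major]
    if not candidates:
        # No stable release published for the target major: stay on ours.
        candidates = [v for v in stable if v[0] == cur[0]]
    return ".".join(str(part) for part in max(candidates))


# Example inputs only:
print(pick_target("16.2.15", ["16.2.15", "17.2.0", "17.2.7"], lag=0))
print(pick_target("16.2.15", ["16.2.15", "17.2.7", "18.2.0"], lag=1))
```

The sketch does not check whether the jump from the current major to the target is a supported upgrade path; that would still need to be verified against the upstream release notes.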
Option 4
Frequency: every 6 months
Version to upgrade to: (N-1).2.*
For example, if we have 16.2.15 and there is a new 18.2.0, we upgrade to 17.2.*; otherwise we upgrade to the latest 16.2.*
Pros:
- We get a very stable, world-tested version of Ceph
Cons:
- We might spend some months running an EOL version
- We have to allocate time for it twice a year (happy path: 1 week of work; challenging path: 1 month of work)
Option 5
Frequency: every 6 months
Version to upgrade to: N.2.*
For example, if we have 16.2.15 and there is a new 17.2.0, we upgrade to 17.2.*; otherwise we upgrade to the latest 16.2.*
Pros:
- We get a very stable, world-tested version of Ceph
- We don't get periods running an EOL version
Cons:
- We have to allocate time for it twice a year (happy path: 1 week of work; challenging path: 1 month of work)