Problem
Historically, WMCS has upgraded OpenStack on a loose cadence, intending to follow the stable - 1 (n-1) release. Without a tighter cadence, WMCS has in practice lagged one to two or more releases behind stable. At times this has blocked new feature adoption: Trove and, most recently, Magnum both required a newer OpenStack release before they could be deployed.
Goals:
- Better manage upgrades. OpenStack releases land in April and October, and we should schedule our upgrades at consistent times of the year in response.
- Run a newer OpenStack version on average, while still allowing some lag for stability.
- Make it easier to patch, or to run a newer OpenStack version, when needed to address a bug or adopt a desired feature.
Constraints and Risks
- A stable system is prioritized over new features.
- Doing nothing means Kubernetes clusters deployed by Magnum will almost always be EOL while in operation.
- This is because Kubernetes supports each release for 18 months, and a stable OpenStack release adopts a Kubernetes release that is already about 9 months old. We upgrade to that OpenStack release 6-9 months after it ships, so 15-18 months have elapsed since the upstream Kubernetes release by the time we deploy it, and it reaches EOL shortly afterwards if it has not already.
- It does not appear possible to upgrade the Kubernetes version Magnum deploys without upgrading OpenStack, so our OpenStack and Kubernetes versions will be tied together.
- Note that our current Kubernetes version is already EOL.
- Today, WMCS depends on Debian to package OpenStack. In the past this has caused delays while the packaging work was completed, and not all point releases have been packaged.
- WMCS currently patches OpenStack and will continue to do so.
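To make the EOL arithmetic above concrete, here is a rough model of how old a Magnum-deployed Kubernetes release is while we run it. It is a sketch that only uses the approximate figures quoted in this section (18 months of Kubernetes support, a Kubernetes release about 9 months old when OpenStack ships) plus one assumption of ours: twice-yearly upgrades, so each deployment runs for about 6 months. The function names are hypothetical and not part of any WMCS tooling.

```python
# Rough model of Kubernetes release age under a given OpenStack upgrade lag.
# All constants are the approximate figures quoted in this document (plus an
# assumed twice-yearly upgrade cadence), not upstream guarantees.

K8S_SUPPORT_MONTHS = 18            # Kubernetes support window quoted above
K8S_AGE_AT_OPENSTACK_RELEASE = 9   # age of the k8s release when OpenStack ships
MONTHS_BETWEEN_UPGRADES = 6        # assumption: twice-yearly upgrades


def k8s_age_window(upgrade_lag_months: int) -> tuple[int, int]:
    """Return the k8s release age (months) at deployment and at the next upgrade."""
    at_deploy = K8S_AGE_AT_OPENSTACK_RELEASE + upgrade_lag_months
    at_next_upgrade = at_deploy + MONTHS_BETWEEN_UPGRADES
    return at_deploy, at_next_upgrade


def months_past_eol(upgrade_lag_months: int) -> int:
    """Months the deployed k8s release runs past EOL before being replaced (0 if never)."""
    _, at_next_upgrade = k8s_age_window(upgrade_lag_months)
    return max(0, at_next_upgrade - K8S_SUPPORT_MONTHS)


if __name__ == "__main__":
    # Compare a short lag (1-3 months) with the current 6-9 month lag.
    for lag in (1, 3, 6, 9):
        deploy, end = k8s_age_window(lag)
        print(f"lag {lag}m: k8s is {deploy}-{end} months old, "
              f"{months_past_eol(lag)} months past EOL")
```

Under these assumptions, the current 6-9 month lag leaves the cluster running 3-6 months past EOL before the next upgrade, while a 1-3 month lag keeps it within support, only just at a 3 month lag.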
Decision Record
Proposals:
Option 1:
Do nothing. Accept the status quo, including the ad hoc upgrade cycle for OpenStack releases.
Pros
- No change required
- Maximum flexibility for planning work
- Ability to defer upgrades and run EOL without missing expectations
Cons
- No set expectations for ourselves or users
- Potential for cumbersome scenarios requiring multiple unplanned upgrades to fix an issue or add a feature
- Meets none of the stated goals
Option 2:
Maintain the n-1 target and accept running EOL Kubernetes. Schedule twice-yearly upgrade months to set expectations.
Pros
- Same as Option 1, with only a minor change and a small loss of flexibility
- Ensures predictability and maintenance of upgrades, rather than relying on ad hoc efforts
Cons
- Potential for cumbersome scenarios requiring multiple unplanned upgrades to fix an issue or add a feature. Even if we accept waiting for a feature until the next scheduled upgrade, critical or security issues could still force unplanned upgrades.
- Addresses only the first of the three stated goals
Option 3:
Create a new n-0.5 target cadence: upgrade to each stable release 1-3 months after it ships.
Pros
- Ensures OpenStack and Kubernetes versions are up to date and supported throughout their time in operation
- Ensures predictability and maintenance of upgrades, rather than relying on ad hoc efforts
- No change to the upgrade process required
Cons
- Patching burden isn't improved
- Maintains dependency on Debian packaging
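With OpenStack releasing in April and October, the 1-3 month cadence in Option 3 puts our upgrade windows at predictable points in the year. The sketch below works them out at month granularity, returning the first and last month of each window as first-of-month dates. `add_months` and `upgrade_windows` are hypothetical helpers for illustration, and treating every release as landing at the start of its month is a simplifying assumption.

```python
from datetime import date

RELEASE_MONTHS = (4, 10)         # OpenStack releases land in April and October
WINDOW_START, WINDOW_END = 1, 3  # Option 3: upgrade 1-3 months after release


def add_months(d: date, months: int) -> date:
    """Return the first day of the month `months` after d's month."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def upgrade_windows(year: int) -> list[tuple[date, date]]:
    """Return (first month, last month) of each upgrade window for a release year."""
    windows = []
    for month in RELEASE_MONTHS:
        release = date(year, month, 1)  # simplification: release at start of month
        windows.append((add_months(release, WINDOW_START),
                        add_months(release, WINDOW_END)))
    return windows


if __name__ == "__main__":
    for first, last in upgrade_windows(2024):
        print(f"upgrade window: {first:%B %Y} to {last:%B %Y}")
```

For a given year this yields a May to July window for the April release and a November to January window for the October release, which gives us two fixed upgrade seasons to plan around.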
Option 4:
Create a new n-0.5 target cadence: upgrade to each stable release 1-3 months after it ships. Use Docker or a similar container-based approach for deployment.
Pros
- Everything under Option 3
- Lower patching burden
- Improved flexibility to upgrade or respond to issues
- Meets all stated goals
- Better target for automation
Cons
- Requires changing how we deploy OpenStack; this will require research and design