
Decision Request - Openstack Upgrade Cadence
Closed, Resolved (Public)

Description

Problem

Historically, WMCS has upgraded Openstack on a loose cadence, intending to follow the stable -1 (n-1) version. Without a tighter cadence, WMCS has generally lagged 1-2 or more versions behind stable. At times this has blocked adoption of new features: Trove and, most recently, Magnum both required newer versions of Openstack before they could be deployed.

Goals:

  • Better manage upgrades. Openstack releases in April and October. We should also plan consistent times of the year to do upgrades in response.
  • Run a generally newer Openstack version on average, while still seeking lag time for stability.
  • Make it easier to patch or run newer versions of Openstack as needed in response to a bug or desired feature

Constraints and Risks

  • A stable system is prioritized over features
  • Doing nothing will mean k8s clusters run by Magnum will almost always be EOL during operation.
    • This is due to the following timeline. Kubernetes supports each release for 18 months. Openstack adopts a 9-month-old Kubernetes release for stable. We upgrade to that Openstack version 6-9 months later, so 15-18 months have elapsed since the upstream Kubernetes release, leaving it at or near EOL by the time we deploy it.
    • It doesn't seem possible to upgrade Magnum k8s version without upgrading Openstack. This means, our Openstack and kubernetes versions will be tied together.
    • Note, currently our existing k8s version is EOL.
  • Today, WMCS depends on Debian to package Openstack. In the past, this packaging work has caused delays, and not all point releases have been packaged.
  • WMCS currently patches openstack, and will continue to do so
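
The Kubernetes EOL timeline described in the constraints above can be checked with a few lines of arithmetic. This is only a sketch: the figures (18, 9, and 6-9 months) are the ones quoted in this task, and the constant and function names are illustrative, not part of any WMCS tooling.

```python
# Figures as stated in this task; names are hypothetical, for illustration only.
K8S_SUPPORT_MONTHS = 18            # upstream Kubernetes support window
K8S_AGE_AT_OPENSTACK_RELEASE = 9   # age of the k8s release Openstack adopts
WMCS_UPGRADE_LAG = (6, 9)          # months between Openstack release and our upgrade


def remaining_support(upgrade_lag_months: int) -> int:
    """Months of upstream k8s support left on the day WMCS upgrades."""
    age_at_upgrade = K8S_AGE_AT_OPENSTACK_RELEASE + upgrade_lag_months
    return K8S_SUPPORT_MONTHS - age_at_upgrade


for lag in WMCS_UPGRADE_LAG:
    print(f"upgrade lag {lag} months -> {remaining_support(lag)} months of k8s support left")
```

Under these assumptions the current cadence leaves between 0 and 3 months of upstream support at upgrade time, while a lag of 1-3 months would leave 6-8 months.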

Decision Record

https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Decision_record_T316866_Openstack_Upgrade_Cadence

Proposals:

Option 1:

Do nothing. Accept the status quo, including the ad hoc upgrade cycle for Openstack releases.

Pros
  • No change required
  • Maximum flexibility for planning work
  • Ability to defer upgrades and run EOL without missing expectations
Cons
  • No set expectations for ourselves or users
  • Potential for cumbersome scenarios requiring multiple unplanned upgrades to occur to fix an issue or add a feature
  • No goals met

Option 2:

Maintain n-1 target. Accept running EOL k8s. Schedule twice yearly upgrade months to set expectations.

Pros
  • Same as option 1, with only minor change and flexibility loss
  • Ensures predictability and maintenance of upgrades, rather than relying on ad hoc efforts
Cons
  • Potential for cumbersome scenarios requiring multiple unplanned upgrades to occur to fix an issue or add a feature. Even if delay is accepted on adding a feature (according to the upgrade schedule), critical or security issues may still force an unplanned upgrade
  • Addresses only the first of the three stated goals

Option 3:

Create new n-0.5 target cadence. Upgrade to stable version 1-3 months after release.

Pros
  • Ensures openstack and kubernetes versions are up to date and supported during the entire time of operation
  • Ensures predictability and maintenance of upgrades, rather than relying on ad hoc efforts
  • No change to upgrade process required
Cons
  • Patching burden isn't improved
  • Maintains dependency on debian packaging
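
To make the n-0.5 cadence concrete, the sketch below derives the upgrade window for each Openstack release, assuming the April and October release months given in the goals and the 1-3 month lag proposed in this option. The function name and the example year are hypothetical; actual upgrade months would still need to be agreed by the team.

```python
from datetime import date

RELEASE_MONTHS = (4, 10)  # April and October, per the goals in this task


def upgrade_window(year: int, release_month: int, min_lag: int = 1, max_lag: int = 3):
    """Return the first and last month of the upgrade window (as dates, day=1)."""
    def shift(months: int) -> date:
        # Work with a zero-based month index so year rollover is simple.
        index = (release_month - 1) + months
        return date(year + index // 12, index % 12 + 1, 1)

    return shift(min_lag), shift(max_lag)


EXAMPLE_YEAR = 2023  # arbitrary year, for illustration only
for month in RELEASE_MONTHS:
    start, end = upgrade_window(EXAMPLE_YEAR, month)
    print(f"release {EXAMPLE_YEAR}-{month:02d}: upgrade between {start:%Y-%m} and {end:%Y-%m}")
```

With these inputs, an April release maps to a May-July window and an October release to a November-January window, which is one way to pick the two consistent upgrade periods each year.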

Option 4:

Create new n-0.5 target cadence. Upgrade to stable version 1-3 months after release. Utilize docker or similar for deployment.

Pros
  • Everything under option 3
  • Lower patching burden
  • Improved flexibility to upgrade or respond to issues
  • Meets all stated goals
  • Better target for automation
Cons
  • Requires changing how we deploy Openstack; this will require research and design

Event Timeline

It doesn't seem possible to upgrade Magnum k8s version without upgrading Openstack. This means, our Openstack and kubernetes versions will be tied together.

Could you please share a link to where is this documented upstream?

No goals met

This probably should be expanded to the goals that are not met :)

Actually, most of this is part of the cons already, maybe better just mention each goal that is not met and to which extent (for easy comparison and avoiding repetition)

In Option 4:

Meets all stated goals

This is also a meta-pro, stating what is achieved instead seems fairer

Lower patching burden

Can you elaborate on this? Why/how is the patching burden lowered?

Improved flexibility to upgrade or respond to issues

I'd rephrase this to match how it's stated in the others, something along the lines of "Make it the easiest to patch or run newer versions of Openstack as needed in response to a bug or desired feature" (that is already one of the goals)

Maybe we should try to enumerate the same features/aspects on each option for easy comparison?

Option three is the easiest decision to make here, so I think we should just plan on that until someone has time to dive into the great restructuring required for #4.

It doesn't seem possible to upgrade Magnum k8s version without upgrading Openstack. This means, our Openstack and kubernetes versions will be tied together.

Could you please share a link to where is this documented upstream?

There's a quick compatibility matrix found here: https://wiki.openstack.org/wiki/Magnum. Version support is specific per openstack release, in conjunction with the container OS. See also https://docs.openstack.org/magnum/latest/user/index.html#kube-tag and https://docs.openstack.org/magnum/latest/user/#rolling-upgrade.

No goals met

This probably should be expanded to the goals that are not met :)

Actually, most of this is part of the cons already, maybe better just mention each goal that is not met and to which extent (for easy comparison and avoiding repetition)

Yes, I actually removed option 1 as it's not really an option :-) Option 2 is essentially the "do nothing" option already.

In Option 4:

Meets all stated goals

This is also a meta-pro, stating what is achieved instead seems fairer

Lower patching burden

Can you elaborate on this? Why/how is the patching burden lowered?

Improved flexibility to upgrade or respond to issues

I'd rephrase this to match how it's stated in the others, something along the lines of "Make it the easiest to patch or run newer versions of Openstack as needed in response to a bug or desired feature" (that is already one of the goals)

Maybe we should try to enumerate the same features/aspects on each option for easy comparison?

I agree option 4 isn't as well defined as it could be. I don't quite remember who proposed docker rather than debian, but I'll try to extol the virtues of docker. As I understand it, today we rely on debian packages and have to maintain debian patches to apply to those packages. With docker, patching could be easier, as just another layer, and we could potentially be less tied to operating system upgrades, etc. I do recall openstack being difficult to package, but I don't know how much it suffers from vendoring like k8s and other golang projects. See https://lwn.net/Articles/835599/.

That said, I'm in favor of option 3, which essentially involves a commitment both to schedule upgrades and to run a slightly newer version of openstack moving forward. The potential improvements from option 4 should be better defined and discussed before adopting it. I'm afraid I don't have enough detail yet to be in favor of option 4.

btw. I'm up for option 3, would be interesting to explore a bit more option 4, maybe as a long term goal

nskaggs claimed this task.

It seems we have consensus for option 3. I will resolve this ticket and update the decision record accordingly. Thank you for everyone's feedback!