Decision request - kubernetes upgrade workgroup
Open, Needs Triage, Public

Description

Problem

We are several years behind on Kubernetes upgrades. To catch up, we need to upgrade faster than upstream releases for some time.

Constraints and risks

  • All the problems of running old software (security, bugs, stability, ...)

Extra info

Decision record

In progress

https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Decision_record_T302593_How_do_we_make_decisions

Options

Option 1

Do nothing

Pros:

  • No extra effort needed

Cons:

  • We never catch up

Option 2

Create a dedicated opt-in workgroup to focus on monthly Kubernetes upgrades until we catch up (monthly is the aim; some upgrades might take longer), and continue with regular updates thereafter.

Pros:

  • K8s upgrade progress greatly improves
  • We spread upgrade knowledge in the team
  • We set up a working group that can then take over the regular updates (3/year)
  • Automation gets improved and refined

Cons:

  • Considerable effort at times, when API deprecations affect us
  • Are monthly updates compatible with other work streams?
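
To get a feel for how long the monthly cadence would have to last, here is a rough back-of-the-envelope sketch. It assumes constant rates (our 12 upgrades/year against the 3 upstream releases/year mentioned above) and treats the gap as a plain count of minor versions. The function name and the example gap of 6 versions are hypothetical, not our actual numbers.

```python
import math


def months_to_catch_up(versions_behind: int,
                       our_upgrades_per_year: float = 12.0,
                       upstream_releases_per_year: float = 3.0) -> int:
    """Whole months until the version gap closes, assuming constant rates.

    This is a simplified continuous model: it ignores that releases and
    upgrades happen at discrete dates and that hard upgrades take longer.
    """
    # Net number of versions we gain on upstream per year.
    net_per_year = our_upgrades_per_year - upstream_releases_per_year
    if net_per_year <= 0:
        # Covers Option 1 (do nothing): the gap never shrinks.
        raise ValueError("no catch-up unless we upgrade faster than upstream releases")
    return math.ceil(versions_behind * 12 / net_per_year)


# Hypothetical gap of 6 minor versions, monthly upgrades vs 3 releases/year.
print(months_to_catch_up(6))
```

Under these assumptions the gap closes at a net 9 versions per year, so any upgrade that takes longer than a month pushes the estimate out accordingly.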

Option N

Add your options here

Interested in being part of the working group? Add yourself below:

Event Timeline

fnegri renamed this task from Decision request template - kubernetes upgrade workgroup to Decision request - kubernetes upgrade workgroup. Apr 29 2024, 5:30 PM
fnegri moved this task from Inbox to Discussion on the Cloud Services Proposals board.

Can we also link here any resources we already have (automations, cookbooks, instructions, etc.) on how we handle k8s upgrades? A k8s upgrade is easy on paper, but I assume it'll be hairier for our particular implementation. Btw, for the decision request I'm going with Option 2.

Added it :), feel free to edit the task and add more if you find more

Option 2 seems to me like the obviously good choice :)

On a first read, I was under the impression that the working group would exist only until we catch up, but from option 2 it seems clear that this would become a regular working group.

Feel free to reword/clarify there :)
A variant without the long-term workgroup might be another option too, if you prefer.

I would like to participate in the upgrades. I don't have any strong opinion on the different options at the moment.

Option 2 (including the long-term workgroup) looks fine to me.

Maybe I would add that "monthly" is the target, but some difficult upgrades might need more time. We could publish a roadmap and highlight the upgrades that we anticipate might require more time, because of API deprecations.

Sounds good to me. IIRC there's usually one big upgrade a year where they deprecate many things or make big changes. I think though that we can keep track of that on the tasks (https://phabricator.wikimedia.org/T316107): after assigning, say, three of them, the assignees can start looking into what changes come with each upgrade and plan a bit ahead. WDYT? (This can be discussed later, no need to decide now.)

> I think though that we can keep track of that on the tasks

The task is a good place to discuss the details, but I see value in having a high-level wiki page with the list of upgrades and a very short summary for each, like "this one should be easy" or "this one is complicated, see details in the task".