
Decision request - WMCS kubernetes standard deployment code pattern
Closed, Resolved · Public

Description

Problem

We have a number of things that deploy into kubernetes, for example:

  • toolforge custom admission controllers (currently 3 and growing)
  • toolforge & paws maintain_kubeusers
  • toolforge jobs framework components (currently 2, api and emailer)
  • toolforge components deployed from operations/puppet.git such as the ingress setup and other pieces
  • paws stuff

(and potentially more that I'm overlooking at the moment).

Each of the items listed above has a different deployment code pattern. For example:

  • a deploy.sh script with some logic inside it
  • a kustomize-based setup
  • a helm-based setup
  • a raw kubectl apply call
  • some combination of all of the above

For a number of reasons, there is no written agreement on which deployment code pattern to use for a given repository.

NOTE: we have a number of kubernetes clusters maintained by WMCS: tools, toolsbeta, paws, and potentially more in the future. This request covers all software components for k8s clusters maintained by WMCS.

Constraints and risks

Some additional notes.

certificates

Some components need x509 certificate generation and/or other credential management. Ideally, the option we choose should be able to handle the required certificate/credential management.

deployment mechanism

We should perhaps consider this 'deployment code pattern' as distinct from the 'deployment mechanism'.

Let 'deployment mechanism' be the way in which we trigger a deployment. At the moment the options are:

  • 100% manual. A human runs a command on a server.
  • somewhat automated: by means of a spicerack cookbook, puppet agent run, some other script, or whatever.
  • CI/CD pipeline, for example for PAWS, which I believe is currently based on GitHub Actions.

Please note that the 'deployment code pattern' concept is independent of the 'deployment mechanism'.
We could automate helm, kustomize or whatever, once we decide which one to use.

Deciding on 'deployment mechanism' (or automation level/mode) is out of scope of this request.

Note, however, that deciding on this request will greatly benefit us later on when we start automating stuff.

new standard, who makes the changes?

If we introduce a new standard, we will need to update several code repositories. That could be a lot of work.

The author of this request volunteers to do the work once the standard has been decided.

Decision record

TBD.

https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Decision_record_T303931_k8s_standard_deployment_code_pattern

Options

Option 1

Use helm https://helm.sh/

The proposed standard file layout is as follows:

topdir/
topdir/helmchart/
topdir/helmchart/values.yaml            <--- base file
topdir/helmchart/values-toolsbeta.yaml  <--- toolsbeta-specific overrides
topdir/helmchart/values-tools.yaml      <--- toolforge-specific overrides
topdir/helmchart/values-paws.yaml       <--- paws-specific overrides
topdir/helmchart/values-devel.yaml      <--- additional, arbitrary overrides are allowed

(yes, some components deploy into all three environments; maintain-kubeusers is a good example)
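
As a minimal sketch of how the base values and per-environment overrides could interact (the file names follow the layout above, but the contents and image name are hypothetical, not taken from any actual repository):

user@machine:~$ cat helmchart/values.yaml
# hypothetical base values shared by all environments
image:
  name: example-registry/jobs-framework-emailer
  tag: latest
replicas: 1
user@machine:~$ cat helmchart/values-toolsbeta.yaml
# toolsbeta-specific overrides: only the keys that differ from values.yaml
replicas: 2
image:
  tag: testing

When a values file is passed with -f, helm merges it on top of the chart's own values.yaml, so each environment file only needs to list the keys that differ.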

Example of patch introducing this layout to one of our custom components:

https://gerrit.wikimedia.org/r/c/cloud/toolforge/jobs-framework-emailer/+/747107

Example of manual operations using helm:

user@machine:~$ helm install --debug --dry-run app-name ./helmchart -f helmchart/values-toolsbeta.yaml
[..]
user@machine:~$ helm install --debug --dry-run app-name ./helmchart -f helmchart/values-tools.yaml
[..]
user@machine:~$ helm install app-name ./helmchart -f helmchart/values-toolsbeta.yaml
[..]
# to upgrade:
user@machine:~$ helm diff upgrade app-name ./helmchart -f helmchart/values-toolsbeta.yaml
[..]
user@machine:~$ helm upgrade app-name ./helmchart -f helmchart/values-toolsbeta.yaml
[..]

(but again, the deployment mechanism is not covered in this request)

Pros:

  • Industry standard to deploy stuff in k8s.
  • Standard within other SRE teams @ WMF.
  • Has a concrete specification on how to lay out a given directory.

Cons:

  • Is a package manager, mostly aimed at "apps". Many of our components are not "apps", but simple pieces of code that do something.
  • not integrated by default in kubernetes (kubectl etc)
  • a bit "more" noisy code than with kustomize.
  • the concrete specification on how to lay out a given directory could be a handicap in some cases.
  • some unknowns for x509 certificate generation & management.

Option 2

Use kustomize https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/

The proposed directory tree layout is as follows:

topdir/
topdir/deployment/
topdir/deployment/base/           <--- the base yaml
topdir/deployment/tools/          <--- the toolforge-specific overrides
topdir/deployment/toolsbeta/      <--- the toolsbeta-specific overrides
topdir/deployment/paws/           <--- the paws-specific overrides
topdir/deployment/devel-whatever/ <--- additional overrides are allowed
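
As a minimal sketch of how the base and overlays could be wired together (hypothetical contents, not taken from any actual repository):

user@machine:~$ cat deployment/base/kustomization.yaml
# base: the plain yaml manifests shared by all environments
resources:
  - deployment.yaml
  - service.yaml
user@machine:~$ cat deployment/toolsbeta/kustomization.yaml
# toolsbeta overlay: pulls in the base and patches only what differs
resources:
  - ../base
patchesStrategicMerge:
  - replicas.yaml

Each overlay directory reuses the base and applies its own patches, and kubectl -k builds the final manifests from whichever overlay you point it at.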

Example of patch introducing this layout to one of our custom components:

https://gerrit.wikimedia.org/r/c/cloud/toolforge/jobs-framework-emailer/+/769694

Example of manual operations using kustomize:

user@machine:~$ kubectl get -k deployment/toolsbeta
[..]
user@machine:~$ kubectl apply -k deployment/toolsbeta
[..]
user@machine:~$ kubectl diff -k deployment/toolsbeta
[..]

(but again, the deployment mechanism is not covered in this request)

Pros:

  • Industry standard to deploy stuff in k8s.
  • Integrated by default in kubernetes (via kubectl).
  • simple and to the point.

Cons:

  • kustomize is not a full-fledged standardized ecosystem (e.g. it does not define a repository format itself) and would rely on us introducing an explicit layout.
  • apparently less sugar & magic than helm (but do we need that?)
  • you need to manually delete removed kubernetes resources
  • some unknowns for x509 certificate generation & management.

Option 3

Use whatever fits each component, but have a common entry point: ./deploy.sh.

This option assumes that each component has its particularities, and that we want to retain flexibility above all.

To achieve this, each k8s component will have an executable ./deploy.sh file at the top-level directory which will do all the magic. The magic can be helm, kustomize, or whatever; we don't care as long as it works. The script receives no input arguments (or at least should work out of the box with no input arguments).
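
As a rough sketch of what such a script could look like (purely hypothetical: the chart path, release name and environment handling are made up, and a component could just as well use kustomize or raw kubectl internally):

#!/bin/bash
# deploy.sh -- hypothetical common entry point for a component.
# This example happens to use helm behind the scenes, but any other
# tool would be equally valid as long as the script works without arguments.
set -euo pipefail

# pick the target environment, defaulting to toolsbeta when unset
ENVIRONMENT="${ENVIRONMENT:-toolsbeta}"

# arbitrary pre-deployment logic can live here, for example generating
# an x509 certificate for a webhook before the manifests are applied

helm upgrade --install app-name ./helmchart \
    -f "helmchart/values-${ENVIRONMENT}.yaml"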

Pros:

  • Simple, efficient, flexible.
  • Perhaps the most sensible way to handle x509 certificate generation (since we can have arbitrary logic here).

Cons:

  • Less elegant maybe?
  • Perhaps assuming we need the additional flexibility is overly defensive and this will bite us in the future.

Event Timeline

aborrero updated the task description.

Thank you for starting this discussion!

[Helm]

  • Standard within other SRE teams @ WMF.

Note that this also includes helmfile, which is a little wrapper around helm that turns almost all manual helm operations (such as adding external repositories or installing/upgrading charts) into a single helmfile apply command. I don't see it as a problem (actually the other way around: I think it's quite handy, based on my experience using it on a personal project), but it's one more moving piece in that stack.

Some components need x509 certificate generation and/or other credential management. Ideally, the option we choose should be able to handle the required certificate/credential management.

I've been experimenting with cert-manager, which works with a CRD (custom resource definition) model. If we end up going with that (I think we should, but that's probably out of scope for this decision), we should be able to deploy the certificate objects, like any other kubernetes resources, regardless of which tool we choose.
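
For reference, in the cert-manager model a certificate request is just another yaml object, so whichever deployment tool we pick would apply it alongside the rest of the manifests. A minimal sketch with made-up names (assuming an issuer already exists):

user@machine:~$ cat certificate.yaml
# hypothetical cert-manager Certificate object; names and issuer are made up
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: jobs-api-cert
  namespace: jobs-api
spec:
  secretName: jobs-api-tls
  dnsNames:
    - jobs-api.jobs-api.svc
  issuerRef:
    name: some-cluster-issuer
    kind: ClusterIssuer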

Random thoughts:

As I understand it, helm allows you to package and version your application; the problem is when you need to make changes (e.g. adding secrets) that should not be public, so you either have an internal secured helm repository, or you have to tweak the helm charts locally.

Though from what I read, the helmfile wrapper allows you to do those tweaks on the fly without having to modify the charts upstream, so that looks nice.

On the kustomize side, you don't have all that application packaging and management (checking which version is deployed, upgrading, etc.); you just generate the yaml from the source code. In our case this still requires tweaking the yamls manually to add the secrets and such, so functionally it is not a big advantage over helmfile.

That said, kustomize seems way simpler (does way less things, just parsing and patching yaml).

I see people around who recommend using helm + kustomize for that last patching step, though helmfile might cover that.

Also, harbor (the future docker registry) has good support for helm charts, so if we need internal ones we can have them without issues.

I'm inclined to go with helm + helmfile.
Can we explore a bit how helmfile works and what it does? And how to manage secrets and such?

As I understand it, helm allows you to package and version your application; the problem is when you need to make changes (e.g. adding secrets) that should not be public, so you either have an internal secured helm repository, or you have to tweak the helm charts locally.

Though from what I read, the helmfile wrapper allows you to do those tweaks on the fly without having to modify the charts upstream, so that looks nice.

Well, almost: Helm already has a concept called "values" that lets you customize the charts (think hieradata for helm charts). We already use that, for example, with the ingress-nginx component, where the values file is provisioned by Puppet. However, the manual update commands can be quite long, since you need to pass all those arguments to every helm invocation. Helmfile abstracts those helm arguments into a single YAML file you can keep in version control.
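
As a rough idea of what that single YAML file could look like with the chart layout from option 1 (hypothetical contents):

user@machine:~$ cat helmfile.yaml
# hypothetical helmfile wiring the per-environment values files together
environments:
  toolsbeta: {}
  tools: {}

releases:
  - name: app-name
    chart: ./helmchart
    values:
      - helmchart/values-{{ .Environment.Name }}.yaml
user@machine:~$ helmfile -e toolsbeta apply
[..]

A single helmfile apply (with -e to pick the environment) then replaces the long helm upgrade invocations, and the values file selection lives in version control instead of on the command line.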

aborrero claimed this task.

A meeting was held today and we agreed on going with Option 3, a deploy.sh script.

I'll be working soon to introduce such a script in all affected repos.

We also decided that helmfile + helm is probably the best thing to do behind the deploy.sh script, but that experimentation and decision is left separate from this task for now.

Change 773291 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[cloud/toolforge/jobs-framework-emailer@main] jobs-framework-emailer: introduce deploy.sh script

https://gerrit.wikimedia.org/r/773291

Change 773448 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] toolforge: deploy ingress-nginx via helmfile and provide deploy.sh

https://gerrit.wikimedia.org/r/773448

Change 773491 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/cookbooks@wmcs] wmcs: toolforge: k8s: default to deploy.sh as deployment command

https://gerrit.wikimedia.org/r/773491

Change 773491 merged by Arturo Borrero Gonzalez:

[operations/cookbooks@wmcs] wmcs: toolforge: k8s: default to deploy.sh as deployment command

https://gerrit.wikimedia.org/r/773491

Change 773448 abandoned by Majavah:

[operations/puppet@production] toolforge: deploy ingress-nginx via helmfile and provide deploy.sh

Reason:

this was split to a separate repository

https://gerrit.wikimedia.org/r/773448

Change 776894 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[cloud/toolforge/jobs-framework-api@main] jobs-framework-api: relocate deploy.sh script

https://gerrit.wikimedia.org/r/776894

Change 776894 merged by Arturo Borrero Gonzalez:

[cloud/toolforge/jobs-framework-api@main] jobs-framework-api: relocate deploy.sh script

https://gerrit.wikimedia.org/r/776894

Change 778292 had a related patch set uploaded (by David Caro; author: David Caro):

[labs/tools/registry-admission-webhook@master] Adapt to use the deploy.sh script standard

https://gerrit.wikimedia.org/r/778292

Change 778292 merged by jenkins-bot:

[labs/tools/registry-admission-webhook@master] Adapt to use the deploy.sh script standard

https://gerrit.wikimedia.org/r/778292

Change 773291 merged by jenkins-bot:

[cloud/toolforge/jobs-framework-emailer@main] jobs-framework-emailer: introduce deploy.sh script

https://gerrit.wikimedia.org/r/773291

Change 857599 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[cloud/toolforge/volume-admission-controller@main] volume-admission-controller: add deploy.sh script

https://gerrit.wikimedia.org/r/857599

Change 857599 merged by Arturo Borrero Gonzalez:

[cloud/toolforge/volume-admission-controller@main] volume-admission-controller: add deploy.sh script

https://gerrit.wikimedia.org/r/857599

Change 862315 had a related patch set uploaded (by Raymond Ndibe; author: Ndibe Raymond Olisaemeka):

[cloud/toolforge/volume-admission-controller@main] volume-admission-controller: bug fix and general improvement to ./deploy.sh

https://gerrit.wikimedia.org/r/862315

Change 862315 merged by jenkins-bot:

[cloud/toolforge/volume-admission-controller@main] volume-admission-controller: bug fix and general improvement to ./deploy.sh

https://gerrit.wikimedia.org/r/862315