
Prove helm as a potential k8s deployment tool
Closed, ResolvedPublic

Description

The use of helm to manage applications for the staging k8s as part of streamlined service delivery has been suggested a few times. @dduvall has done some work in minikube with helm to deploy Mathoid containers built by blubber. We should move some of this work into CI for Mathoid once we're ready to start work on the staging part of the pipeline.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript · Aug 11 2017, 6:24 PM
dduvall renamed this task from "Using helm to manage staging k8s applications" to "Proof helm as a potential k8s deployment tool". Aug 30 2017, 9:29 PM
dduvall triaged this task as Medium priority.
dduvall updated the task description. (Show Details)
dduvall moved this task from Backlog to Deployment Tooling on the Release Pipeline board.
Joe added a comment. Sep 8 2017, 2:04 PM

I did a review of how helm works/what it offers in relation to our environment.

While I favour using it instead of creating our own tool, there are quite a few things helm is very useful for that we won't use much in production.

Specifically:

  • We don't typically need to define a service as a collection of charts, nor will we use externally-defined charts in production.
  • Since we want to wrap the applications into a standard pod, we might want a scaffolding mechanism that allows creating the helm chart/repo easily and almost automatically.
  • Developers will then basically just have to edit a configmap template (and maybe secrets, if needed). Variables in those templates can be controlled/overridden in production.
  • We'll probably need to fine-tune which parameters we let developers specify and which should be under complete ops control.
  • We will have to build canary releasing on top of helm by creating a chart describing the service that includes the canary chart and the production chart (a sketch follows this list).
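
To make the last point more concrete, here is a minimal sketch of how such an umbrella chart could be laid out using Helm 2's requirements.yaml; the chart names, versions and repository URL are illustrative assumptions, not an agreed layout:

  # <my-service>/Chart.yaml -- umbrella chart describing the whole service
  apiVersion: v1
  name: my-service
  version: 0.1.0
  description: Combines the production and canary releases of the service

  # <my-service>/requirements.yaml -- Helm 2 dependency list
  dependencies:
    - name: my-service-production   # chart deployed to the main pool
      version: 0.1.0
      repository: https://helm-charts.wikimedia.org
    - name: my-service-canary       # same service, canary-sized release
      version: 0.1.0
      repository: https://helm-charts.wikimedia.org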

In the coming weeks, I am going to try to write down a base chart that can easily be used with our node services.

Reedy renamed this task from "Proof helm as a potential k8s deployment tool" to "Prove helm as a potential k8s deployment tool". Sep 8 2017, 2:07 PM
Joe moved this task from Backlog to Doing on the User-Joe board. Sep 11 2017, 9:50 AM
Joe added a comment. Sep 14 2017, 7:19 AM

After the discussion the other day at the containers cabal meeting, I promised to come up with a proposal for helm chart development/management. So here it is.

I am considering two main requirements:

  1. There is an intrinsic value in using the same chart for development purposes inside minikube and in production.
  2. Ops want to be able to make sweeping changes to charts without having to go around N repositories to do so.

My proposed workflow is as follows:

  1. Charts are kept in a central git repo, from which new chart versions are regularly uploaded to our chart repository ("repository" here meaning a helm repository), which would be publicly exposed. The defaults in the chart's values.yaml will be the ones used in development.
  2. When developing a new service, there will be a scaffolding mechanism to create the new chart, in order to reduce the amount of work a developer has to do.
  3. This chart will be uploaded to the helm repository and exposed to the public.
  4. Once the chart is in place, mwctl or any other tool we use in development can just download the chart content to a specified directory by shelling out to something like helm fetch --untar --untardir deployment --repo helm-charts.wikimedia.org --version X.Y.Z <my-service>, which will download and untar the chart. In this scenario, the application repository should include a metadata file indicating which version of the helm chart should be used.
  5. In development, the chart should always refer to the container at its latest tag, so that the latest container built on the dev machine is used.
  6. Charts for other services will be downloaded as-is from the repo and used to provide the full environment.
  7. In production, there will be yaml files with values provided by ops that will override the default settings and result in a full deployment, including the sidecar containers for logging, proxying and monitoring (see the sketch after this list).
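
To illustrate points 5 and 7, here is a minimal sketch of the development defaults in the chart and of a production override file; every key, value and sidecar entry is an illustrative assumption, not an agreed format:

  # values.yaml shipped with the chart (development defaults; illustrative)
  image:
    repository: my-service
    tag: latest              # dev always uses the latest locally built image
  config:
    log_level: debug

  # production values file provided by ops (illustrative); helm merges it
  # over the defaults above when installing/upgrading the release
  image:
    tag: "2017-09-14-001"    # pinned build instead of latest
  config:
    log_level: warn
  sidecars:
    logging: true            # would enable the logging sidecar templates
    proxy: true
    monitoring: true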

The more debatable part of the issue is how to coordinate chart development and app development: specifically, the configmap for the service might need to change because a new configuration variable is added to the application. The most sensible approach to this issue is, in my opinion, the following workflow:

  1. Develop the application feature that needs the new configuration, setting a default value that works in the dev environment. Ideally, the configmap should be superfluous in the dev environment.
  2. Once the new feature is complete, make a patch to the charts git repository, bumping the helm chart version, and merge it. Ideally, this merge should auto-trigger a CI job to upload the chart to the helm repository.
  3. Change the application's metadata to match the new chart version and commit the change (a sketch of such a metadata file follows this list).
  4. Once the patch is merged, CI will use the new chart version to deploy the software to staging.
  5. How to set the values file for staging automagically is an open question at this point. I don't have brilliant answers for that problem.
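
To make step 3 concrete, here is a minimal sketch of the kind of metadata file the application repository could carry; the file name, location and keys are purely hypothetical, nothing here has been agreed on:

  # deployment/chart.yaml in the application repo (hypothetical)
  chart: my-service                       # chart name in the helm repository
  version: 0.1.3                          # chart version CI should fetch and deploy
  repository: helm-charts.wikimedia.org   # helm repository to fetch it from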

I think the overhead here is somewhat limited, as changes to the config variables shouldn't happen too often. If we think this is a burden, we could have the config variables come out of a data structure provided from the application repo itself, and have the helm template write them into the configmap as appropriate.
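
A minimal sketch of that alternative, assuming the application repo can ship a free-form config mapping that ends up in the chart's values (key names are illustrative):

  # values fragment coming from the application repo (hypothetical)
  config:
    num_workers: 4
    log_level: info

  # templates/configmap.yaml in the chart: writes out whatever keys are
  # present in .Values.config, without the chart having to know them
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: {{ .Release.Name }}-config
  data:
  {{- range $key, $value := .Values.config }}
    {{ $key }}: {{ $value | quote }}
  {{- end }}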

> After the discussion the other day at the containers cabal meeting, I promised to come up with a proposal for helm chart development/management. So here it is.
> I am considering two main requirements:
>
>   1. There is an intrinsic value in using the same chart for development purposes inside minikube and in production.
>   2. Ops want to be able to make sweeping changes to charts without having to go around N repositories to do so.
>
> My proposed workflow is as follows:
>
>   1. Charts are kept in a central git repo, from which new chart versions are regularly uploaded to our chart repository ("repository" here meaning a helm repository), which would be publicly exposed. The defaults in the chart's values.yaml will be the ones used in development.
>   2. When developing a new service, there will be a scaffolding mechanism to create the new chart, in order to reduce the amount of work a developer has to do.
>   3. This chart will be uploaded to the helm repository and exposed to the public.
>   4. Once the chart is in place, mwctl or any other tool we use in development can just download the chart content to a specified directory by shelling out to something like helm fetch --untar --untardir deployment --repo helm-charts.wikimedia.org --version X.Y.Z <my-service>, which will download and untar the chart. In this scenario, the application repository should include a metadata file indicating which version of the helm chart should be used.
>   5. In development, the chart should always refer to the container at its latest tag, so that the latest container built on the dev machine is used.
>   6. Charts for other services will be downloaded as-is from the repo and used to provide the full environment.
>   7. In production, there will be yaml files with values provided by ops that will override the default settings and result in a full deployment, including the sidecar containers for logging, proxying and monitoring.

That made me think: what about logging in development mode? Developers will definitely want to see the logs. And part of the monitoring should in any case be done by kubernetes itself, via the health checks it performs, since those should be defined in the charts.
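
For reference, a minimal sketch of how such health checks could look in a chart's deployment template; the endpoint path and the service.port value are illustrative assumptions:

  # container spec fragment in templates/deployment.yaml (illustrative)
  livenessProbe:
    httpGet:
      path: /_info
      port: {{ .Values.service.port }}
    initialDelaySeconds: 5
  readinessProbe:
    httpGet:
      path: /_info
      port: {{ .Values.service.port }}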

Anyway, overall the plan sounds feasible; there are a few details here and there that are interesting, like the one below:

> The more debatable part of the issue is how to coordinate chart development and app development: specifically, the configmap for the service might need to change because a new configuration variable is added to the application. The most sensible approach to this issue is, in my opinion, the following workflow:
>
>   1. Develop the application feature that needs the new configuration, setting a default value that works in the dev environment. Ideally, the configmap should be superfluous in the dev environment.
>   2. Once the new feature is complete, make a patch to the charts git repository, bumping the helm chart version, and merge it. Ideally, this merge should auto-trigger a CI job to upload the chart to the helm repository.
>   3. Change the application's metadata to match the new chart version and commit the change.
>   4. Once the patch is merged, CI will use the new chart version to deploy the software to staging.
>   5. How to set the values file for staging automagically is an open question at this point. I don't have brilliant answers for that problem.
>
> I think the overhead here is somewhat limited, as changes to the config variables shouldn't happen too often. If we think this is a burden, we could have the config variables come out of a data structure provided from the application repo itself, and have the helm template write them into the configmap as appropriate.

Let's take another case as well. A drive-by developer who just messes with the app's code to fix one bug should be able to get everything working, but shouldn't have to mess with the charts if possible. They should not have to perform almost any of the steps above; rather, they should get a new container, test it, say "YAY", and push it for review to gerrit. That sounds easier, but it means we have to hide all of the chart complexity from the developer in this case.

Joe added a comment. Nov 14 2017, 8:21 AM

I have been playing with helm quite a bit in the last couple of weeks, and I think it is, in the end, the best tool for the job we want to accomplish.

I will create a repository for helm charts, and add a base scaffolding script to generate new charts with little to no input from the user.

Joe closed this task as Resolved. Dec 20 2017, 6:52 AM