Our new Helm chart templates were not originally developed to handle multi-service deployment charts. When I created the eventgate chart, I used the wmf.releasename helper, which evaluates to .Chart.Name + .Release.Name, to identify a service instance. The release name for e.g. the eventgate-analytics service is 'analytics'.
But this isn't quite right. Every time we deploy we get a new 'release'; we just happen to force the release name to stay the same.
Alex and I recently needed to support live canary releases of a service. We want to be able to deploy a new image and/or configs to a limited number of pods and have them serve live traffic. This would allow us to do the canary release first, and then the main production release once the canary is working fine. To do this, we'd add an extra release to the service's helmfile.yaml, like:
```yaml
releases:
  - production:
      values:
        - "values.yaml"
        - "private/secrets.yaml"
  - canary:
      values:
        - "values.yaml"
        - "values-canary.yaml"
        - "private/secrets.yaml"
```
Values defined in values-canary.yaml would override the ones in values.yaml. values-canary.yaml would set things like replicas: 1 to ensure the canary deploys only one pod.
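For illustration, values-canary.yaml might then contain only the overrides that differ from values.yaml, e.g.:

```yaml
# values-canary.yaml (sketch): override only what differs from values.yaml.
replicas: 1   # deploy a single canary pod instead of the full production count
```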
This approach works fine in the single app instance charts, where wmf.releasename makes sense, e.g. mathoid-production. For multi app instance charts, where we are currently abusing .Release.Name to identify the service, there is currently no real way to identify the app instance in the chart. If we kept things as they are now and added canary releases to both eventgate-main and eventgate-analytics, the wmf.releasename of both canaries would evaluate to 'eventgate-canary'.
I propose we add a new main_app.name concept to our Helm charts, set in values.yaml. The chart's main values.yaml file doesn't know about multiple service instances, so it would just set this default to the chart's name (or whatever is appropriate). If a chart doesn't have multiple app instances, no change would be needed in the helmfile values.yaml. We'd also change wmf.releasename to evaluate to .Values.main_app.name + .Release.Name. Example:
mathoid service template variables:

```yaml
.Chart.Name: mathoid
.Values.main_app.name: mathoid
.Release.Name: production            # (or canary)
wmf.releasename: mathoid-production  # (or mathoid-canary)
```
eventgate-main template variables:

```yaml
.Chart.Name: eventgate
.Values.main_app.name: eventgate-main
.Release.Name: production                   # (or canary)
wmf.releasename: eventgate-main-production  # (or eventgate-main-canary)
```
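In template terms, the change to the helper could look like this minimal sketch (assuming wmf.releasename is defined in the chart's _helpers.tpl):

```yaml
{{/* Sketch: wmf.releasename built from main_app.name instead of .Chart.Name */}}
{{- define "wmf.releasename" -}}
{{ .Values.main_app.name }}-{{ .Release.Name }}
{{- end -}}
```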
We'd also want to use main_app.name in a label on all of the k8s resources, e.g.:
```yaml
labels:
  chart: {{ template "wmf.chartname" . }}  # mathoid or eventgate
  app: {{ .Values.main_app.name }}         # mathoid or eventgate-main
  release: {{ .Release.Name }}             # production or canary
```
For resources that need to use matchLabels to match exactly the resources of a given release, we'd match on all of chart, app, and release.
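A Deployment selector under this scheme might look like the following sketch, using the label keys proposed above:

```yaml
# Sketch: a Deployment matching only its own release's pods by combining
# the chart, app, and release labels.
spec:
  selector:
    matchLabels:
      chart: {{ template "wmf.chartname" . }}
      app: {{ .Values.main_app.name }}
      release: {{ .Release.Name }}
```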
Independent of the above, we need a way for the production k8s nodePort Service resource to route to multiple releases. Alex and I have two different approaches to this.
Alex's approach uses a new service label (currently set conditionally to either the release name or a special addressed_by value; I think we should always set it to .Values.main_app.name as described above) to have the k8s Service match EITHER the app or the release, depending on another setting on the k8s Service resource, .Values.service.address_other_releases. This approach has the advantage of allowing a more complex hierarchy of k8s Service resources per release: you could potentially have several active releases in a deployment, with each one targeting one or multiple releases.
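As a rough sketch only (the conditional label mechanics aren't fully spelled out here, so the selector-side rendering below is an assumption), the Service selector might widen like this:

```yaml
# Sketch of Alex's approach: the Service matches the whole app (all
# releases) when service.address_other_releases is set, otherwise only
# its own release's pods.
spec:
  selector:
    app: {{ .Values.main_app.name }}
    {{- if not .Values.service.address_other_releases }}
    release: {{ .Release.Name }}
    {{- end }}
```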
Otto's approach (for which the author of this ticket is biased :p) just adds a new routing_tag label that the k8s Service uses to select which pods it should route to. The value of routing_tag is arbitrary and defaults to .Release.Name, which causes the Service to only route to pods in its own release. To enable canary releases, we want the production k8s Service to route to the production release pods as well as the canary release pods. To do this, we set service.routing_tag in values.yaml and values-canary.yaml to a common value shared by both the production and canary releases (a good choice is just the app name, e.g. 'eventgate-main'). Since the production release's k8s Service will now route to the canary release's pods, the canary release does not need a k8s Service resource of its own. This is accomplished by setting service.deployment: none in values-canary.yaml.
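As a sketch of how the pieces could fit together (the default-fallback rendering and exact file layout are assumptions; the routing_tag semantics are as described above):

```yaml
# Sketch: both the pod template labels and the Service selector carry
# routing_tag, defaulting to the release name, so a Service normally
# routes only to its own release's pods.
labels:
  routing_tag: {{ .Values.service.routing_tag | default .Release.Name }}
```

And in the values files for e.g. eventgate-main, both releases would share the tag while the canary opts out of deploying its own Service:

```yaml
# values-canary.yaml (values.yaml would set the same routing_tag):
service:
  routing_tag: eventgate-main  # shared with production, so production's
                               # Service routes to canary pods as well
  deployment: none             # the canary deploys no Service of its own
```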