
gitlab-cloud-runner: Roll back pending helm releases before running terraform apply
Closed, Resolved, Public

Description

If helm is interrupted in the middle of a deployment, the release can be left in the 'pending-upgrade' state, and subsequent deployments then fail. Terraform does not handle this state and reports a misleading error, for example:

Terraform has been successfully initialized!
module.k8s-pvc-cleaner.helm_release.this: Creating...
╷
│ Error: cannot re-use a name that is still in use
│ 
│   with module.k8s-pvc-cleaner.helm_release.this,
│   on k8s-pvc-cleaner/main.tf line 1, in resource "helm_release" "this":
│    1: resource "helm_release" "this" {
│ 
╵

The way to recover from this state is to "helm rollback" the release in question.

Proposal:
In the .deploy template of https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/blob/main/.gitlab-ci.yml?ref_type=heads, before running terraform-init and the steps that follow it, run helm list -A --pending to get a list of all pending helm releases in all namespaces, then roll back each of them using helm rollback <release> -n <namespace>.
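The proposal above can be sketched in Python roughly as follows. This is a minimal illustration, not the helm-check script from the repository: the function name rollback_commands, the sample release data, and the namespace example-ns are hypothetical. It assumes the JSON produced by helm list -A --pending -o json is a list of objects with name, namespace, and status keys, and it only builds the rollback commands instead of running them.

```python
import json
import shlex

# Release states that helm reports while an operation is still in flight.
PENDING_STATES = {"pending-install", "pending-upgrade", "pending-rollback"}


def rollback_commands(helm_list_json: str) -> list[list[str]]:
    """Build one `helm rollback` argv per pending release.

    `helm_list_json` is assumed to be the output of
    `helm list -A --pending -o json`.
    """
    commands = []
    for release in json.loads(helm_list_json):
        # Defensive filter in case the input was not produced with --pending.
        if release.get("status") not in PENDING_STATES:
            continue
        commands.append(
            ["helm", "rollback", release["name"], "--namespace", release["namespace"]]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical sample data; the namespace is made up for illustration.
    sample = json.dumps(
        [
            {"name": "k8s-pvc-cleaner", "namespace": "example-ns", "status": "pending-upgrade"},
            {"name": "other", "namespace": "example-ns", "status": "deployed"},
        ]
    )
    for argv in rollback_commands(sample):
        print(shlex.join(argv))
```

Returning the commands as argument lists keeps the logic testable without a cluster; a caller could pass each list to subprocess.run. A release stuck in pending-install at its first revision likely has no earlier revision to roll back to, so that case would probably need separate handling.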

Update: There's a chicken-and-egg problem with this proposal: helm needs access to the kubernetes cluster, which requires the kubeconfig information that is established by terraform.

  • Reproduce the problem state by preparing a helm_release resource with a bad image reference, deploying it, and terminating the deployment job when it hangs.
  • See if we can avoid this situation altogether by setting atomic = true on the helm_release resource.
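The chicken-and-egg problem noted in the update suggests that helm can only run once terraform has exposed the cluster credentials. A minimal sketch of that step, assuming the JSON from terraform output -json is available and that the kubeconfig is published under an output named kubeconfig (the output name and the function write_kubeconfig are hypothetical):

```python
import json
import os


def write_kubeconfig(terraform_output_json: str, path: str,
                     output_name: str = "kubeconfig") -> None:
    """Write the kubeconfig from `terraform output -json` to `path`.

    `output_name` is an assumed name for the terraform output; adjust it
    to whatever the configuration actually exposes.
    """
    outputs = json.loads(terraform_output_json)
    # `terraform output -json` wraps each output as {"value": ..., ...}.
    kubeconfig = outputs[output_name]["value"]

    # Create the file readable and writable by the owner only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        # Also tighten permissions if the file already existed.
        os.fchmod(handle.fileno(), 0o600)
        handle.write(kubeconfig)
```

helm could then be pointed at the resulting file through the KUBECONFIG environment variable or its --kubeconfig flag. If the cluster does not exist yet, the output will be missing and the lookup raises KeyError, which a caller may want to treat as "nothing to roll back".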

xref: https://github.com/hashicorp/terraform-provider-helm/issues/425

Details

Title | Reference | Author | Source Branch | Dest Branch
.gitlab-ci.yml: Run terraform plan before running helm-check.py | repos/releng/gitlab-cloud-runner!363 | dancy | main-Ide1d7e496520cd3855833cede11045801d73edbe | main
Pass $ARGS to helm-check.py in .gitlab-ci.yml | repos/releng/gitlab-cloud-runner!361 | dancy | main-I75caf636ae02c9c3b62428b631b4f05ea6e9c55f | main
helm-check.py: Terraform refresh before collecting outputs | repos/releng/gitlab-cloud-runner!360 | dancy | main-I08897ed57b5036edab0629642ba5952a9054ac64 | main
Rewrite helm-check in python and Improve handling of helm release status in script | repos/releng/gitlab-cloud-runner!359 | sandeeps | main-I9fdfc32376eb32d4e572328f9247be1628494c39 | main
Improve handling of helm release status in script | repos/releng/gitlab-cloud-runner!357 | sandeeps | main-Id6e3321224bbd55646da54906b09ff517bc43dba | main
update cluster version prefix to 1.29. | repos/releng/gitlab-cloud-runner!347 | sandeeps | main-I33d916e34956a605583454c64f1ab747772cd5a4 | main
added function to check cluster exists and validateTerraformState function. | repos/releng/gitlab-cloud-runner!342 | sandeeps | main-I8c5dd376f69a1fdc7d209b620ae16ab19442ee0a | main
update output message | repos/releng/gitlab-cloud-runner!340 | sandeeps | main-I97956b6e0a545e50c56858adae80c7f7330591ba | main
set kubeconfig file permissions to restrict access | repos/releng/gitlab-cloud-runner!339 | sandeeps | main-I538990edb491c83fa78b33b878e1ed0dd15ba588 | main
update base image reference to include helm | repos/releng/gitlab-cloud-runner!338 | sandeeps | main-Ife6f8bd550d890f24e950d3e1151f21439c14096 | main
updating gitlab terraform image reference to include helm | repos/releng/gitlab-cloud-runner!337 | sandeeps | main-I6528e3fdd1fc3080de75e40245dfed6c4a2af82c | main
adding invalid image reference in pvc cleaner for testing purpose | repos/releng/gitlab-cloud-runner!336 | sandeeps | main-Ia6f07f7bcf7a190f190c844f8edf12ecfd7dbb47 | main
add helm installation | repos/releng/gitlab-terraform-images!11 | sandeeps | use-trusted-tag | wmf/stable
add helm installation | repos/releng/gitlab-terraform-images!10 | sandeeps | use-trusted-tag-I0c8d7f8c5c6be031d2421edb4ae077c30cfa6f20 | use-trusted-tag
fix mismatch in kubeconfig output variable reference | repos/releng/gitlab-cloud-runner!332 | sandeeps | main-Id30d43bea400d2f04388e28a9b7f16266ae731a9 | main
add outputs for namespace and kubeconfig in cluster configuration, providing necessary data for cicd operations. | repos/releng/gitlab-cloud-runner!331 | sandeeps | main-I087b382d284576c29fb7b9f96db2ac1d338e9463 | main
helm check script update and added kube_config variale in digital ocean output.tf | repos/releng/gitlab-cloud-runner!317 | sandeeps | main-I255d9f68a857455c8341b50de5a3d1b65451d3a3 | main

Event Timeline

Sandeeps changed the task status from Open to In Progress. Jan 18 2024, 9:59 PM

Hi all, an update on this issue: I tried reproducing the error, and setting atomic = true did not resolve the problem. The deployment is still stuck in a locked state. I think this needs more investigation.

sandeeps updated https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/317

helm check script update and added kube_config variale in digital ocean output.tf

sandeeps merged https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/317

helm check script update and added kube_config variale in digital ocean output.tf

sandeeps updated https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/331

add outputs for namespace and kubeconfig in cluster configuration, providing necessary data for cicd operations.

sandeeps merged https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/331

add outputs for namespace and kubeconfig in cluster configuration, providing necessary data for cicd operations.

sandeeps updated https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/342

added function to check cluster exists and validateTerraformState function.

sandeeps merged https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/342

added function to check cluster exists and validateTerraformState function.

sandeeps opened https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/359

Rewrite helm-check in python and Improve handling of helm release status in script

sandeeps closed https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/359

Rewrite helm-check in python and Improve handling of helm release status in script

I think we can consider this done.