
Automatically clean up unused buildkitd PersistentVolumeClaims periodically
Closed, Resolved | Public

Description

Background

To deploy buildkitd in Kubernetes we use a StatefulSet with a PersistentVolumeClaim
(PVC) template for the buildkitd cache volumes. The StatefulSet is
autoscaled by a HorizontalPodAutoscaler (HPA). When the StatefulSet is scaled down,
the PVCs associated with the removed pods are not automatically deleted by Kubernetes[1].

Task

Write a program that scans for and deletes unused buildkitd PersistentVolumeClaims (claims no longer mounted by any buildkitd pod) once a configurable amount of time has passed since their last use (suggested default: 4 hours).

PVCs don't contain information about when they're used or who is using them, so the program will need to periodically join the list of running buildkitd pods with the list of PVCs to maintain its own state.
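
The bookkeeping described above could look roughly like the following. This is a minimal sketch, not the code that was merged in k8s-pvc-cleaner: the class name `PvcUsageTracker`, the method `observe`, and the PVC names are hypothetical, and the Kubernetes API calls are left out so the join logic stands on its own. It assumes the caller supplies two snapshots per cycle: the names of all buildkitd PVCs, and the claim names referenced by running buildkitd pods.

```python
from datetime import datetime, timedelta, timezone


class PvcUsageTracker:
    """Remembers when each PVC was last seen in use by a running pod."""

    def __init__(self, max_idle=timedelta(hours=4)):
        self.max_idle = max_idle
        self.last_used = {}  # PVC name -> datetime of last observed use

    def observe(self, pvc_names, in_use_names, now):
        """Join one snapshot of PVCs and pod claims; return PVCs idle too long."""
        pvc_names = set(pvc_names)
        # Forget PVCs that no longer exist in the cluster.
        for name in list(self.last_used):
            if name not in pvc_names:
                del self.last_used[name]
        expired = []
        for name in sorted(pvc_names):
            if name in in_use_names or name not in self.last_used:
                # In use, or seen for the first time: (re)start the idle clock.
                self.last_used[name] = now
            elif now - self.last_used[name] >= self.max_idle:
                expired.append(name)
        return expired


# Hypothetical PVC names; real names depend on the StatefulSet's claim template.
tracker = PvcUsageTracker()
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
all_pvcs = {"cache-buildkitd-0", "cache-buildkitd-1"}
tracker.observe(all_pvcs, {"cache-buildkitd-0", "cache-buildkitd-1"}, t0)
# Five hours later the second pod has been scaled away, but its PVC remains.
expired = tracker.observe(all_pvcs, {"cache-buildkitd-0"}, t0 + timedelta(hours=5))
print(expired)  # ['cache-buildkitd-1']
```

A PVC seen for the first time is treated as just used. After a restart the program has no history, so this choice delays deletion by up to one idle period rather than risking the removal of a recently used cache.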

Deploy this program as a Deployment in https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner.

Footnotes

[1] We'll be able to do something simpler in the future. Newer Kubernetes
versions support automatically deleting StatefulSet PVCs during scale-down:
https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#persistentvolumeclaim-retention

Details

Title | Reference | Author | Source Branch | Dest Branch
Make pod and pvc watcher threads resilient to errors | repos/releng/k8s-pvc-cleaner!26 | dancy | main-I0fa97c028ea4b2797b3b6e26dadc8eb841cb391a | main
Exit main thread if a worker thread terminates | repos/releng/k8s-pvc-cleaner!25 | dancy | main-I886ff2755c4fb81b42130e2e1b39dbf1033621ff | main
pvc-cleaner.py: Add #!/usr/bin/env python3 | repos/releng/k8s-pvc-cleaner!24 | dancy | main-I9096c578a2731c71ba87e973d5193daff4068137 | main
Add --debug command line option to enable debug log level | repos/releng/k8s-pvc-cleaner!16 | dancy | main-I4c355d9fa586bc867ea6c23145566b80af5b878c | main
Configuring k8s-pvc-cleaner | repos/releng/gitlab-cloud-runner!305 | sandeeps | T351478-integrate-k8s-pvc-cleaner | main
cleanup unused buildkitd PVCs | repos/releng/k8s-pvc-cleaner!1 | sandeeps | T351478-cleanup-unused-buildkitd | main

Event Timeline

Sandeeps changed the task status from Open to In Progress. (Dec 14 2023, 2:44 PM)
Sandeeps claimed this task.

Deployed k8s-pvc-cleaner v1.0.1 to staging and production, confirmed by 'helm list' output showing the current deployed version.

sandeep@lenovo:~/Wikimedia/gitlab-cloud-runner$ helm list -n gitlab-runner
NAME                	NAMESPACE    	REVISION	UPDATED                                	STATUS  	CHART                  	APP VERSION

pvc-cleaner         	gitlab-runner	5       	2024-02-02 22:08:53.656298497 +0000 UTC	deployed	pvc-cleaner-0.1.1      	v1.0.1