wipe-cluster cookbook should check if systemd services have started properly
Closed, Resolved (Public)

Description

During T389045 (Update wikikube-staging-eqiad to kubernetes 1.31), kube-proxy did not come back up on one worker node during the Puppet run that followed wiping the cluster.
We should add an extra check to the cookbook that all expected systemd units are running after Puppet has run.
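
A minimal sketch of what such a check could look like, not the code from the eventual patch. It assumes the check can be expressed as `systemctl is-active --quiet <unit>` per unit, and that it runs locally on the node; the real cookbook would more likely run the command remotely on all nodes. The unit list `EXPECTED_UNITS` and the function names are hypothetical.

```python
import subprocess
from typing import Callable, Iterable, List

# Hypothetical list of units; the real set depends on the node's role.
EXPECTED_UNITS = ("kubelet.service", "kube-proxy.service")


def systemctl_is_active(unit: str) -> bool:
    """Return True if `systemctl is-active --quiet` exits with status 0."""
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", unit],
        check=False,
    )
    return result.returncode == 0


def find_inactive_units(
    units: Iterable[str],
    is_active: Callable[[str], bool] = systemctl_is_active,
) -> List[str]:
    """Return the units that are not reported as active, in input order."""
    return [unit for unit in units if not is_active(unit)]


def verify_units(units: Iterable[str] = EXPECTED_UNITS) -> None:
    """Raise RuntimeError if any expected unit is not active."""
    inactive = find_inactive_units(units)
    if inactive:
        raise RuntimeError("Units not active after puppet run: " + ", ".join(inactive))
```

Calling `verify_units()` after the Puppet run would then fail loudly if, for example, kube-proxy had not started. The `is_active` parameter exists only so the logic can be exercised without a running systemd.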

Event Timeline

JMeybohm triaged this task as Medium priority.
Gehel subscribed.

Removing DPE SRE as I don't think we need to be involved. Please add us again if needed.

Change #1150729 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/cookbooks@master] sre.k8s.wipe-cluster: Verify that k8s service are up after puppet ran

https://gerrit.wikimedia.org/r/1150729

Change #1150729 merged by jenkins-bot:

[operations/cookbooks@master] sre.k8s.wipe-cluster: Verify that k8s service are up after puppet ran

https://gerrit.wikimedia.org/r/1150729

JMeybohm claimed this task.