During T389045: Update wikikube-staging-eqiad to kubernetes 1.31, kube-proxy did not come back up on one worker node during the Puppet run that followed wiping the cluster.
We should probably add an extra check that all expected systemd units are running once Puppet has run.
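A minimal sketch of what such a check could look like, assuming it runs locally on the node and shells out to `systemctl is-active`. This is not the merged cookbook change: the unit list `EXPECTED_UNITS` and the helper names `systemctl_is_active` and `find_inactive_units` are hypothetical and only illustrate the idea.

```python
import subprocess
from typing import Callable, Iterable, List

# Hypothetical list of units; the real cookbook may check a different set.
EXPECTED_UNITS = ("kubelet.service", "kube-proxy.service")


def systemctl_is_active(unit: str) -> bool:
    """Return True if `systemctl is-active --quiet <unit>` exits with status 0."""
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", unit],
        check=False,  # a non-zero exit status just means "not active"
    )
    return result.returncode == 0


def find_inactive_units(
    units: Iterable[str],
    is_active: Callable[[str], bool] = systemctl_is_active,
) -> List[str]:
    """Return the units that are not reported as active, in input order."""
    return [unit for unit in units if not is_active(unit)]
```

A caller could run `find_inactive_units(EXPECTED_UNITS)` after the Puppet run and abort, or prompt the operator, if the returned list is not empty. The `is_active` parameter is injectable so the logic can be exercised without a running systemd.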
Description
Details
Related Changes in Gerrit:
| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| sre.k8s.wipe-cluster: Verify that k8s service are up after puppet ran | operations/cookbooks | master | +42 -0 |
| Status | Assigned | Task |
|---|---|---|
| Resolved | JMeybohm | T341984 Update Kubernetes clusters to 1.31 |
| Resolved | JMeybohm | T389086 wipe-cluster cookbook should check if systemd services have started properly |
Event Timeline
Removing DPE SRE as I don't think we need to be involved. Please add us again if needed.
Change #1150729 had a related patch set uploaded (by JMeybohm; author: JMeybohm):
[operations/cookbooks@master] sre.k8s.wipe-cluster: Verify that k8s service are up after puppet ran
Change #1150729 merged by jenkins-bot:
[operations/cookbooks@master] sre.k8s.wipe-cluster: Verify that k8s service are up after puppet ran